Ad
  • Custom User Avatar

    Thanks ticking. It's interesting to consider the trade offs here.

    I think this question is lot easier in Haskell where there's a concrete data structure to work with :p

  • Custom User Avatar

    @soapie yes, while count itself does not hold on to the head, A will.

    So A has potentially rather high memory usage, but is fastest for the most widely used collection (vectors), while B is mind numbingly slow on those but has far better memory performance on the most widely used abstraction (seqs).

    The thread is relevant, I think I linked it somewhere as well, but I am not shure.

    So pick your poison ;)

  • Custom User Avatar

    The more I think about it, the less I like A. last will take linear time on a vector. If someone could verify that A leaks and B doesn't on a lazy list then I'll consider B better. And say not to use it on something that you know is a vector, just like you shouldn't use last.

    Edit: found this by searching "clojure last linear time": https://groups.google.com/forum/#!topic/clojure/apkNXk08Xes seems relevant!

  • Custom User Avatar

    I'm not very familiar with Clojure so I could be talking rubbish here but I've thought of one argument in favour of B & C. Ideally we want to get the penultimate element of a lazy sequence using only constant memory. In A I assume counting a lazy sequence requires traversing that sequence but you're still holding onto the head of the sequence so that you can then index into it. Space leak!

    So I don't know if there's a nice solution that's both fast on vectors and memory efficient on lazy sequences (without explicitly asking lst "what are you" I guess).

  • Custom User Avatar

    This comment is hidden because it contains spoiler information about the solution

  • Custom User Avatar

    The point is that lst might not be a linked list. The docstring is "Gets the second to last element of an ISeq". I wasn't suggesting converting lst to a vector so I don't understand the point of your AA. All I can do is second what ticking says above.

    Edit: wrote this before I saw your second reply.

  • Custom User Avatar

    @ticking,

    Could you rerun the benchmarks with (vec (range 1 1000))?

    Because then you'll see why O(1) is preferrable to O(n).

    Here it is:

    (----------------------------------- < AA
    | avg    | 0.0263 ms
    | min    | 0.0 ms
    | max    | 1.0 ms
    | stddev | 0.226322208751621 ms
    | total  | 267 ms
    -----------------------------------
    ----------------------------------- < A
    | avg    | 8.0E-4 ms
    | min    | 0.0 ms
    | max    | 1.0 ms
    | stddev | 0.03998599614851074 ms
    | total  | 10 ms
    -----------------------------------
    ----------------------------------- < B
    | avg    | 0.026 ms
    | min    | 0.0 ms
    | max    | 4.0 ms
    | stddev | 0.22509701656551076 ms
    | total  | 264 ms
    -----------------------------------
    ----------------------------------- < C
    | avg    | 0.0155 ms
    | min    | 0.0 ms
    | max    | 3.0 ms
    | stddev | 0.17472479181640344 ms
    | total  | 160 ms
    -----------------------------------
    nil nil nil nil)
    

    You are right!
    I missed this. In this case of course A will be much faster...

    And now I am confused :)

    So here is what we have:

    A: fastest for vectors, slowest for linked lists

    (AA: same fast for all, but much slower than A for vectors, and still slower than C)

    C: not fastest as can be for vectors, but fastest for linked lists.

    And now comes the question: What would be the most idiomatic clojure way to implement such penultimate function assuming it will live somewhere in clojure.core?

  • Custom User Avatar

    This comment is hidden because it contains spoiler information about the solution

  • Custom User Avatar

    Could you rerun the benchmarks with (vec (range 1 1000))?
    You'll brobably see that A is O(1) while the others are O(n).

  • Custom User Avatar

    Surely, readability factors into "best practices", especially when the difference between (arguably) the most readable solution and the most performant one is an average of around 0.02ms.

  • Custom User Avatar

    Argument for A: I think that your benchmarks would change if the sequence is something that you can index quickly into (like a vector). It also seems like the most natural approach. If someone hands you an unknown data structure that happens to be a sequence and asks for the penultimate member, just index into it and trust that nth will do the right thing for that data structure.

    I agree with you that best practice votes would be more useful if each vote was linked to a comment (so when you vote BP, either you make a new comment or upvote an existing one).

  • Custom User Avatar

    This comment is hidden because it contains spoiler information about the solution