  • Custom User Avatar

    Great example, let's have a look:

    import itertools as i
    
    i.count("hello") # TypeError: a number is required
    i.accumulate(8, 8) # TypeError: 'int' object is not iterable (error references the call site, not deep in some library somewhere)
    i.islice(None, []) # ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize
    

    But maybe that is unfair, since it's a built-in library. Let's try another good one, numpy:

    import numpy as np
    
    np.ndarray("foo") # TypeError: 'str' object cannot be interpreted as an integer. (Again referencing the call site, not deep in some library)
    np.ndarray(1).fill(lambda x: x) # TypeError: float() argument must be a string or a number, not 'function' (Once again, referencing call site)
    

    These all obviously threw errors, but most of them were explicit about what went wrong, and all of them threw the error from the call site, not from deep in some library. None of these errors would require delving into any 3rd party source code to debug.

  • Custom User Avatar

    and therefore much prefer a library which handled basic problems (which most good libraries do anyway)

    I'd like to know what those libraries are, because I'm quite sure that if you took any popular library, even built-in ones like Python's itertools (which you'd have to agree is quite a good library?), and instead of calling a method with an iterable as expected, you provided it with an integer, or a list of generators that produced TCP socket handles (or any of the millions of terrible examples one could come up with), it would just error out at whatever point in the library code the provided value was used. I don't expect it to first test whether my value is of the correct container type, with the correct elements, with the correct length, and that the moon is also currently aligned with Sirius. It simply crashes. And it's not because the library is badly designed, it's because you tried to use it in a way it was not designed to work.

  • Custom User Avatar

    I disagree that you are equipped to deal with it. In real life, yes, once you learn how to access the source code; however, that is not even a possibility here. The first time, or the tenth time, the best option is still to merely guess at what your code might have done wrong.

    I may not call such a library badly designed, however I would have a much better experience, and therefore much prefer a library which handled basic problems (which most good libraries do anyway). Do you not? I also don't think there are even that many things which can go wrong. If the library relies on the data having a length, then check that it has a length. If it relies on it being a specific type, then check that it is that type.

  • Custom User Avatar

    Yes! And how terribly annoying of an experience that is.

    Indeed, it's not fun, but once you learn about it, you are then suddenly equipped to deal with it. To take this further: would you claim a library is "badly designed" if it doesn't somehow provide you with a beautiful error message politely and accurately pointing out exactly which of the millions of invalid things you specifically did to cause the error? Just how many cases are tests expected to anticipate and preempt? So you check for the correct type/subclass, the correct length, the proper types of values in a container, etc. etc., all before you actually dare try to assess the correctness of a solution. At what point do the tests stop being about checking the correctness of a solution and become more about ensuring the user isn't just throwing random stuff at them and then getting confused when things don't work as expected (whatever "expected" means in that context)?

  • Custom User Avatar

    In both cases the stack trace will point to the exact line of test code that is using the user solution.

    I don't think it's generally true, and if it is, it's incidental rather than deliberate. This argument holds only in languages which expose such traces, with testing frameworks which preserve them, and when the test code is (accidentally) written in a way that does not erase this information.

    the issue would be identical to using a 3rd party library, e.g. calling an API with invalid arguments. The "crash" will originate from somewhere in the depths of the library, but you should realise that with a high probability, you caused it.

    My general experience is the opposite (but YMMV), and I would propose an experiment: let's take a random library (not a C one, though), feed it with invalid arguments, and see how many of them respond with "NullPointerException, bye bye loser!", and how many with "Hey, name cannot be null". I am honestly curious what the results would be.

  • Custom User Avatar

    (on the other hand, the kata is :cough: 4 kyu. So one could also expect from a user attempting that level some minimal degree of autonomy, couldn't they?)

  • Custom User Avatar

    The thing with submission tests, at least, is that they are not visible to the solver.

    however simply seeing NoneType is not iterable originating from the tests when you are already fairly sure that your code never returns None, is the opposite of helpful.

    In both cases the stack trace will point to the exact line of test code that is using the user solution. It is obvious, unless you assume that your solution is perfect and bug-free because you are some kind of divine programmer, and really there's just a bug in the tests. That is the mentality we need to snuff out, though!

    And while yes, I agree that hidden tests as they are presented on Codewars are much different from real-life test suites, the issue would be identical to using a 3rd party library, e.g. calling an API with invalid arguments. The "crash" will originate from somewhere in the depths of the library, but you should realise that with a high probability, you caused it. Imagine the run on the issue page of every GitHub repo if this "is it me? No, it's everyone else who is wrong" mentality prevailed in this way outside of platforms like this.

    ...when you are already fairly sure that your code never returns None

    Indeed, how about we force people to be more than merely "fairly" sure before writing complaint letters about defective products? :)

  • Custom User Avatar

    I tend to agree with Hob: while I don't think it's a big issue for this kata, ideally the tests themselves should never crash.

    you'll have to put on your detective hat and work over your code to see where and what you got wrong.

    The thing with submission tests, at least, is that they are not visible to the solver. So all that is left for the solver to do is a) try to use the stack trace/other tools to get a look at the test code (which in itself would be a problem), or b) just take guesses at what might be the problem. It isn't merely an extra step on top of debugging your own solution; it transforms from debugging a solution into debugging a black box.

    In this particular case, the bug is obvious from the code. However, I could easily imagine someone writing a much longer solution and somewhere accidentally returning None (for example, by mistakenly writing return some_list.reverse(), which reverses in place and returns None). For this kata, the stack trace combined with the code in the sample tests may be enough to figure it out; however simply seeing NoneType is not iterable originating from the tests when you are already fairly sure that your code never returns None, is the opposite of helpful.
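
    To make that pitfall concrete, here is a minimal sketch (the function name matches this kata; the rest is illustrative):

    def permutations(s):
        result = sorted(s)       # stand-in for some real computation
        return result.reverse()  # BUG: reverse() mutates in place and returns None

    A hidden test that then does something like sorted(permutations("ab")) blows up with TypeError: 'NoneType' object is not iterable from inside the test code, even though the actual mistake is on the return line above.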

  • Custom User Avatar

    "Index out of bounds" and "Parse errors" are value errors, not type errors, so really a different matter with different considerations.
    I disagree. All of the "Index out of bounds", "Parse error", "X is not a property of null", "sort is not a function" etc have the same root cause: too optimistic assumptions of tests when handling the actual value.

    Considering that this is a platform for developers to hone their skills, the question arises as to just how much handholding we want to provide. After all, a test that fails due to it making valid assumptions about your incorrect return value/type is basically just another type of bug, and really no different from your solution returning incorrect (albeit non-crashing) values: you'll have to put on your detective hat and work over your code to see where and what you got wrong. The error originating from inside a test is then merely a single extra step in the deduction chain.

    Continuing the theme above, what about education? While completely avoiding such errors may be seen as a matter of convenience or UX, it also robs the user of a potential teachable moment, leaving them ignorant of very real pitfalls they might encounter in real life, whilst also possibly reinforcing their instinctive reaction that "the problem lies elsewhere, not with me".

    I understand this argument, but I do not agree with it fully for two reasons:

    • I am not sure I would equate "explicit feedback" with handholding. It is a favorite argument of one prominent author whose kata cause the dashboard to be spammed with "X created an ISSUE for kata Y" type of posts, and my take is: explicit feedback does not take away the requirement of diagnosing the mistake and fixing it. A user still has to debug their solution and fix it; this part cannot be avoided.
    • Continuing the theme above, I think that (with the exception of some specific domains) finding bugs in a black box of code that is not yours is not exactly educational. I would argue that usually you have access to the whole code (including tests) and can dig into it to see why it gets affected by your mistakes. Debugging a black box (whether of your code or not) is not exactly something a coder usually does, especially in the context of testing.

  • Custom User Avatar

    I don't mind continuing this, so here's a few more points:

    • "Index out of bounds" and "Parse errors" are value errors, not type errors, so really a different matter with different considerations.
    • Considering that this is a platform for developers to hone their skills, the question arises as to just how much handholding we want to provide. After all, a test that fails due to it making valid assumptions about your incorrect return value/type is basically just another type of bug, and really no different from your solution returning incorrect (albeit non-crashing) values: you'll have to put on your detective hat and work over your code to see where and what you got wrong. The error originating from inside a test is then merely a single extra step in the deduction chain.
    • Continuing the theme above, what about education? While completely avoiding such errors may be seen as a matter of convenience or UX, it also robs the user of a potential teachable moment, leaving them ignorant of very real pitfalls they might encounter in real life, whilst also possibly reinforcing their instinctive reaction that "the problem lies elsewhere, not with me".
    • Perhaps a middle-ground approach would be to ensure that the sample tests catch such errors. E.g. if the test suite is going to perform a sort on the user's return value, the sample tests should do so too. A glance at the sample tests would then make it obvious enough why a test crashed when you return something that can't be sorted (see the sketch after this list).
    • "What bothers me more are some kata which are either not that clear ("return a regex" might mean a string expression, or an actual object)": This is an issue with bad descriptions, and another topic altogether. My argument here is that perhaps instead of trying to get authors to diligently check a return type before doing any other kind of analysis on every test ever authored, a first step would be to get authors to finally stop being ambiguous in their descriptions and provide a proper, exhaustive spec. Perhaps we could finally put an AI to good use: "Hey, it looks like you just spent 20 minutes writing a completely irrelevant, four paragraph intro story for your kata. Might I suggest you first provide a complete specification about the inputs and outputs of your challenge?". This would extend to every user participating in the beta process, and make it "obligatory culture" that they demand proper descriptions before allowing a kata to publish.

  • Custom User Avatar

    I think I sidetracked the discussion here: I argued the general approach too much and disconnected it from this specific kata, which also caused some misunderstandings. I am sorry for that. But if you let me continue... :)

    The "if the user caused it, they have to deal with it" is not bad on itself, but it's tricky: someone, somehow, has to estimate what caused the problem. If you additionally consider the fact that a reporter was not able to get things right and caused a problem in the first place, their judgement on the cause of the problem might be not really good (especially if it's supported by feedback pointing outside of user solution). As a result, it induces a support event, someone else has to find out the cause, and this can be more expensive to handle than preemptively detecting the problem in the first place.
    The main premise behind my stance of "tests should not crash" is based on the assumptions that:

    • a user caused the crash in the first place, so they cannot be relied upon to diagnose it,
    • crashes are easy to prevent (I would hope it boils down to adding a single line with a single assertion), and
    • a one-time effort of making tests better pays off when compared to potentially many (i.e. more than one) support requests. Adding a (figuratively) single line of assert_equals(actualtype, expectedtype, "Unexpected type {actualtype}") is literally less effort than answering two questions (a sketch follows below).

    If any of the above does not hold for some specific case, then sure, it can be treated as out of scope of this argument.
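
    To give a rough idea of what that single guarding assertion could look like (a sketch in Python with codewars_test; the expected list type and the helper name are just examples):

    import codewars_test as test

    def check(actual, expected):
        # One up-front guard: report the wrong type in plain words instead of
        # letting the comparison below blow up inside the test code.
        if not isinstance(actual, list):
            test.fail(f"Unexpected return type: {type(actual).__name__}, expected a list")
            return
        test.assert_equals(sorted(actual), sorted(expected))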

    Chrono's example of "you know it won't work from the start" is a good one, and he's right: it can be totally obvious (logically, or from specs, or whatever) that you need to return a list. Then fine, such questions can be replied to just with "Hey, you need to return a list, can't you read?" and I'd be fine with that. What bothers me more are some kata which are either not that clear ("return a regex" might mean a string expression, or an actual object), or attempt to perform some deeper analysis (for idx in 0..len(expected): verify(actual[idx]) fails with "index out of bounds", or actual.split(' ').map(parseInt) fails with "Invalid format"). My idea is that in such cases, preventing a crash is a one-time effort which saves our time later, and it makes the "the maintenance burden will increase" argument untrue in the long run.
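
    For the "deeper analysis" cases, the contrast could look roughly like this (plain Python, all names illustrative): the unguarded version crashes inside the test, while the guarded one turns the same mistake into a readable assertion message.

    def verify_unguarded(actual, expected):
        # Crashes with AttributeError/ValueError/IndexError when actual is
        # None, unparsable, or shorter than expected.
        parsed = [int(x) for x in actual.split(' ')]
        for idx in range(len(expected)):
            assert parsed[idx] == expected[idx]

    def verify_guarded(actual, expected):
        assert isinstance(actual, str), f"Expected a string, got {type(actual).__name__}"
        parts = actual.split(' ')
        assert len(parts) == len(expected), f"Expected {len(expected)} values, got {len(parts)}"
        assert all(p.lstrip('-').isdigit() for p in parts), f"Could not parse {actual!r} as integers"
        assert [int(p) for p in parts] == expected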

  • Custom User Avatar

    You know it won't work from the start.

    You know, he knows, I know. But CoolNewb420 doesn't, which was kind of the argument I made above: at some point, a user might be expected to be able to figure this out on their own. The root of the issue is in their code, even if the tests/trace don't make this explicit. The cynic in me also wants to rant about this general "If something goes wrong, first blame everyone else before even considering looking at my own code; and even then I can't spot any problems, so I know my code is perfect, which is ironic because the issue arose precisely because I was too ignorant and inexperienced to see what I was doing wrong."

    That said, Hobs does make a few good points. Perhaps I am indeed too stuck on this "if the user caused it, they have to deal with it" mentality. After all, nowhere in the above list of arguments has there ever been mention of a detriment to providing more helpful, detailed tests, so really, it can only be beneficial (though the maintenance burden will increase).

  • Custom User Avatar

    Wait a minute, because I want to make myself clear:

    def permutations(s):
        print(1)
    

    Writing that and expecting things not to break is wishful thinking. At least return an empty list; you can see the expected type in the description and in the sample tests, so why would you not do that? That code looks more like lazy writing than the tests do. You know it won't work from the start.
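
    For comparison, even a stub that merely returns the documented type (an empty list here, going by the sample tests) fails on values instead of crashing the tests:

    def permutations(s):
        # Placeholder with the expected return type; still wrong, but it fails
        # assertions instead of blowing up with a TypeError inside the tests.
        return []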

  • Custom User Avatar

    Perhaps the best guidance would be to consider how real-world unit tests are designed: do they also test explicitly for a return type before performing any other test?

    Unfortunately, I am afraid that I do not have enough experience with poorly typed languages to be qualified to answer this question. However, here is what I would imagine:

    • At the moment of writing, the "real life" tests would be verified against the most probable kinds of mishaps, like returning None, and checked to see whether the feedback is clear enough. Even if a coder forgets something, it's not a major issue, because the tests can be improved later (but see below).
    • If tests have already been running for some time and worked well, and then an error popped up, then things would depend on how easy it is to diagnose the problem from the symptoms. If it's an unexpected stack trace but it's not necessary to dig too deep in search of a cause, then fine, leave it as it is. If the message from the tests forces you to read the code of the tests just to diagnose what might be wrong with the tested function, then it's not great. I think that in such a case, it would be advisable to improve the feedback of the tests. The sooner you find bugs in the tested code, the better.
    • "Real life tests" are different from CW tests in that you can usually read their code and use them as a starting point for diagnosing problems with the tested function. That's not the case on CW.
    • "Real life tests" are subject to maintenance like any other (i.e. production) code: when they are not sufficiently helpful, you improve them. In contrast, on CW there seems to be some kind of resistance against maintenance of content in general.
    • "Real life tests" serve a somewhat different purpose than tests on Codewars, so practices can differ. For example, random tests do not necessarily have to give pedantically detailed feedback if potential issues are clearly reported by fixed tests or property tests.

    As an anecdote, I once heard someone say that "dynamically typed languages are not bad, you just need to write many many many many more tests". But maybe someone actually working with them would have some insight to share.
