Draft

Merge within Tolerance

Description
Loading description...
Fundamentals
View
AllIssues3QuestionsSuggestionsShow Resolved
  • Please sign in or sign up to leave a comment.
  • Blind4Basics Avatar

    Hi,

    The description still lacks some corrections (typos, mostly) and one clarification, imo:

    • "The keyword argument" -> considering what you're using in the tests, it's not a "keyword argument", but only a "default argument". It'd be a keyword argument if you were calling the function using ...(..., tol=3), which you're not doing currently (I didn't check the sample tests, tho). That's slightly different in the way that, if you say "keyword", you should check the user actually used tol as argument name. If you don't care about the user changing the argument name in the function declaration, call it a "default argument". Advise: switch to "default"... x) (or use a totally meaningful name => tolerance, but taht's a bit verbose... x) )
    • "The returnED value of the function should be..."
    • In the last sentence before the example, you should state that the join operation is done using the first element of each row as key (general principle: warn the user about what you'll do. Without that, anyone reading the example will start to search for the way you did join the rows and there are pretty much no chances they end up with the correct idea. So you're misleading your audience if that info isn't given in the first place).
    • Reading the description, it's not clear (without studying the example carefully) that the same element from b can be joined several times with different elements of a. => This needs to be added.

    Cheers

    • Fbasham Avatar

      Made most of the changes you recommended and added additiona random tests with small lists. Maybe I'm biased because I'm the author, but I don't see where else I can clarify what needs to be done especially given the fixed tests and the example/description. Thanks for your input and suggestions!

    • Blind4Basics Avatar

      about the last point, maybe just add "(the same row of b may be merged withe different rows of a if it's always the best candidate according to the rules)" or somethig like that?

    • Fbasham Avatar

      Added, but I think I need to get better at writing descriptions and keeping it simple ;p

    • Blind4Basics Avatar

      :+1:

      ;)

      Issue marked resolved by Blind4Basics 5 years ago
  • FArekkusu Avatar

    There should be smaller random tests too because debugging when almost every error message is 20 lines long is very hard.

  • FArekkusu Avatar

    It should be stated whether all options should be taken or only one (and which one exactly), e.g. what's the result of these:

    a = [1, A]    b = [0, X]    t = 1
                      [1, Y]
                      [2, Z]
    
    
    a = [2, A]    b = [0, X]    t = 2
                      [3, Y]
    
    • Fbasham Avatar

      In the first example it would be [1,...] from list b since it has an exact match. For the second example it would be [3,...] since it minimizes the difference and is within tolerance.

      I re-worded the bullet point in the 'Notes' section where it's mentioned.

    • FArekkusu Avatar

      What about identical distances?

      a = [2, A]    b = [0, X]    t = 2
                        [4, Y]
      
    • Fbasham Avatar

      always the first match, so [0,X]

    • FArekkusu Avatar

      This info should be added to the description, and such cases must be tested in the fixed tests (and preferably sample tests).

    • FArekkusu Avatar

      Also what about this?

      a = [2, A]    b = [2, X]
                        [2, Y]
      

      Or are such cases guaranteed to not be generated?

    • Fbasham Avatar

      it would be [2,X]

      I can modify the description and and fixed tests. I'm a little reluctant to change the random tests, but I could add more? The pattern here is that we're minimizing the absolute difference between list a and list b in column 1. If that is minimized and the value is within tolerance, it will always pick the first value it can find. There are likely cases of this in the random tests.

      Edit: By the way, this is mimicking a pandas function and I've checked it's intended behaviour with this function. Both line up ;)

    • FArekkusu Avatar

      The random tests are alright. I've implemented everything except the "pick first closest" scenario and failed 200+ tests. But all this stuff must still be tested in the fixed tests where the input size is small, expected result doesn't change all the time, and different values are used in the last column, e.g.:

      a = [[2, 5]]    b = [[2, 1], [2, 9]]    result = [[2, 5, 1]]
      
      a = [[2, 5]]    b = [[2, 9], [2, 1]]    result = [[2, 5, 9]]
      
    • FArekkusu Avatar

      Also, it'd be good if the example in the description covered as many rules/details as possible, similar to how I rewrote the example in your other kata, and now it shows that: multiple columns are aggregated separately, rows with identical keys don't have to come one after another to be grouped, different combinations of keys are not grouped together.

    • Fbasham Avatar

      That's half the fun of guessing what the hell the author wrote ;p

      I'll let you know I've finished changes.

    • Fbasham Avatar

      Okay, I've changed the fixed tests and implemented the two examples you gave above. Not sure about description. I'm on the fence since it lays out pretty much everything I can think of.

    • FArekkusu Avatar

      There're still no tests like these:

      Equal difference - pick first:
      a = [[2, A]]    b = [[0, X], [4, Y]]    t = 2    result = [[2, A, X]]
      a = [[2, A]]    b = [[4, Y], [0, X]]    t = 2    result = [[2, A, Y]]
      
      Non-equal difference - pick one with small difference:
      a = [[2, A]]    b = [[0, X], [1, Y]]    t = 2    result = [[2, A, Y]]
      a = [[2, A]]    b = [[3, X], [4, Y]]    t = 2    result = [[2, A, X]]
      
    • Fbasham Avatar

      Changed. I think the user should be good to visit the test cases to answer any questions they might have. One thing about long kata descriptions is people generally don't read them and immediately jump into the coding.

    • Blind4Basics Avatar

      (still waiting for a btach of small random tests... => trying another kind of implementation and it's passing the fixed tests but not the random ones)

      EDIT: mmmh... XD Sounds like they are still a bit "too big", then.