6 kyu
Earth-mover's distance
18 of 96geoffp
Loading description...
Fundamentals
Algorithms
Mathematics
Arrays
View
This comment has been reported as {{ abuseKindText }}.
Show
This comment has been hidden. You can view it now .
This comment can not be viewed.
- |
- Reply
- Edit
- View Solution
- Expand 1 Reply Expand {{ comments?.length }} replies
- Collapse
- Spoiler
- Remove
- Remove comment & replies
- Report
{{ fetchSolutionsError }}
-
-
Your rendered github-flavored markdown will appear here.
-
Label this discussion...
-
No Label
Keep the comment unlabeled if none of the below applies.
-
Issue
Use the issue label when reporting problems with the kata.
Be sure to explain the problem clearly and include the steps to reproduce. -
Suggestion
Use the suggestion label if you have feedback on how this kata can be improved.
-
Question
Use the question label if you have questions and/or need help solving the kata.
Don't forget to mention the language you're using, and mark as having spoiler if you include your solution.
-
No Label
- Cancel
Commenting is not allowed on this discussion
You cannot view this solution
There is no solution to show
Please sign in or sign up to leave a comment.
python:
tests are affected by solutions mutating input, and should change to decorator-style and import test lib/solution
I would have submitted a fork, but python (same as all the other languages) are directly added from within the kata editor so I'm not sure how I'd go about that.
Kata forks work well no matter if language was added with a translation or with editor. That was one of the reasons of introducing the functionality of forking a current state of kata.
I've submitted an update with various touchups
Thanks for doing this.
R version has no random tests
The random tests are generated using a fixed pseudo-random seed, so they are the same every time. This ensures that any bugs in your code will be reproducible.
This comment has been hidden.
This is a conversation I've had several times before on Codewars. For software testing in general, unpredictable unit tests are a really bad idea - they lead to bugs that occur only sometimes, unpredictably. Such bugs can be very hard to find and fix. But in an adversarial context like Codewars, something is needed to prevent the adversary easily defeating the system. There are some compromise solutions possible, which I've experimented with in my other kata. But for a simple kata like this one, I'll concede the point and switch to using unpredictable random seeds.
Since you've had this conversation before, I don't want to make you repeat yourself. However, if you could point me to a more detailed conversation you've had, I'd be interested in reading it.
On Codewars, I find that the benefits of random tests outlined in the docs most often far outweigh the downsides I've seen, so I don't know if there is something outside my experience that you are talking about. In short, I could understand that unpredictable unit tests could be bad for software testing in principle, but which of those downsides readily transfer to the Codewars specific case in practice?
My Optical Character Recognition kata had some discussion of this about 3 years ago.
As for practical Codewars-specific downsides: I actually came across one only yesterday, while changing the R version of this kata. I removed the call to
set.seed()
, checked that everything still seemed to be working (with different test cases on each submission), and hit "Re-publish" - only to be told that the kata couldn't be published because my solution was failing tests. (Wait, what?) It turned out that the test code itself contained a rarely-occurring bug (arising from the way thesample()
function in R behaves differently when its main argument is a vector of length 1). This hadn't been an issue before, because it didn't affect any of the 512 random test cases generated by the fixed random seed I was using. But with a different seed on each submission, you occasionally get test cases that are affected, causing a crash. It was sheer good luck that such a test case was generated by Codewars' (one and only!) final check before publication. If that hadn't happened, then some hapless codewarrior would have had to discover the problem, slowly and painfully realize that the problem wasn't in their own code, get frustrated, raise an issue on the discussion board, etc. - all the stuff that unit testing is supposed to prevent.Having unpredictable random test cases makes it harder to create good kata that work reliably, because the users will be exposed to test cases that the kata author never used or considered when developing the kata. Often this doesn't cause any problems, but sometimes - as I was reminded yesterday - it does.
Another Codewars-specific problem relates to execution time. With some kata (7x7 Skyscrapers is a good example) randomly-generated test cases can vary widely in required execution time. Big-O is formally defined to be for the worst case, but what Codewars measures is more like the best case, because you can keep re-trying until you get a test set that doesn't include any hard examples. I've certainly had the experience of writing a not-really-good-enough kata solution and getting it to pass anyway by repeatedly hitting "Attempt" until it gets a problem set that it can solve within the required 12 seconds.
There are some compromise solutions. You can have a fixed set of test cases, but present them in an unpredictable random order; this at least makes cheating a bit more difficult. Or you can minimize the number of unpredictable test cases (only a few are really needed to catch the cheaters). It would also help if the Codewars test frameworks all stopped execution after the first failed test, instead of trying all the tests and showing what all the expected answers are.
In the end, I guess it comes down to what the purpose of Codewars is. If it's just to help people learn to program, then perhaps cheating doesn't matter too much and we should emphasize reliable, well-tested, maintainable code. But if the value of 1kyu status goes beyond bragging rights with your friends (if it's helping people get real-world jobs, for example), then it needs to be as difficult as possible to defeat.
Anyway, thanks for your interest. If you have any further thoughts, please post them here.
Thanks for the detailed response. I am acutely aware of the execution time issue, since this effectively shelved a kata I had made. The algorithm (which, judging by your OCR kata, you are likely familiar with) I was promoting is quite sensitive to differing inputs and Voile found an alternative algorithm which can usually solve the problem, but can be made to time out or fail. However, this also resulted in the reference algorithm timing out occasionally. If I used a set random seed to find a quickly executing input set, then I could probably get around this, but I suppose this could (unfairly?) promote my (standard but still arbitrary) initialization.
The
sample()
bug is an interesting case as well. I've had "fun" with making translations when I've come across these little quirks in a language. They can definitely be frustratingly educational.As a somewhat relevant aside... I recently solved a coding problem on an educational platform which uses fully fixed tests. When I found that codewars also had a kata with the same problem, I tried the algorithm that I had come up with. While it passed all the tests on the other site, it failed here. It turns out there were some edge cases which their test suite had missed, but the random tests on codewars found.
I guess finding the right balance isn't as straightforward as I had initially thought. Cheers!
I would argue that this actually helps to contribute to making better kata. Occasionally people run into situations which was never considered by the author, they raise it as an issue, it gets fixed and now the kata is better. While it is true that its not great that rarely a user will run into this kind of bug, I also think its not great to have (rare) kata which are actually incomplete due to author solutions simply not testing inputs which they fail on.
I have to admit that I generally agree with Kacarott here, but I do greatly value the insight from geoffp. Indeed, I used it to reconfigure the testset on the kata I mentioned above that I had made and given up on. I think now it could work, so I republished it, albeit with a variation of the ideas that we've been discussing. Namely I used a set of favorable seeds from which one is chosen at random to run the testset. The seeds take out the extreme sensitivity of the algorithm to differring inputs, which allows the alternative algo to consistantly fail.
It's still not the strongest way I could make the tests, but I think it could be a good alternative to what I had before, while still requiring relatively little extra time in test production.
Somewhat off-topic, but I really like to read detailed and insightful exchanges like this one. If you power users could make more of it, it would be really great (to me, at least).
This is for C: While the probability values look ok, the position values of test #4 in Random_Tests are consistently wrong and seem to be very large and not in order.
11,8 {(-16807268829305383559342375576237254726043224244224.000000, 0.054688), (-16441893419972657829791454368058183971129241108480.000000, 0.109375), (24845527834625349609462642156176811334150853230592.000000, 0.015625), (-8769009823985417509222108996297698117935595257856.000000, 0.234375), (28499281927952606904971854237967518883290684588032.000000, 0.132812), (365375409332725729550921208179070754913983135744.000000, 0.046875), (24480152425292623879911720947997740579236870094848.000000, 0.070312), (28133906518619881175420933029788448128376701452288.000000, 0.085938), (10595886870649046156976715037193051892505510936576.000000, 0.070312), (-2923003274661805836407369665432566039311865085952.000000, 0.039062), (-8038259005319966050120266579939556608107628986368.000000, 0.140625)}
{(-13884265554643577722935005910804688686731359158272.000000, 0.109375), (-7307508186654514591018424163581415098279662714880.000000, 0.156250), (-21191773741298092313953430074386103785011021873152.000000, 0.031250), (-12788139326645400534282242286267476421989409751040.000000, 0.234375), (-30326158974616235552726460278862872657860600266752.000000, 0.078125), (-5115255730658160213712896914506990568795763900416.000000, 0.171875), (-25210903243958075339013563364355882089064836366336.000000, 0.171875), (-2192252455996354377305527249074424529483898814464.000000, 0.046875)}
Those numbers are correct. The test case just has some big numbers in it, that's all. True, they aren't sorted into any kind of order -- but the kata description never promised they would be.
This comment has been hidden.
R translation is broken.
Test cases include random
NA
values inpx
andpy
, and there is no consistency in the expected test result. Sometimes the test wants the function to returnNA
other times it seems to want a missingpx
value to be treated as1
.Nowhere in the description is it suggested there may be
NA
values in the test cases, if you intend to include them you must indicate the proper way to handle them.Thanks for giving the R translation a try! The bug that was creating the NA values should be fixed now.
This comment has been hidden.
Your code changes the content of the arrays
x
,y
,px
, andpy
. Arrays in Javascript are passed by reference, so this can cause problems if the code that called your function needed that data for other purposes.I've just tweaked the test setup so that it doesn't use those arrays for anything after calling your function; your solution should pass now.
In general, though, if you are passed an array, it's safest to avoid changing its content unless you know for sure that it's not being used anywhere else.
This comment has been hidden.
This comment has been hidden.
I used that same algorithm, just need to tweak for the general case.
In C++, is there some particular reason to have params passed as const value? Why not const reference, or just mutable value?
Sorry, my mistake. It was meant to be const reference.
Java has no sample Tests.
Maybe mention in the description that the
x
andy
are not sorted.Thanks, the Java sample tests are there now. Some of them have the
x
andy
not sorted, so anyone who skips the sorting isn't going to get far.This comment has been hidden.
First, you can make your code look good in comments using markdown formatting
Regarding your code, while I do agree that 6kyu is a bit low for this kata, it's not a 'math problem' requiring a 'formula' at all. In fact, at a glance your approach is valid, and you should consider ways it could be made faster.
This comment has been hidden.
Maybe test.expect would be better than test.assert_equals for the last test prohibiting the use of the module.
Done. Thanks for the suggestion.
I'll admit I was disconcerted to see my kata approved so soon. I'd have preferred to leave it in beta for a while longer to see some solutions in other languages and gather opinions on the right rank for it. I'd thought 5-6 kyu, but wouldn't have been put out if it ended up 4 kyu.
I knew about the existing R package that solves this problem, but wasn't aware of the Python module. Evidently some people did, though; I guess that's the beta process working as intended. By the time I'd added a test to block the use of the library in Python, the kata was already approved.
The reason this kata appears to be from 2017 is that I had a couple of draft kata that I'd been using for casual experimentation and trying stuff out. Then in 2020 I decided I didn't need both of them, and turned one of them into the Earth-mover's distance kata.
To prevent exactly such cases of premature approval, I usually create an issue in my kata with comment like "do not resolve this issue, kata is not ready yet and I want more feedback". This usually stops accept-trigger-happy users overly enjoying their new approval priviledges to click buttons blindly.
This is only a math problem,don't need to know it and solve it.waste time!
=> is math mere timewaste?
O_O
when it end's up ranked 6 kyu while it should be around 4, yes...
Remember this?
2017 kata, estimated rank = 5, community feedback rank = 6. No |ZEDCWT, mrtp0| feedback
While approving libraries weren't disabled => I thought author didn't want to disable that => Just a library kata. (mark this as spoiler)
lol...
Except that 1 week ago, there weren't any rank suggestion and you perfectly know it since you're the first to have completed it. So it's just like it's from 2020. You're dishonest to a point it makes me sick...
And I'm curious about who are the voters...?
(eager to see your flagged message...)
Good for you if you know about that. I don't.
unflagged. :D
and why do I feel like remembering a blue estimation on this, then...?
Oh, B4B raging, now that's a sight. :o
it;s usual now a days ;)
Haven't looked at comments much recently. ¯\_(ツ)_/¯
Maybe difficulty across languages differs alot, but I used the simplest of algorithms to solve this kata in JS in 3 seconds. It does look 6 kyu to me, at least in JS.
In the description, you write
(1/6) * (2-1) + (2/6) * (4-2) + (1/6) * (5-3) = 1
.But if I'm not wrong this is:
1/6 * 1 + 2/6 * 2 + 1/6 * 2 = 1/6 + 4/6 + 2/6 = 7/6 != 1
Fixed.
Thanks.