5 kyu
Mean without outliers
679 of 783
kingcobra
Recursion
Statistics
Algorithms
Data Science
In Python, it seems I had the same problem as some others as well: In the random tests, my result was sometimes off by 0.01. This happened a few times, every time on only one of the tests. I was able to pass by attempting a bunch of times though...
Please help: all my tests pass except two: "27.99 should equal 27.98" and "5.4 should equal 5.39". My approach: get the std and mean of the sample -> create new_sample (with outliers removed) -> if len(sample) != len(new_sample), call clean_mean(new_sample, cutoff) -> return the result. For example, with sample = [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10] and cutoff = 2, I call clean_mean 3 times (getting rid of 10, then 3) and return the mean of [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99].
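A minimal Python sketch of the recursive approach described in the comment above. It is not the kata's reference solution. It assumes the population standard deviation (statistics.pstdev), treats a value as an outlier only when its distance from the mean is strictly greater than cutoff * stdev, and rounds to 2 decimal places only at the very end.

```python
import statistics


def clean_mean(sample, cutoff):
    """Recursively drop outliers, then return the mean rounded to 2 places.

    Assumptions: population standard deviation; a value is an outlier when it
    is strictly more than cutoff standard deviations from the mean. With
    cutoff >= 1 at least one value always survives, so new_sample is never empty.
    """
    mean = statistics.mean(sample)
    stdev = statistics.pstdev(sample)
    # Keep only values within cutoff * stdev of the current mean.
    new_sample = [x for x in sample if abs(x - mean) <= cutoff * stdev]
    if len(new_sample) != len(sample):
        # Something was removed: mean and stdev change, so check again.
        return clean_mean(new_sample, cutoff)
    # No outliers left: round only now, at the very end.
    return round(mean, 2)


sample = [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10]
print(clean_mean(sample, 2))  # 1.0 (10 is removed on pass 1, 3 on pass 2)
```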
Python: the tests should use approximate equality (test.assert_approx_equals) instead of rounding + test.assert_equals when comparing floating-point numbers (see the docs). The new Python test framework is required; updated in this fork.
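A small Python illustration of why rounding before an exact comparison is fragile, which matches the "off by 0.01" reports in this thread. The two values are made up (not taken from the kata's tests) and stand for the same mean computed in two slightly different ways; the standard library's math.isclose is used here in place of the Codewars test.assert_approx_equals helper.

```python
import math

# Hypothetical results of computing "the same" mean two different ways.
mine = 27.98499999
reference = 27.98500001

# Rounding first and then comparing exactly can disagree by 0.01 ...
print(round(mine, 2), round(reference, 2))  # 27.98 27.99
print(round(mine, 2) == round(reference, 2))  # False

# ... while an approximate comparison treats them as equal.
print(math.isclose(mine, reference, abs_tol=1e-6))  # True
```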
Nice Kata, thanks
Please don't post solutions in discourse. Read this: https://docs.codewars.com/training/troubleshooting/#post-discourse
My solution
I got it right, but it's not working.
I had some problems getting the correct result until I carefully read the whole description :) Nice kata! Thx :)
Your Python test suite is a little inefficient, specifically this part:
It looks like you want the cutoff to be a real number chosen randomly from a uniform distribution between 2.5 and 5, rounded to 2 decimal places.
Might I suggest you replace that with:
round(random.uniform(2.5, 5), 2)
Thank you! A bit of a hack really, and not a particularly clever one. Your suggestion is much better, I'll change it.
R Translation
Please carefully review and approve. The reference solution is commented, as is the test suite, to help you understand an unfamiliar language (in case you're not familiar with R).
I used the same basic tests as Python, and similar parameters for the random tests, though I chose a different structure to generate the random arguments.
The main difference is that I added a guaranteed outlier 50% of the time (rather than a likely outlier 2% of the time).
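A hypothetical Python sketch of the "add a far-away value 50% of the time" idea for random tests. It is not the R translation's code; the function name, the base distribution (values around 10), and the outlier range (50 to 100) are all assumptions. The cutoff uses the round(random.uniform(2.5, 5), 2) suggestion from earlier in the thread. Note that a single value among n can be at most √(n − 1) population standard deviations from the mean (Samuelson's inequality), so a large base sample is needed for one far-away value to exceed a cutoff as high as 5.

```python
import random


def random_test_case(n=50, outlier_chance=0.5):
    """Hypothetical random-test generator: returns (sample, cutoff)."""
    # Cutoff drawn uniformly from [2.5, 5] and rounded to 2 decimal places.
    cutoff = round(random.uniform(2.5, 5), 2)
    # Base sample: n values clustered around 10 (assumed distribution).
    sample = [round(random.gauss(10, 1), 2) for _ in range(n)]
    if random.random() < outlier_chance:
        # Append one value far from the cluster (assumed range).
        sample.append(round(random.uniform(50, 100), 2))
    # Shuffle so the far-away value is not always last.
    random.shuffle(sample)
    return sample, cutoff


sample, cutoff = random_test_case()
print(len(sample), cutoff)
```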
Looks good to me! Thank you for taking the time to comment the code. I have some knowledge of R, but your comments were a good help.
Approved
Almost gave up due to the nuisance rounding. All in all, a great Kata.
Thank you for mentioning that! I've updated the description to make it clearer that you are only supposed to round at the end.
I didn't understand a word of what was expected with cutoff and outliers until I read these lines in the Discourse: I think these words should be placed in the Description instead of the Discourse!
Happy coding! ; ) )
There is a similar explanation in the description, but I agree that this one may be clearer. I'll add it to the description, thanks!
Thank you. (As neither a native English speaker nor a 'native' statistician, I found the cutoff/outlier game a real headache. I asked g00gle to translate the description, but that was even harder to understand, and I think there are some non-native English statisticians on CW.)
; ) )
You just need to round off your result :)
Well, now I feel like an idiot; I reread that paragraph several times to be sure I wasn't missing anything!
The description is pretty long, I must admit... thank you for your patience!
Seems I'm not fully awake today... Or is there some issue?
What!? (I know, I didn't round the result. But rounding leads to 1.2, so... :/ )
Note: the information about rounding is wrong, I think: 5.5 is rounded to 1 decimal place, not 2 (at least, that is how it's said in French... :o ).
You need to perform the process multiple times until there are no outliers.
I think you need to keep repeating the process until your sample set doesn't change any more, only then do you return your mean value.
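The "repeat until the sample set stops changing" rule can also be written as a loop instead of recursion. This is a minimal Python sketch, not the kata's reference solution; it assumes the population standard deviation (statistics.pstdev), a strict "distance greater than cutoff * stdev" outlier test, and rounding to 2 decimal places only once the loop has finished.

```python
import statistics


def clean_mean(sample, cutoff):
    """Loop variant: drop outliers until none remain, then round the mean."""
    current = list(sample)
    while True:
        mean = statistics.mean(current)
        stdev = statistics.pstdev(current)
        # Keep values within cutoff * stdev of the current mean.
        kept = [x for x in current if abs(x - mean) <= cutoff * stdev]
        if len(kept) == len(current):
            # Sample set no longer changes: round only now.
            return round(mean, 2)
        current = kept


print(clean_mean([1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10], 2))  # 1.0
```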
oh damn... :o
I think it would be useful to rewrite this sentence of the description:
in
I was trying not to be too explicit on that for a bit more of a challenge, but maybe it is only confusing and not actually challenging in the true sense. I'll update the description, thank you for your observation!
This set of data has a mean and stdev of 14.090909090909092 and 28.637229424141385, which gives a maximum bound of 100.00259736333325.
Since the cutoff is 3, according to my calculations the maximum bound would be ≈ 28.673 * 3, or 85.911.
Actually, your calculation of stdev is incorrect.
But you need to add the mean to that bound :P
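With the figures quoted above (mean 14.090909090909092, stdev 28.637229424141385, cutoff c = 3), adding the mean gives both bounds; 85.911 is only the half-width cσ:

$$
\bar{x} + c\,\sigma = 14.090909\ldots + 3 \times 28.637229\ldots \approx 14.0909 + 85.9117 = 100.0026
$$

$$
\bar{x} - c\,\sigma \approx 14.0909 - 85.9117 = -71.8208
$$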
Otherwise, consider this: the maximum bound of 85.911 will cut off everything.
Well, it depends how you look at it. ;)
From my point of view, we are interested only in the standard deviation and the distance of each observation from the mean. For example:
If you don't add the mean to the range, then it's relative to 0, not to the mean. An outlier is defined as having a large distance from the mean.
Hmm. To recapitulate:
If the cutoff is 3, then any value that is more than 3 standard deviations from the mean must be removed. We first calculate the mean and standard deviation. Then we multiply the standard deviation by 3 to get our actual cutoff value. Then for each value in the sample, we calculate its distance from the mean, i.e. abs(xi - x̅). If this distance is greater than our cutoff value, then the value is an outlier. No?
Yeah, that's correct.
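The rule recapitulated above, sketched in Python with two hypothetical helpers: is_outlier uses the distance form abs(x - mean) > cutoff * stdev, and outside_bounds uses the mean ± cutoff * stdev bounds form. Mathematically they are the same test. The numbers reuse the mean and stdev quoted earlier in the thread with a cutoff of 3.

```python
def is_outlier(x, mean, stdev, cutoff):
    """Distance form: more than cutoff * stdev away from the mean."""
    return abs(x - mean) > cutoff * stdev


def outside_bounds(x, mean, stdev, cutoff):
    """Bounds form: outside [mean - cutoff * stdev, mean + cutoff * stdev]."""
    lower = mean - cutoff * stdev
    upper = mean + cutoff * stdev
    return x < lower or x > upper


mean, stdev = 14.090909090909092, 28.637229424141385
# 90 exceeds 85.911 but is within 3 stdevs of the mean: not an outlier.
print(is_outlier(90, mean, stdev, 3))   # False
# 101 is above mean + 3 * stdev (about 100.0026): an outlier.
print(is_outlier(101, mean, stdev, 3))  # True
```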
Looks like it's good to go, I debugged my solution code and now it's working. :)
Thanks for the upvote on the kata! :)
@Voile that is a much cleaner way to do what I ended up doing..
:)