5 kyu

Mean without outliers

679 of 783 kingcobra
Recursion
Statistics
Algorithms
Data Science
  • richardjana Avatar

    In Python, it seems I had the same problem as some others as well: In the random tests, my result was sometimes off by 0.01. This happened a few times, every time on only one of the tests. I was able to pass by attempting a bunch of times though...

  • kit_sho_ets Avatar

    Please help: all my tests pass except two: "27.99 should equal 27.98" and "5.4 should equal 5.39". My approach: get the std and mean of sample -> create new_sample (remove outliers) -> if len(sample) != len(new_sample), call clean_mean(new_sample, cutoff) -> return the result. For example, with sample = [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10] and cutoff = 2, I call clean_mean 3 times (getting rid of 10, then 3) and return the mean of [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99].
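The approach described above can be sketched as a loop instead of a recursive call. This is only a sketch under assumptions, not the kata's reference solution: it assumes the population standard deviation (`statistics.pstdev`), a final rounding to 2 decimal places, and a cutoff of at least 1 so that the sample can never be emptied.

```python
from statistics import mean, pstdev


def clean_mean(sample, cutoff):
    """Mean of sample after repeatedly dropping outliers.

    Assumptions (not confirmed by the kata description): population
    standard deviation, result rounded to 2 decimals, cutoff >= 1.
    """
    data = list(sample)
    while True:
        m = mean(data)
        sd = pstdev(data)
        # Keep only values within cutoff standard deviations of the mean.
        kept = [x for x in data if abs(x - m) <= cutoff * sd]
        if len(kept) == len(data):
            # Nothing was removed on this pass, so the sample is clean.
            return round(m, 2)
        data = kept


sample = [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10]
print(clean_mean(sample, 2))
```

With this sample and cutoff = 2, the first pass drops 10, the second drops 3, and the third removes nothing.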

  • Just4FunCoder Avatar

    Python: Test should use approximate equality (test.assert_approx_equals) instead of rounding + test.assert_equals when comparing floating point numbers [Doc]
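To illustrate why rounding followed by exact equality is fragile, here is a small sketch. It uses the standard library's `math.isclose` as a stand-in for the Codewars helper, and the two values and the tolerance are made up for illustration: they differ by about 2e-9 yet round to different results, which is the same kind of mismatch as "5.4 should equal 5.39".

```python
import math

# Two hypothetical results that differ only by floating point noise.
expected = 5.394999999
actual = 5.395000001

# Rounding first turns a tiny difference into a visible one.
print(round(expected, 2), round(actual, 2))
print(round(expected, 2) == round(actual, 2))

# Comparing with a tolerance (an illustrative value) accepts both.
print(math.isclose(actual, expected, abs_tol=1e-6))
```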

  • saudiGuy Avatar

    The new Python test framework is required. Updated in this fork.

  • transan Avatar

    Nice Kata, thanks

  • Vedanta war Avatar

    My solution

  • Vedanta war Avatar

    I got it right, but it's not working.

  • Hunter_71 Avatar

    I had some problems getting the correct result, until I carefully read the whole description :) Nice kata! Thx :)

  • mentalplex Avatar

    Your python test suite is a little inefficient, specifically this part:

    cutoff = random.random()
    while cutoff < 0.5:
        cutoff = random.random()
    cutoff = round(cutoff * 5, 2)
    

    It looks like you want the cutoff to be a real number drawn from a uniform distribution between 2.5 and 5, rounded to 2 decimal places.
    Might I suggest you replace that with: round(random.uniform(2.5, 5), 2)
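For comparison, here are the two ways of drawing the cutoff side by side. The function names and the seeded `random.Random` argument are illustrative assumptions, not part of the kata's test suite.

```python
import random


def cutoff_rejection(rng):
    """Redraw until the value is at least 0.5, then scale by 5."""
    cutoff = rng.random()
    while cutoff < 0.5:
        cutoff = rng.random()
    return round(cutoff * 5, 2)


def cutoff_uniform(rng):
    """Single draw from the target interval, as suggested above."""
    return round(rng.uniform(2.5, 5), 2)


rng = random.Random(0)  # seeded only to make the sketch repeatable
rejection_draws = [cutoff_rejection(rng) for _ in range(1000)]
uniform_draws = [cutoff_uniform(rng) for _ in range(1000)]
print(min(rejection_draws), max(rejection_draws))
print(min(uniform_draws), max(uniform_draws))
```

Both versions produce values between 2.5 and 5. The second avoids the loop, which on average throws away half of its draws.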

  • mentalplex Avatar

    R Translation

    Please carefully review and approve. The reference solution is commented, as is the test suite, to help you understand an unfamiliar language (in case you're not familiar with R).

    I used the same basic tests as python, and similar parameters for the random tests. Though I chose a different structure to generate the random arguments.

    The main difference is that I added a guaranteed outlier 50% of the time (rather than a likely outlier 2% of the time).

  • Voile Avatar

    Approved

  • KenKamau Avatar

    Almost gave up due to the nuisance rounding. All in all, a great Kata.

  • ZozoFouchtra Avatar

    I didn't understand a word of what was expected with cutoff and outliers until I read these lines in the Discourse:

    If the cutoff is 3, then any value that is more than 3 standard deviations from the mean must be removed. We first calculate the mean and standard deviation. Then we multiply the standard deviation by 3 to get our actual cutoff value. Then for each value in the sample, we calculate its distance from the mean, i.e. abs(xi - x̅). If this distance is greater than our cutoff value, then the value is an outlier. No?

    I think these words should be placed in the Description instead of the Discourse!

    Happy coding! ; ) )
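The outlier test quoted above, written out as a single pass. This is a sketch: the helper name `split_outliers` is made up, and the use of the population standard deviation (`statistics.pstdev`) is an assumption.

```python
from statistics import mean, pstdev


def split_outliers(sample, cutoff):
    """Return (kept, outliers) for one pass over the sample."""
    m = mean(sample)
    # The actual cutoff value: cutoff standard deviations (assumed population std).
    limit = cutoff * pstdev(sample)
    kept = [x for x in sample if abs(x - m) <= limit]
    outliers = [x for x in sample if abs(x - m) > limit]
    return kept, outliers


sample = [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10]
kept, outliers = split_outliers(sample, 2)
print(outliers)
```

For this sample only 10 is flagged on the first pass. The value 3 is flagged only once the mean and standard deviation are recomputed without 10.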

  • Blind4Basics Avatar

    It seems I'm not fully awake today... or is there an issue?

    sample [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3, 10]
    mean 1.91833333333
    sd 2.49805201885
    cutoff 2
    [1.01, 0.99, 1.02, 1.01, 0.99, 0.97, 1.03, 0.99, 1.02, 0.99, 3]
    
    => 1.1836363636363636 should equal 1.0
    

    What!? (I know I didn't round the result, but rounding gives 1.2, so... :/ )

    Note: the information about rounding is wrong, I think: 5.5 is rounded to 1 decimal place, not 2 (at least, that is how you would say it in French... :o ).

  • Voile Avatar

    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
    cutoff = 3
    test.assert_equals(clean_mean(sample, cutoff), 5.5)
    

    This set of data has a mean and stdev of 14.090909090909092 and 28.637229424141385, which gives a maximum bound of 100.00259736333325.
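As a cross-check of these figures: 28.637... matches the sample standard deviation (dividing by n - 1), and the population standard deviation (dividing by n) is smaller. The sketch below computes the upper bound both ways with the standard library. Which of the two the kata intends is not stated in this thread, so treat that as something to verify against the description.

```python
from statistics import mean, pstdev, stdev

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
cutoff = 3

m = mean(sample)
sample_sd = stdev(sample)       # divides by n - 1
population_sd = pstdev(sample)  # divides by n

upper_sample = m + cutoff * sample_sd
upper_population = m + cutoff * population_sd

print(m, sample_sd, upper_sample)
print(m, population_sd, upper_population)
```

With the sample standard deviation the bound lands just above 100, so 100 is kept. With the population standard deviation the bound is about 96, so 100 is removed, which would be consistent with the expected 5.5.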