Remove outliers

Description:

You are given an array of numbers, some of which are outliers. Your goal is to remove all outliers and return the result.


Detecting outliers

An outlier is defined here as a number that is:

  • Greater than the calculated upper limit UQ + 1.5(UQ - LQ)
  • Less than the calculated lower limit LQ - 1.5(UQ - LQ)

Where LQ is the lower quartile and UQ is the upper quartile.

Note: If an element is equal to the upper or lower limit, it is not an outlier!
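The limit check above could be sketched as follows. This is only an illustration: the names `Limits`, `outlierLimits`, and `isOutlier` are hypothetical and not part of the kata, and LQ and UQ are assumed to have been computed already.

```typescript
// Hypothetical helper types and names, for illustration only.
interface Limits {
  lower: number;
  upper: number;
}

// Compute the limits from the lower quartile (lq) and upper quartile (uq).
function outlierLimits(lq: number, uq: number): Limits {
  const iqr = uq - lq; // UQ - LQ
  return { lower: lq - 1.5 * iqr, upper: uq + 1.5 * iqr };
}

// Strict comparisons: a value equal to either limit is NOT an outlier.
function isOutlier(value: number, limits: Limits): boolean {
  return value < limits.lower || value > limits.upper;
}

// With LQ = 2 and UQ = 6 the limits are -4 and 12.
const exampleLimits = outlierLimits(2, 6);
console.log(isOutlier(12, exampleLimits));   // false (equal to the upper limit)
console.log(isOutlier(12.5, exampleLimits)); // true
```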

More about Quartiles

Quartiles separate a sorted data set into four equally sized quarters. They extend the concept of the median, which splits the data into two equally sized halves. When each half of the data is split in half again, you end up with four quarters.

For example, consider the data set:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] // length = 11

The median of the entire data set is the "middle" value of the sorted array. Splitting the data in half yields the following:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
                ^ Median = 6
                
[1, 2, 3, 4, 5]   [7, 8, 9, 10, 11]

Each of these halves has its own "median"; these are the lower and upper quartiles.

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
                ^ Median = 6
                
[1, 2, 3, 4, 5]   [7, 8, 9, 10, 11]
       ^ LQ = 3          ^ UQ = 9

More rules for calculating median and quartiles:

  • If the number of elements in the array is even, the median is the average of the two center elements.
  • If the number of elements in the array is odd, the median is the single center element.
  • If the median is a single element, do not include it in either half when calculating a quartile.
  • If the median is an average of two elements, the two elements which make up the median are included when calculating a quartile.

Here are some further examples you can use for your own tests:

  [1, 2]                    => Median 1.5   LQ 1    UQ 2
  [1, 2, 3]                 => Median 2     LQ 1    UQ 3
  [1, 2, 3, 4]              => Median 2.5   LQ 1.5  UQ 3.5
  [1, 2, 3, 4, 5]           => Median 3     LQ 1.5  UQ 4.5
  [1, 2, 3, 4, 5, 6]        => Median 3.5   LQ 2    UQ 5
  [1, 2, 3, 4, 5, 6, 7]     => Median 4     LQ 2    UQ 6
  [1, 2, 3, 4, 5, 6, 7, 8]  => Median 4.5   LQ 2.5  UQ 6.5
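
The rules and examples above can be reproduced with a short sketch like the one below. The names `medianOfSorted`, `Quartiles`, and `quartiles` are hypothetical, and the sketch assumes an input of at least two elements so that both halves are non-empty.

```typescript
// Median of an already-sorted, non-empty array.
function medianOfSorted(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2 // even: average of the two center elements
    : sorted[mid];                        // odd: the single center element
}

interface Quartiles {
  lq: number;
  median: number;
  uq: number;
}

// Assumption: values has at least two elements.
function quartiles(values: number[]): Quartiles {
  // Copy before sorting so the caller's array is untouched; sort numerically.
  const sorted = [...values].sort((a, b) => a - b);
  const half = Math.floor(sorted.length / 2);
  // Even length: the halves include the two center elements.
  // Odd length: the single center element is left out of both halves.
  const lowerHalf = sorted.slice(0, half);
  const upperHalf = sorted.slice(sorted.length - half);
  return {
    lq: medianOfSorted(lowerHalf),
    median: medianOfSorted(sorted),
    uq: medianOfSorted(upperHalf),
  };
}

console.log(quartiles([1, 2, 3, 4, 5])); // { lq: 1.5, median: 3, uq: 4.5 }
```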

Further reading on Wikipedia.


The input

You will be given an array of floating-point numbers. As an example:

[1, 4, 2, 5, 1000000, 3, 9]

Note: The median of this data is 4, the lower quartile is 2, and the upper quartile is 9.

The output

You will return an array of floating-point numbers containing the elements of the original array with all outliers removed.

Note: Outlier detection relies on the interquartile range (UQ - LQ), which changes when elements are removed, so you may need to repeat the process until no outliers remain.
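
One possible way to structure the repeated removal is sketched below. It is not the reference solution: the name `removeOutliers` is hypothetical, and the sketch assumes that an array with fewer than two elements has no outliers, since the quartile rules above do not cover that case.

```typescript
// Hypothetical sketch: repeat the quartile check until a pass removes nothing.
function removeOutliers(values: number[]): number[] {
  // Median of an already-sorted, non-empty array.
  const medianOf = (sorted: number[]): number => {
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
      ? (sorted[mid - 1] + sorted[mid]) / 2
      : sorted[mid];
  };

  let current = [...values];
  // Assumption: with fewer than two elements there is nothing to remove.
  while (current.length >= 2) {
    const sorted = [...current].sort((a, b) => a - b);
    const half = Math.floor(sorted.length / 2);
    const lq = medianOf(sorted.slice(0, half));
    const uq = medianOf(sorted.slice(sorted.length - half));
    const iqr = uq - lq;
    const lower = lq - 1.5 * iqr;
    const upper = uq + 1.5 * iqr;
    // Keep values on or inside the limits; values equal to a limit stay.
    const kept = current.filter((v) => v >= lower && v <= upper);
    if (kept.length === current.length) break; // no outliers left
    current = kept; // quartiles are recomputed on the next pass
  }
  return current;
}

console.log(removeOutliers([1, 4, 2, 5, 1000000, 3, 9])); // [1, 4, 2, 5, 3, 9]
```

For the example input, the first pass uses limits of -8.5 and 19.5 and removes 1000000. The second pass recomputes LQ = 2 and UQ = 5, giving limits of -2.5 and 9.5, so 9 stays and the loop stops.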

You may return the remaining elements in any order.

Statistics
Mathematics

Stats:

Created: Feb 3, 2025
Published: Feb 3, 2025
Warriors Trained: 20
Total Skips: 4
Total Code Submissions: 42
Total Times Completed: 8
TypeScript Completions: 8
Total Stars: 3
% of votes with a positive feedback rating: 90% of 5
Total "Very Satisfied" Votes: 4
Total "Somewhat Satisfied" Votes: 1
Total "Not Satisfied" Votes: 0
Total Rank Assessments: 5
Average Assessed Rank: 6 kyu
Highest Assessed Rank: 5 kyu
Lowest Assessed Rank: 7 kyu
Contributors
  • tchaflich