The median, range and interquartile range
The most versatile statistical tools for numerically describing the centre and spread of a distribution are:
the median (the middle value) as a measure of centre;
the range (the maximum spread of the data values), and the interquartile range (the spread of the middle half of data values) as measures of spread.
While these statistical values (median, range, interquartile range) could be estimated only approximately from a histogram, they can be determined exactly when we use either a dot or a stem plot.
MEDIAN
Activity 1: Finding the median value in a data set.
Order each of the following data sets, locate the median, and record its value.
(a) 2 9 1 8 3 5 3 8 1
For an odd number of data values, the median will be the middle data value.
Write down the data set in order: 1 1 2 3 3 5 8 8 9
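Locate the middle data value by eye or use the rule: for n ordered data values, the median is the ((n + 1)/2)th value. Here n = 9, so the median is the 5th value: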
1 1 2 3 3 5 8 8 9
Write down the median: Median = 3
(b) 10 1 3 4 8 6 10 1 2 9
For an even number of data values, the median will be the average of the two middle data values.
Write down the data set in order: 1 1 2 3 4 6 8 9 10 10
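Locate the two middle data values and find their average, or use the rule: with n = 10, (n + 1)/2 = 5.5, so the median lies halfway between the 5th and 6th values.
1 1 2 3 4 6 8 9 10 10
Median is the average of the 5th and 6th values.
Write down the median: Median = (4 + 6)/2 = 5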
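The two cases in Activity 1 can be checked with a short Python sketch. The helper name `median_by_position` is hypothetical, chosen only for illustration; it applies the (n + 1)/2 position rule to the ordered data and is cross-checked against `statistics.median` from the standard library.

```python
import statistics


def median_by_position(values):
    """Return the median using the (n + 1)/2 position rule (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1)/2 is a whole number, so take that single value.
        return ordered[mid]
    # Even n: (n + 1)/2 falls between two positions, so average those two values.
    return (ordered[mid - 1] + ordered[mid]) / 2


data_a = [2, 9, 1, 8, 3, 5, 3, 8, 1]        # Activity 1(a): 9 values
data_b = [10, 1, 3, 4, 8, 6, 10, 1, 2, 9]   # Activity 1(b): 10 values

print(median_by_position(data_a))  # 3
print(median_by_position(data_b))  # 5.0

# Cross-check against the standard library.
print(statistics.median(data_a), statistics.median(data_b))
```

Sorting first matters: the position rule only applies to ordered data, which is why each activity begins by writing the data set in order.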
Activity 2: Finding the median value from a dot plot
The dot plot below displays the age distribution (in years) of the 13 members of a local cricket team.
Determine the median age of these cricketers and mark its location on the dot plot.
- The median value is the middle data value in the dot plot.
- Locate the middle data value (or use the rule) and identify it on the dot plot.
- Write down its value: Median = 22 years
Activity 3: Finding the median value from a stem plot
The stem plot below displays the maximum temperature (in °C) for 12 days in January.
Determine the median maximum temperature for these 12 days.
- For an even number of data values, as in this example, the median will be the average of the two middle data values.
- Locate the two middle data values in the stem plot by eye (or use the rule) and identify them on the plot.
- Determine the median by finding the average of these two data values.
Multiple-choice questions
Q1.
Q2. HOMEWORK
Q3.
NOTE: Because the range depends only on the two extreme values in the data (18 and 33), it is not always an informative measure of spread. For example, one or the other of these two values might be an outlier. Furthermore, any data sets with the same highest and lowest values will have the same range (15), irrespective of the way in which the data are spread out in between.
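As a small illustration of this note, the sketch below compares two hypothetical data sets (invented for this example, not taken from the lesson) that share the extreme values 18 and 33 but are spread very differently in between. The function name `data_range` is also just an illustrative choice.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical data: both sets have minimum 18 and maximum 33.
clustered = [18, 24, 25, 25, 26, 26, 27, 33]   # most values bunched near 25
spread_out = [18, 20, 22, 24, 27, 29, 31, 33]  # values spread fairly evenly

print(data_range(clustered))   # 15
print(data_range(spread_out))  # 15
```

Both calls give 15, even though the two distributions look quite different, because only the two extreme values enter the calculation.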
A more refined measure of spread that overcomes these limitations of the range is the interquartile range (IQR).
INTERQUARTILE RANGE
We can interpret the interquartile range as follows:
Since Q1, the first quartile, is the median of the lower half of the observations, then it follows that 25% of the data values are less than Q1, and 75% are greater than Q1.
Since Q3, the third quartile, is the median of the upper half of the observations, then it follows that 75% of the data values are less than Q3, and 25% are greater than Q3.
Thus, the interquartile range (IQR) gives the spread of the middle 50% of data values.
Example 2:
- There are 18 values in total. This means that there are nine values in the lower ‘half’, and nine in the upper ‘half’.
- The median of the lower half (Q1) is the middle of the lower nine values, which is the 5th value from the bottom.
- The median of the upper half (Q3) is the middle of the upper nine values, which is the 5th value from the top.
- Determine the IQR using IQR = Q3 − Q1.
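The steps above can be sketched in Python. The 18 data values below are hypothetical (they are not the data from Example 2), and the function names are illustrative. For an odd number of values, this sketch leaves the middle value out of both halves; that is one common convention and is an assumption here, since the lesson only shows the even case.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # lower 'half'
    upper = ordered[n - half:]   # upper 'half' (middle value skipped when n is odd)
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


# Hypothetical data set with 18 values, so each 'half' has nine values.
data = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18, 20, 22, 23, 25, 27, 30, 34]

q1, med, q3 = quartiles(data)
print("Q1 =", q1)         # 5th value from the bottom: 9
print("Q3 =", q3)         # 5th value from the top: 23
print("IQR =", q3 - q1)   # 23 - 9 = 14
```

Other quartile conventions exist (calculators and software packages do not all agree), so small differences from a textbook answer are possible for some data sets.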
Example 3: To check that these quartiles are correct, write the data values in order, and mark the median and the quartiles. If correct, the median and the quartiles divide the data set into four equal groups.
Question: Why is the IQR a more useful measure of spread than the range?
The IQR measures the spread of the middle 50% of observations. Since the upper 25% and lower 25% of observations are discarded, the interquartile range is generally not affected by the presence of outliers.
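A minimal sketch of this point, using two hypothetical data sets that differ only in their largest value. The helper `iqr` is illustrative and takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data.

```python
import statistics


def iqr(values):
    """IQR = Q3 - Q1, with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    return q3 - q1


# Hypothetical data: the second set replaces the largest value with an outlier.
without_outlier = [18, 20, 21, 23, 24, 25, 27, 28, 30, 33]
with_outlier = [18, 20, 21, 23, 24, 25, 27, 28, 30, 95]

for data in (without_outlier, with_outlier):
    print(f"range = {max(data) - min(data)}, IQR = {iqr(data)}")
# range = 15, IQR = 7
# range = 77, IQR = 7
```

The outlier changes the range from 15 to 77, while the IQR stays at 7 because the extreme value lies outside the middle 50% of the data.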
EXERCISES
Q1.
For 25 data values (an odd number), the median will be the middle data value, which is the 13th value: Median = 28
Good answer.
Please continue with another question.
Q1. Median = 30
Unfortunately, your answer is wrong!
Please redo.
Q1: 28
Q2: 55
Q3a: 5
Q3b: 12
Q4: 1
Q5: 14
Q6:
Mean A ≈ 42.8, Median A = 37
Mean B ≈ 39.4, Median B = 37
Almost all answers are RIGHT, except for question 2. The right answer is option B (Median is approximately 53).
Question 1: The median is the 13th value of the data. It's 28. Option is C.
Q2 B
Good answer. Congrats.
Question 2: The median of boxplot A is closest to 53.
Question 6b: The mean of data set A is larger than the mean of data set B. This is due to the last value being 96 in data set A, compared to 66 in data set B. The medians of the two data sets are the same, as each data set has the same number of data values and all data values are the same, except the last value.
Question 6c: The mean or median can be used as a good measure of central tendency when there are no outliers or extreme values. The median is best to use when there are outliers or extreme values in the data.
The median is the (33 + 1)/2 = 17th score, which is 14, when the data are ordered from lowest to highest.
Exactly. Keep it up.
Q5: The median is the (33 + 1)/2 = 17th score, which is 14, when the data are ordered from lowest to highest. My option is C.
Good answer.
You can continue with other questions.
Q3a: Median = (4+6)/2 = 5
Q3b: Median = 12
Well done. Keep it up!
Is there an outlier for the data set A?
Yes, the outlier is 96. Why? Could you please provide your explanation?
How about the data set B?
Q1. median 2, Q1.1 Q3:3, IOR: 2, Rane:7
You gave the right answers. But Range = 7 (not Rane)
Q1
Median: 2.5
Q1: 2
Q3: 4
IQR: 2
Range: 7
Q2
Shape: Right-skewed
Median: 26
Q1: 17.5
Q3: 30.5
IQR: 13
Range: 29
Q3
Shape: Right-skewed, outlier at 6
Median: 1
IQR: 2
Q1. (a) Shape of the distribution: approximately symmetric with no outliers. Why? Can you tell me why there are no outliers?
(b), (c), (d) all RIGHT.
Q3. Your answers are not good. See again.
(a) positively skewed with a possible outlier at 6.
(b) M=0
(c) IQR = 1
a) 5.09.9
b) Maximum value of the IQR: 19.9