MEASURES OF CENTRE AND SPREAD

 The median, range and interquartile range

The most versatile statistical tools for numerically describing the centre and spread of a distribution are:

- the median (the middle value) as a measure of centre;

- the range (the maximum spread of the data values) and the interquartile range (the spread of the middle half of the data values) as measures of spread.

While these statistical values (median, range, interquartile range) can only be estimated approximately from a histogram, they can be determined exactly from a dot plot or a stem plot.

MEDIAN


Activity 1: Finding the median value in a data set.

Order each of the following data sets, locate the median, and record its value.

(a)  2 9 1 8 3 5 3 8 1 

For an odd number of data values, the median will be the middle data value.
Write down the data set in order:  1 1 2 3 3 5 8 8 9
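Locate the middle data value by eye, or use the rule that the median of n ordered values is the (n + 1)/2 th value; here (9 + 1)/2 = 5, so it is the 5th value: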
1 1 2 3 3 5 8 8 9
Write down the median:  Median = 3

(b) 10 1 3 4 8 6 10 1 2 9
For an even number of data values, the median will be the average of the two middle data values.
Write down the data set in order:  1 1 2 3 4 6 8 9 10 10
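Locate the two middle data values and find their average, or use the rule: (10 + 1)/2 = 5.5, so the median lies halfway between the 5th and 6th values.
1 1 2 3 4 6 8 9 10 10
The median is the average of the 5th and 6th values.
Write down the median: Median = (4 + 6)/2 = 5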
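
The odd and even cases in Activity 1 can be checked with a short Python sketch. The function name `median_of` is an illustrative choice, not part of the lesson; it simply follows the "middle value, or average of the two middle values" rule described above.

```python
def median_of(values):
    """Return the median of a list of numbers using the middle-value rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the median is the single middle value, the (n + 1)/2 th one.
        return ordered[middle]
    # Even n: the median is the average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


data_a = [2, 9, 1, 8, 3, 5, 3, 8, 1]        # Activity 1 (a), 9 values
data_b = [10, 1, 3, 4, 8, 6, 10, 1, 2, 9]   # Activity 1 (b), 10 values

print(median_of(data_a))  # 3
print(median_of(data_b))  # 5.0
```

Sorting first matters: the rule applies to positions in the ordered data, not to the order in which the values were recorded.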

Activity 2: Finding the median value from a dot plot
The dot plot below displays the age distribution (in years) of the 13 members of a local cricket team.

Determine the median age of these cricketers and mark its location on the dot plot.

The median value is the middle data value in the dot plot.

- Locate the middle data value (or use the rule) and identify it on the dot plot. 


Write down its value:   Median = 22 years

Activity 3: Finding the median value from a stem plot

The stem plot below displays the maximum temperature (in °C) for 12 days in January.
Determine the median maximum temperature for these 12 days.

- For an even number of data values, as in this example, the median will be the average of the two middle data values.

- Locate the two middle data values in the stem plot by eye (or use the rule) and identify them on the plot.


 

- Determine the median by finding the average of these two data values.



Multiple-choice questions

Q1. 



Q2. 



HOMEWORK 

Q3. 

Q4.
Q5. 
Q6.
RANGE

Example 1: 

SOLUTION


NOTE: Because the range depends only on the two extreme values in the data (18 and 33), it is not always an informative measure of spread. For example, one or the other of these two values might be an outlier. Furthermore, any data set with the same highest and lowest values will have the same range (15), irrespective of the way in which the data are spread out in between.
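
The point in this note can be illustrated with a small Python sketch. The two data sets below are hypothetical (they are not the data from Example 1); they only share the extreme values 18 and 33 mentioned above, and `data_range` is an illustrative helper name.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical data: same extremes (18 and 33), very different spread in between.
clustered = [18, 24, 25, 25, 26, 26, 27, 33]
spread_out = [18, 20, 22, 24, 27, 29, 31, 33]

print(data_range(clustered))   # 15
print(data_range(spread_out))  # 15
```

Both data sets report the same range even though most of the first one sits between 24 and 27, which is why the range alone says little about how the bulk of the data are spread.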

A more refined measure of spread that overcomes these limitations of the range is the interquartile range (IQR).
               
INTERQUARTILE RANGE

We can interpret the interquartile range as follows:
- Since Q1, the first quartile, is the median of the lower half of the observations, 25% of the data values are less than Q1 and 75% are greater than Q1.
- Since Q3, the third quartile, is the median of the upper half of the observations, 75% of the data values are less than Q3 and 25% are greater than Q3.
- Thus, the interquartile range (IQR = Q3 − Q1) gives the spread of the middle 50% of the data values.

Example 2: 

Solution
- There are 18 values in total. This means that there are nine values in the lower ‘half’, and nine in the upper ‘half’.
- The median of the lower half (Q1) is the middle of the lower nine values, which is the 5th value from the bottom.
- The median of the upper half (Q3) is the middle of the upper nine values, which is the 5th value from the top.
- Determine the IQR using IQR = Q3 − Q1.
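
The steps above can be sketched in Python. The 18 values below are hypothetical, since the data for Example 2 are not reproduced here, and the helper names `median_of_sorted` and `quartiles` are illustrative. For an odd number of values this sketch leaves the median out of both halves, which is one common convention and an assumption here.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # lower 'half' of the data
    upper = ordered[n - half:]    # upper 'half' (skips the median when n is odd)
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


# Hypothetical data set with 18 values, so each half has nine values.
data = [3, 5, 7, 8, 9, 11, 12, 14, 15, 16, 18, 19, 21, 22, 24, 25, 27, 30]

q1, med, q3 = quartiles(data)
print(q1)       # 9    (5th value from the bottom)
print(q3)       # 22   (5th value from the top)
print(q3 - q1)  # 13   (IQR = Q3 - Q1)
```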

Example 3: 

Solution
To check that these quartiles are correct, write the data values in order, and mark the median and the quartiles. If correct, the median and the two quartiles divide the data set into four equal groups.


Example 4: 

Solution

Question: Why is the IQR a more useful measure of spread than the range?
The IQR is a measure of the spread of a distribution that includes the middle 50% of observations. Since the upper 25% and lower 25% of observations are discarded, the interquartile range is generally not affected by the presence of outliers.
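
A short Python sketch makes this answer concrete. The data are hypothetical, and `iqr` is an illustrative helper that uses the median-of-halves method from this lesson together with `statistics.median` from the standard library.

```python
from statistics import median


def iqr(values):
    """Interquartile range via the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])       # median of the lower half
    q3 = median(ordered[n - half:])   # median of the upper half
    return q3 - q1


def data_range(values):
    return max(values) - min(values)


# Hypothetical data: identical except that the largest value becomes an outlier.
without_outlier = [18, 20, 21, 22, 23, 24, 25, 26]
with_outlier = [18, 20, 21, 22, 23, 24, 25, 60]

print(data_range(without_outlier), iqr(without_outlier))  # 8 4.0
print(data_range(with_outlier), iqr(with_outlier))        # 42 4.0
```

The single extreme value changes the range from 8 to 42, while the IQR stays at 4 because the quartiles depend only on the middle of the ordered data.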

EXERCISES 
Q1. 
Q2.

Q3.

Q4.


Comments

  1. For 25 (odd number) of data values, the median will be the middle data value, which is the 13th value: Median = 28

    Replies
    1. Good answer.
      Please continue with another question.

  2. Replies
    1. Unfortunately, your answer is wrong!
      Please redo.

  3. Q1: 28
    Q2: 55
    Q3a: 5
    Q3b: 12
    Q4: 1
    Q5: 14
    Q6:

    Mean A ≈ 42.8, Median A = 37

    Mean B ≈ 39.4, Median B = 37

    Replies
    1. Almost all answers are RIGHT, except for question 2. The right answer is option B (the median is approximately 53).

  5. Question 1: The median is the 13th value of the data. It's 28. Option is C.

  6. Question 2: The median of boxplot A is closest to 53.

  7. Question 6b: The mean of data set A is larger than the mean of data set B. This is due to the last value being 96 in data set A, compared to 66 in data set B. The medians of the two data sets are the same, as each data set has the same number of data values and all data values are the same, except the last value.

  8. Question 6c: The mean or median can be used as a good measure of central tendency when there are no outliers or extreme values. The median is best to use when there are outliers or extreme values in the data.

  9. The median is the (33 + 1)/2 = 17th score, which is 14, when the data are ordered from lowest to highest.

  10. Q5: The median is the (33 + 1)/2 = 17th score, which is 14, when the data are ordered from lowest to highest. My option is C.

  11. Good answer.
    You can continue with other questions.

  12. Q3a: Median = (4+6)/2 = 5
    Q3b: Median = 12

  13. Is there an outlier for the data set A?

    Replies
    1. Yes, the outlier is 96. Why? Could you please provide your explanation?
      How about the data set B?

  14. Q1. median 2, Q1.1 Q3:3, IOR: 2, Rane:7

    Replies
    1. You gave the right answers, but note the spelling: Range = 7 (not Rane).

  15. Q1

    Median:2.5
    Q1: 2
    Q3: 4
    IQR: 2
    Range: 7

    Q2

    Shape: Right-skewed
    Median: 26
    Q1: 17.5
    Q3: 30.5
    IQR: 13
    Range: 29

    Q3

    Shape: Right-skewed, outlier at 6
    Median: 1
    IQR: 2


  16. Q1. (a) Shape of the distribution: approximately symmetric with no outliers. Why? Can you tell me why there are no outliers?
    (b),(c),(d) all RIGHT.

  17. Q3. Your answers are not right. Please check again:
    (a) positively skewed with a possible outlier at 6.
    (b) M=0
    (c) IQR = 1

  18. a) 5.09.9
    b) Maximum value of the IQR : 19.9
