How can you tell if a dataset is "skewed" just by looking at a graph or at its mean and median values?

To determine if a dataset is "skewed" or "symmetric," you can look for two main indicators: the "tail" of the graph and the mathematical relationship between the Mean and the Median.

1. The Visual Method (Looking at the Histogram)

+ Symmetric (for example, a Normal distribution): The graph looks like a balanced bell. Both sides are roughly mirror images.

Practice 1: With the dataset {2, 4, 6, 8, 10}
Median: The middle value is 6.
Mean: (2+4+6+8+10)/5 = 30/5 = 6.
Result: Mean = Median. This distribution is perfectly symmetric.
In a symmetric dataset, the numbers are balanced on both sides of the center.
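
The arithmetic in Practice 1 can be double-checked with a few lines of Python. This is only a minimal sketch using the standard `statistics` module; the variable names are illustrative choices, not part of the original exercise.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10]  # the Practice 1 dataset

data_mean = mean(data)      # (2 + 4 + 6 + 8 + 10) / 5
data_median = median(data)  # middle value of the sorted list

print(f"mean = {data_mean}, median = {data_median}")
print("mean equals median:", data_mean == data_median)
```

Both values come out to 6, matching the hand calculation above.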

+ Right-Skewed (Positive Skew): The "tail" stretches out to the right (toward higher values). Most data points are clustered on the left.
Real-world example: Household income. Most people earn a middle/low income, while a few billionaires pull the tail to the right.

Practice 2: With the dataset {10, 12, 13, 15, 100} (100 is a high outlier, like a celebrity's salary).
Median: The middle value is 13.
Mean: (10+12+13+15+100)/5 = 150/5 = 30.
Result: Mean (30) > Median (13). The high outlier "tricked" the mean into being much higher than what is typical for the group. This is Right-Skewed.
This happens when you have a few very high values that pull the "tail" to the right.
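
The mean-versus-median comparison can be wrapped in a small helper. The sketch below is one possible way to do it: `classify_skew` and its `tolerance` parameter are hypothetical names introduced here for illustration. The comparison is a rule of thumb rather than a formal test, since a dataset can have an equal mean and median without being perfectly symmetric.

```python
from statistics import mean, median

def classify_skew(values, tolerance=0.0):
    """Label a dataset by comparing its mean with its median.

    `tolerance` is an assumed knob: gaps no larger than this
    are treated as "roughly symmetric".
    """
    gap = mean(values) - median(values)
    if abs(gap) <= tolerance:
        return "roughly symmetric"
    # A mean above the median suggests a tail toward high values.
    return "right-skewed" if gap > 0 else "left-skewed"

salaries = [10, 12, 13, 15, 100]  # the Practice 2 dataset
print(classify_skew(salaries))
```

For the Practice 2 data the mean (30) is above the median (13), so the helper reports the dataset as right-skewed.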

+ Left-Skewed (Negative Skew): The "tail" stretches out to the left (toward lower values). Most data points are clustered on the right.

Real-world example: Age of retirement. Most people retire in their 60s or 70s, with very few retiring much younger.

Practice 3: With the dataset {5, 80, 85, 90, 95} (5 is a low outlier, like a student who missed most of a test).
Median: The middle value is 85.
Mean: (5+80+85+90+95)/5 = 355/5 = 71.
Result: Mean (71) < Median (85). The low outlier pulled the mean down. This is Left-Skewed.
This happens when you have a few very low values that pull the "tail" to the left.
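
To connect the numbers back to the visual method, the tail can be seen in a rough text histogram. The sketch below is an illustration only: `text_histogram` is a hypothetical helper, and the bin width of 20 is an arbitrary assumption chosen to suit the Practice 3 scores.

```python
from collections import Counter

def text_histogram(values, bin_width=20):
    """Return one line per bin, with one '#' for each value in that bin."""
    # Map every value to the lower edge of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} | {'#' * counts[start]}")
    return lines

scores = [5, 80, 85, 90, 95]  # the Practice 3 dataset
print("\n".join(text_histogram(scores)))
```

Four of the five scores land in the highest bin, and the single low score sits alone at the far left with empty bins in between. That lone mark is the left "tail".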

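SUMMARY FOR QUICK CHECKING

+ Symmetric: no long tail on either side; Mean = Median (or very close).
+ Right-Skewed (Positive Skew): the tail stretches to the right; Mean > Median.
+ Left-Skewed (Negative Skew): the tail stretches to the left; Mean < Median.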
