Skewness and Kurtosis: Understanding Data Distribution
Introduction to Skewness and Kurtosis
When analyzing data, it is essential to understand how the data is distributed. While measures of central tendency (mean, median, mode) tell us where the data is centered and measures of dispersion (variance, standard deviation) tell us about data spread, skewness and kurtosis provide insights into the shape of the data distribution.
- Skewness measures the asymmetry of a dataset.
- Kurtosis measures the tailedness or peakedness of a dataset.
Both metrics help in detecting patterns, outliers, and anomalies in data, making them useful in finance, machine learning, and statistical modeling.
1. Skewness
Definition
Skewness measures the degree of asymmetry in a probability distribution. If data is perfectly symmetric, it has zero skewness. If data is skewed, it means that one side of the distribution is longer or fatter than the other.
Types of Skewness
A. Positive Skewness (Right-Skewed Distribution)
- Tail extends more on the right side.
- Typically, Mean > Median > Mode.
- Example: Income distribution (most people earn low salaries, but a few earn extremely high salaries).
Graph Representation: the curve peaks on the left and its tail stretches out to the right.
Example: Income of 100 people where a few earn millions, which creates a long right tail.
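The pull of a few large values on the mean can be seen in a minimal sketch. The income figures below are hypothetical values made up for illustration, not real data, and the variable names are assumptions of this example.

```python
from statistics import mean, median

# Hypothetical monthly incomes (illustrative values, not real data):
# most are modest, two are very large.
incomes = [2_000, 2_200, 2_500, 2_700, 3_000, 3_200, 3_500, 50_000, 120_000]

avg = mean(incomes)
mid = median(incomes)

# The two large incomes pull the mean far above the median,
# which is the usual signature of a right-skewed dataset.
print(f"mean = {avg:.0f}, median = {mid}")
print("mean > median:", avg > mid)
```

The median stays with the bulk of the data, while the mean moves toward the tail.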
B. Negative Skewness (Left-Skewed Distribution)
- Tail extends more on the left side.
- Typically, Mean < Median < Mode.
- Example: Exam scores (most students score high, but a few fail badly).
Graph Representation: the curve peaks on the right and its tail stretches out to the left.
Example: In an easy test, most students score 80-100, but a few get very low marks, creating a long left tail.
C. Zero Skewness (Symmetric Distribution)
- Data is perfectly symmetrical.
- Mean = Median = Mode.
- Example: Normally distributed height data.
Graph Representation: a bell-shaped curve that is mirrored around its center.
Example: Heights of people in a population (normally distributed).
Formula for Skewness
$$\text{Skewness} = \frac{\sum (X_i - \bar{X})^3}{(n-1) \cdot s^3}$$
Where:
- $X_i$ = each data point
- $\bar{X}$ = mean
- $s$ = standard deviation
- $n$ = number of observations
If:
- Skewness > 0 → Positively skewed
- Skewness < 0 → Negatively skewed
- Skewness = 0 → Symmetric
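The formula above can be sketched in plain Python. This is a minimal illustration, assuming $s$ is the sample standard deviation (the $n-1$ denominator); the function name `skewness` and the small test lists are hypothetical choices for this example.

```python
from math import sqrt

def skewness(data):
    """Skewness as written above: sum((x - mean)^3) / ((n - 1) * s^3)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    # Assumption: s is the sample standard deviation (n - 1 denominator).
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mean) ** 3 for x in data) / ((n - 1) * s ** 3)

print(skewness([1, 2, 3, 4, 5]))      # symmetric data
print(skewness([1, 2, 3, 4, 20]))     # long right tail
print(skewness([1, 17, 18, 19, 20]))  # long left tail
```

The three calls correspond to the three cases in the list: zero for the symmetric data, a positive value for the right tail, and a negative value for the left tail.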
Example of Skewness Calculation
Given dataset: {10, 12, 15, 18, 22, 40}
- Find the mean ($\bar{X}$): $\bar{X} = \frac{10+12+15+18+22+40}{6} = \frac{117}{6} = 19.5$
- Find the sample standard deviation ($s$, with $n-1$ in the denominator): $s \approx 10.91$
- Find skewness using the formula: $\text{Skewness} = \frac{\sum (X_i - \bar{X})^3}{(n-1) \cdot s^3} = \frac{7257}{5 \cdot 10.91^3} \approx 1.12$ (positively skewed)
Thus, the dataset is right-skewed.
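To make the arithmetic of this example visible, here is a sketch of the intermediate steps, assuming $s$ is the sample standard deviation (the $n-1$ denominator, which the formula does not state explicitly). Values are rounded.

$$
\begin{aligned}
X_i - \bar{X} &: -9.5,\ -7.5,\ -4.5,\ -1.5,\ 2.5,\ 20.5 \\
\sum (X_i - \bar{X})^2 &= 595.5 \quad\Rightarrow\quad s = \sqrt{595.5 / 5} \approx 10.91 \\
\sum (X_i - \bar{X})^3 &= 7257 \\
\text{Skewness} &= \frac{7257}{5 \cdot 10.91^3} \approx 1.12
\end{aligned}
$$

The single value 40 contributes $20.5^3 \approx 8615$ to the cubed sum, which is why the result is positive. Under the population convention (dividing by $n$ throughout) the same data gives roughly 1.22, so the exact figure depends on the convention while the sign does not.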
Interpretation of Skewness
Skewness Value | Interpretation |
---|---|
= 0 | Symmetric (for example, a normal distribution) |
> 0 | Right-skewed (Tail on the right) |
< 0 | Left-skewed (Tail on the left) |
2. Kurtosis
Definition
Kurtosis measures the “tailedness” or how extreme values are distributed in the dataset. It helps determine whether data has outliers or is more concentrated around the mean.
Types of Kurtosis
A. Mesokurtic (Normal Distribution, Kurtosis = 3)
- Moderate tails (neither too thick nor too thin).
- Example: Heights of people, IQ scores.
Graph Representation: a bell-shaped curve with moderate tails.
B. Leptokurtic (Heavy-Tailed, Kurtosis > 3)
- More extreme values (outliers).
- Very tall and sharp peak.
- Example: Stock market returns (occasional extreme crashes and booms).
Graph Representation: a tall, sharp peak with heavy tails on both sides.
C. Platykurtic (Light-Tailed, Kurtosis < 3)
- Few extreme values (outliers are rare).
- Short and broad peak.
- Example: Uniform distribution, test scores with little variation.
Graph Representation: a low, broad peak with thin tails.
Formula for Kurtosis
$$\text{Kurtosis} = \frac{\sum (X_i - \bar{X})^4}{(n-1) \cdot s^4}$$
If:
- Kurtosis > 3 → Leptokurtic (heavy tails, more outliers).
- Kurtosis = 3 → Mesokurtic (normal distribution).
- Kurtosis < 3 → Platykurtic (light tails, fewer outliers).
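The kurtosis formula above can be sketched the same way. This is a minimal illustration, assuming $s$ is the sample standard deviation (so $s^4$ is the squared sample variance); the function name `kurtosis` and the two lists are hypothetical, chosen only to show the effect of one extreme value.

```python
def kurtosis(data):
    """Kurtosis as written above: sum((x - mean)^4) / ((n - 1) * s^4)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    # Assumption: s^2 is the sample variance (n - 1 denominator), so s^4 = variance^2.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    if variance == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return sum((x - mean) ** 4 for x in data) / ((n - 1) * variance ** 2)

# Hypothetical data: a compact set of values, then the same set plus one outlier.
compact = [4, 5, 5, 6, 6, 6, 7, 7, 8]
with_outlier = compact + [40]

print(kurtosis(compact))       # below 3: light tails
print(kurtosis(with_outlier))  # above 3: the outlier dominates the fourth powers
```

Because deviations are raised to the fourth power, one distant point can move the result from the platykurtic range to the leptokurtic range.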
Example of Kurtosis Calculation
Given dataset: {2, 3, 5, 7, 9, 50}
- Find the mean ($\bar{X}$): $\bar{X} = \frac{76}{6} \approx 12.67$
- Find the sample standard deviation ($s$, with $n-1$ in the denominator): $s \approx 18.47$
- Find kurtosis using the formula: $\text{Kurtosis} = \frac{\sum (X_i - \bar{X})^4}{(n-1) \cdot s^4} \approx 3.39$ (leptokurtic, heavy-tailed)
This dataset has a heavy tail due to the extreme value (50).
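The intermediate steps of this example can be written out as follows. This is a sketch assuming $s$ is the sample standard deviation (the $n-1$ denominator); values are rounded.

$$
\begin{aligned}
\bar{X} &= 76 / 6 \approx 12.67 \\
\sum (X_i - \bar{X})^2 &\approx 1705.33 \quad\Rightarrow\quad s^2 \approx 341.07,\ s \approx 18.47 \\
\sum (X_i - \bar{X})^4 &\approx 1{,}968{,}960 \\
\text{Kurtosis} &\approx \frac{1{,}968{,}960}{5 \cdot 341.07^2} \approx 3.39
\end{aligned}
$$

The value 50 alone accounts for about 1,942,617 of the fourth-power sum, nearly 99% of it. Under the population convention (dividing by $n$ throughout) the same data gives roughly 4.06, so the exact figure depends on the convention, but both results are above 3.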
Comparison: Skewness vs. Kurtosis
Feature | Skewness | Kurtosis |
---|---|---|
Definition | Measures asymmetry | Measures tail thickness |
Value Meaning | >0 = Right skew, <0 = Left skew, 0 = Symmetric | >3 = Leptokurtic, <3 = Platykurtic, =3 = Normal |
Effect of Outliers | Affects the direction of the tail | Determines how extreme the tail is |