Measures of Dispersion: Variance and Standard Deviation
Introduction to Measures of Dispersion
In statistics, dispersion refers to the spread or variability of a dataset. Measures of dispersion help us understand how much the data deviates from the central value (such as the mean). The two most important measures of dispersion are:
- Variance
- Standard Deviation
These measures provide insights into the consistency, reliability, and distribution of data points.
1. Variance
Definition
Variance measures how much each data point in a dataset deviates from the mean. It quantifies the overall spread of data and is represented as σ² (sigma squared) for a population and s² for a sample.
A higher variance indicates that data points are spread out, while a lower variance suggests that data points are closer to the mean.
Formula for Variance
A. Population Variance (σ²)
For a population dataset of size NN: σ2=∑(Xi−μ)2N\sigma^2 = \frac{\sum (X_i – \mu)^2}{N}
Where:
- σ2\sigma^2 = Population variance
- XiX_i = Each data point
- μ\mu = Population mean
- NN = Total number of values in the population
B. Sample Variance (s²)
For a sample dataset of size nn: s2=∑(Xi−Xˉ)2n−1s^2 = \frac{\sum (X_i – \bar{X})^2}{n – 1}
Where:
- s2s^2 = Sample variance
- XiX_i = Each sample data point
- Xˉ\bar{X} = Sample mean
- nn = Total number of sample values
Note: We divide by n−1n-1 instead of nn because this accounts for degrees of freedom, making the estimate unbiased for the population variance.
Step-by-Step Example of Variance Calculation
Example: Find the variance of the dataset: {5, 7, 9, 10, 12}
Step 1: Find the Mean ( Xˉ\bar{X} )
Xˉ=5+7+9+10+125=435=8.6\bar{X} = \frac{5 + 7 + 9 + 10 + 12}{5} = \frac{43}{5} = 8.6
Step 2: Find the Squared Differences from the Mean
Value (XiX_i) | Difference from Mean (Xi−XˉX_i – \bar{X}) | Squared Difference ((Xi−Xˉ)2(X_i – \bar{X})^2) |
---|---|---|
5 | 5 – 8.6 = -3.6 | (-3.6)² = 12.96 |
7 | 7 – 8.6 = -1.6 | (-1.6)² = 2.56 |
9 | 9 – 8.6 = 0.4 | (0.4)² = 0.16 |
10 | 10 – 8.6 = 1.4 | (1.4)² = 1.96 |
12 | 12 – 8.6 = 3.4 | (3.4)² = 11.56 |
Step 3: Find the Variance
s2=12.96+2.56+0.16+1.96+11.565−1s^2 = \frac{12.96 + 2.56 + 0.16 + 1.96 + 11.56}{5 – 1} s2=29.24=7.3s^2 = \frac{29.2}{4} = 7.3
Thus, the sample variance is 7.3.
Advantages of Variance
✅ Shows how much data varies from the mean.
✅ Uses all data points in the calculation.
✅ Important in machine learning and risk analysis.
Disadvantages of Variance
❌ Units are squared, making interpretation harder.
❌ Sensitive to outliers.
❌ Not in the same units as the original data.
2. Standard Deviation
Definition
Standard deviation (σ for population, s for sample) measures the average deviation of data points from the mean. It is simply the square root of variance and is often preferred because it has the same units as the data.
A higher standard deviation means greater variability, while a lower standard deviation means data points are closer to the mean.
Formula for Standard Deviation
A. Population Standard Deviation (σ)
σ=∑(Xi−μ)2N\sigma = \sqrt{\frac{\sum (X_i – \mu)^2}{N}}
B. Sample Standard Deviation (s)
s=∑(Xi−Xˉ)2n−1s = \sqrt{\frac{\sum (X_i – \bar{X})^2}{n – 1}}
Step-by-Step Example of Standard Deviation Calculation
Using the same dataset: {5, 7, 9, 10, 12}
From our previous calculation, sample variance s2s^2 = 7.3.
Now, we find standard deviation: s=7.3=2.7s = \sqrt{7.3} = 2.7
Thus, the sample standard deviation is 2.7.
Advantages of Standard Deviation
✅ Easier to interpret than variance because it is in original units.
✅ Widely used in finance, business, science, and machine learning.
✅ Helps measure data consistency and risk.
Disadvantages of Standard Deviation
❌ Sensitive to outliers.
❌ Does not work well for skewed distributions.
Comparison: Variance vs. Standard Deviation
Feature | Variance (σ², s²) | Standard Deviation (σ, s) |
---|---|---|
Definition | Average squared deviation from the mean | Square root of variance |
Formula | σ2=∑(Xi−μ)2N\sigma^2 = \frac{\sum (X_i – \mu)^2}{N} | σ=σ2\sigma = \sqrt{\sigma^2} |
Units | Squared units (e.g., cm², kg²) | Original units (e.g., cm, kg) |
Interpretation | Harder to interpret due to squared units | Easier to interpret |
Use Cases | Mathematical models, variance analysis | Risk analysis, stock market, quality control |
Real-World Applications of Variance and Standard Deviation
📌 Finance: Used to measure market volatility (higher standard deviation means higher risk).
📌 Quality Control: Used in manufacturing to maintain consistent product quality.
📌 Machine Learning: Helps in feature scaling and model performance evaluation.
📌 Education: Analyzing student performance consistency in exams.
📌 Sports: Measuring performance variability of athletes.