Measures of Dispersion (Variance, Standard Deviation)

Loading

Measures of Dispersion: Variance and Standard Deviation

Introduction to Measures of Dispersion

In statistics, dispersion refers to the spread or variability of a dataset. Measures of dispersion help us understand how much the data deviates from the central value (such as the mean). The two most important measures of dispersion are:

  1. Variance
  2. Standard Deviation

These measures provide insights into the consistency, reliability, and distribution of data points.


1. Variance

Definition

Variance measures how much each data point in a dataset deviates from the mean. It quantifies the overall spread of data and is represented as σ² (sigma squared) for a population and for a sample.

A higher variance indicates that data points are spread out, while a lower variance suggests that data points are closer to the mean.


Formula for Variance

A. Population Variance (σ²)

For a population dataset of size NN: σ2=∑(Xi−μ)2N\sigma^2 = \frac{\sum (X_i – \mu)^2}{N}

Where:

  • σ2\sigma^2 = Population variance
  • XiX_i = Each data point
  • μ\mu = Population mean
  • NN = Total number of values in the population

B. Sample Variance (s²)

For a sample dataset of size nn: s2=∑(Xi−Xˉ)2n−1s^2 = \frac{\sum (X_i – \bar{X})^2}{n – 1}

Where:

  • s2s^2 = Sample variance
  • XiX_i = Each sample data point
  • Xˉ\bar{X} = Sample mean
  • nn = Total number of sample values

Note: We divide by n−1n-1 instead of nn because this accounts for degrees of freedom, making the estimate unbiased for the population variance.


Step-by-Step Example of Variance Calculation

Example: Find the variance of the dataset: {5, 7, 9, 10, 12}

Step 1: Find the Mean ( Xˉ\bar{X} )

Xˉ=5+7+9+10+125=435=8.6\bar{X} = \frac{5 + 7 + 9 + 10 + 12}{5} = \frac{43}{5} = 8.6

Step 2: Find the Squared Differences from the Mean

Value (XiX_i)Difference from Mean (Xi−XˉX_i – \bar{X})Squared Difference ((Xi−Xˉ)2(X_i – \bar{X})^2)
55 – 8.6 = -3.6(-3.6)² = 12.96
77 – 8.6 = -1.6(-1.6)² = 2.56
99 – 8.6 = 0.4(0.4)² = 0.16
1010 – 8.6 = 1.4(1.4)² = 1.96
1212 – 8.6 = 3.4(3.4)² = 11.56

Step 3: Find the Variance

s2=12.96+2.56+0.16+1.96+11.565−1s^2 = \frac{12.96 + 2.56 + 0.16 + 1.96 + 11.56}{5 – 1} s2=29.24=7.3s^2 = \frac{29.2}{4} = 7.3

Thus, the sample variance is 7.3.


Advantages of Variance

✅ Shows how much data varies from the mean.
✅ Uses all data points in the calculation.
✅ Important in machine learning and risk analysis.

Disadvantages of Variance

❌ Units are squared, making interpretation harder.
❌ Sensitive to outliers.
❌ Not in the same units as the original data.


2. Standard Deviation

Definition

Standard deviation (σ for population, s for sample) measures the average deviation of data points from the mean. It is simply the square root of variance and is often preferred because it has the same units as the data.

A higher standard deviation means greater variability, while a lower standard deviation means data points are closer to the mean.


Formula for Standard Deviation

A. Population Standard Deviation (σ)

σ=∑(Xi−μ)2N\sigma = \sqrt{\frac{\sum (X_i – \mu)^2}{N}}

B. Sample Standard Deviation (s)

s=∑(Xi−Xˉ)2n−1s = \sqrt{\frac{\sum (X_i – \bar{X})^2}{n – 1}}


Step-by-Step Example of Standard Deviation Calculation

Using the same dataset: {5, 7, 9, 10, 12}

From our previous calculation, sample variance s2s^2 = 7.3.

Now, we find standard deviation: s=7.3=2.7s = \sqrt{7.3} = 2.7

Thus, the sample standard deviation is 2.7.


Advantages of Standard Deviation

✅ Easier to interpret than variance because it is in original units.
✅ Widely used in finance, business, science, and machine learning.
✅ Helps measure data consistency and risk.

Disadvantages of Standard Deviation

❌ Sensitive to outliers.
❌ Does not work well for skewed distributions.


Comparison: Variance vs. Standard Deviation

FeatureVariance (σ², s²)Standard Deviation (σ, s)
DefinitionAverage squared deviation from the meanSquare root of variance
Formulaσ2=∑(Xi−μ)2N\sigma^2 = \frac{\sum (X_i – \mu)^2}{N}σ=σ2\sigma = \sqrt{\sigma^2}
UnitsSquared units (e.g., cm², kg²)Original units (e.g., cm, kg)
InterpretationHarder to interpret due to squared unitsEasier to interpret
Use CasesMathematical models, variance analysisRisk analysis, stock market, quality control

Real-World Applications of Variance and Standard Deviation

📌 Finance: Used to measure market volatility (higher standard deviation means higher risk).
📌 Quality Control: Used in manufacturing to maintain consistent product quality.
📌 Machine Learning: Helps in feature scaling and model performance evaluation.
📌 Education: Analyzing student performance consistency in exams.
📌 Sports: Measuring performance variability of athletes.


Leave a Reply

Your email address will not be published. Required fields are marked *