Introduction to Time Series: A Comprehensive Guide
1. What is a Time Series?
A time series is a sequence of data points recorded at successive time intervals. These observations are collected over time, typically at regular intervals such as daily, weekly, monthly, or yearly.
Time series data is widely used in forecasting, financial markets, economics, weather prediction, and anomaly detection.
2. Characteristics of Time Series Data
Time series data typically exhibits some combination of the following characteristics:
2.1 Trend
- A long-term increase or decrease in the data.
- Example: Stock market prices increasing over years.
2.2 Seasonality
- Regular patterns repeating at fixed intervals.
- Example: Sales increasing during holidays every year.
2.3 Cyclic Patterns
- Repeated fluctuations over irregular time intervals (longer than seasonality).
- Example: Economic recessions occurring every few years.
2.4 Stationarity
- A time series is stationary if its statistical properties (mean, variance, and autocorrelation structure) remain constant over time.
- Non-stationary series show trends, seasonality, or changing variance.
2.5 Noise
- Random variations that cannot be predicted.
- Example: Small day-to-day fluctuations in a stock price that remain after trend and seasonality are accounted for.
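The characteristics above can be made concrete by building a synthetic series from separate trend, seasonal, and noise terms. This is a minimal sketch: the date range, slope, amplitude, and noise scale are hypothetical values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly index and parameters, chosen only for illustration.
dates = pd.date_range("2015-01-01", periods=96, freq="MS")
t = np.arange(len(dates))
rng = np.random.default_rng(42)

trend = 0.5 * t                               # steady long-term increase
seasonal = 10 * np.sin(2 * np.pi * t / 12)    # pattern repeating every 12 months
noise = rng.normal(scale=2.0, size=len(t))    # unpredictable random variation

series = pd.Series(trend + seasonal + noise, index=dates, name="Value")

# Averaging over whole years cancels the seasonal term, leaving mostly trend.
yearly_mean = series.groupby(series.index.year).mean()
print(yearly_mean.round(2))
```

Because the seasonal term sums to zero over each full year, the yearly means should rise steadily, which is the trend showing through.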
3. Examples of Time Series Data
Time series data is found in various domains:
- Finance: Stock prices, cryptocurrency values, exchange rates.
- Economics: GDP, inflation rates, unemployment rates.
- Weather: Temperature, rainfall, wind speed.
- Healthcare: Patient heart rates, hospital admissions.
- IoT & Sensors: Machine vibrations, smart home device activity.
4. Time Series Data Visualization
Visualizing time series data helps in understanding trends, seasonality, and anomalies.
4.1 Importing Necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
4.2 Loading Sample Time Series Data
# Load dataset
df = pd.read_csv('time_series_data.csv', parse_dates=['Date'], index_col='Date')
# Display first few rows
print(df.head())
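After loading, it can help to confirm that the index is sorted, free of duplicate timestamps, and regularly spaced. The sketch below uses a small inline CSV as a hypothetical stand-in for 'time_series_data.csv' and assumes daily data; the frequency string would need to match the real dataset.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for 'time_series_data.csv'.
raw = io.StringIO(
    "Date,Value\n"
    "2023-01-03,12.0\n"
    "2023-01-01,10.0\n"
    "2023-01-02,11.0\n"
    "2023-01-05,14.0\n"
)
sample = pd.read_csv(raw, parse_dates=["Date"], index_col="Date")

sample = sample.sort_index()                       # put rows in chronological order
has_duplicates = sample.index.duplicated().any()   # any repeated timestamps?
daily = sample.asfreq("D")                         # regular daily grid; gaps become NaN
missing_days = daily.index[daily["Value"].isna()]  # timestamps with no observation

print("Duplicates:", has_duplicates)
print("Missing days:", list(missing_days.strftime("%Y-%m-%d")))
```

Gaps found this way can then be filled or interpolated before applying methods that expect evenly spaced observations.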
4.3 Plotting the Time Series
plt.figure(figsize=(12,6))
plt.plot(df.index, df['Value'], label='Time Series Data', color='blue')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Time Series Visualization')
plt.legend()
plt.show()
5. Time Series Decomposition
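Decomposition breaks a time series into trend, seasonal, and residual components. In the additive model used here, the observed value is the sum of the three components; period=12 assumes monthly data with a yearly cycle.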
from statsmodels.tsa.seasonal import seasonal_decompose
decomposed = seasonal_decompose(df['Value'], model='additive', period=12)
plt.figure(figsize=(12,8))
plt.subplot(411)
plt.plot(df['Value'], label='Original')
plt.legend()
plt.subplot(412)
plt.plot(decomposed.trend, label='Trend')
plt.legend()
plt.subplot(413)
plt.plot(decomposed.seasonal, label='Seasonality')
plt.legend()
plt.subplot(414)
plt.plot(decomposed.resid, label='Residuals')
plt.legend()
plt.show()
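To see what an additive decomposition computes, the steps can be sketched by hand with NumPy and pandas. This is a simplified illustration of the classical moving-average approach, not a reproduction of the statsmodels implementation; the function name additive_decompose and the synthetic test series are hypothetical.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period):
    """Classical additive decomposition: trend, seasonal, residual."""
    # Centred moving average; even periods use half weights at both ends.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend_vals = np.full(len(series), np.nan)
    trend_vals[half:len(series) - half] = np.convolve(
        series.to_numpy(), weights, mode="valid"
    )
    trend = pd.Series(trend_vals, index=series.index)

    # Seasonal component: mean detrended value at each position in the cycle.
    detrended = series - trend
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()  # centre on zero
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)

    resid = series - trend - seasonal
    return trend, seasonal, resid

# Noise-free synthetic series: linear trend plus a 12-step sine cycle.
t = np.arange(48)
demo = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12))
trend, seasonal, resid = additive_decompose(demo, period=12)
print(resid.dropna().abs().max())
```

The trend is undefined for the first and last half-period of observations, which is why the residuals there are missing as well.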
6. Stationarity in Time Series
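Many classical forecasting models, including ARIMA, assume that the series is stationary (or can be made stationary), so checking this is an important first step.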
6.1 Checking Stationarity Using Rolling Statistics
rolling_mean = df['Value'].rolling(window=12).mean()
rolling_std = df['Value'].rolling(window=12).std()
plt.figure(figsize=(12,6))
plt.plot(df['Value'], label='Original Data')
plt.plot(rolling_mean, label='Rolling Mean', color='red')
plt.plot(rolling_std, label='Rolling Std Dev', color='black')
plt.legend()
plt.show()
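The visual check can be complemented by a numeric one: for a stationary series the rolling mean should stay within a narrow band, while for a trending series it drifts. The sketch below compares two synthetic series; the helper rolling_mean_range and all parameters are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 120
noise = pd.Series(rng.normal(size=n))               # constant mean and variance
trending = pd.Series(0.5 * np.arange(n)) + noise    # mean drifts upward over time

def rolling_mean_range(series, window=12):
    """Spread between the largest and smallest rolling mean."""
    rm = series.rolling(window=window).mean().dropna()
    return rm.max() - rm.min()

print("White noise:", round(rolling_mean_range(noise), 2))
print("Trending   :", round(rolling_mean_range(trending), 2))
```

A large spread relative to the scale of the data suggests a changing mean, though this heuristic is not a substitute for a formal test.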
6.2 Augmented Dickey-Fuller (ADF) Test
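The ADF test helps determine whether a time series is stationary. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value below the chosen significance level (0.05 here) is evidence of stationarity.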
from statsmodels.tsa.stattools import adfuller
result = adfuller(df['Value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
if result[1] < 0.05:
    print("Time Series is Stationary")
else:
    print("Time Series is Non-Stationary")
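The idea behind the test can be sketched with a plain least-squares regression of the change in the series on its previous level. This simplified version omits the lagged-difference terms, so it is a basic Dickey-Fuller statistic rather than the augmented one, and the function name dickey_fuller_stat is hypothetical. The value -2.86 is the approximate large-sample 5% critical value for a regression with a constant.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on y[t-1] in the regression dy[t] = a + b * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
white_noise = rng.normal(size=300)               # stationary
random_walk = np.cumsum(rng.normal(size=300))    # has a unit root

CRITICAL_5PCT = -2.86  # approximate asymptotic value, constant-only case
print("White noise stat:", round(dickey_fuller_stat(white_noise), 2))
print("Random walk stat:", round(dickey_fuller_stat(random_walk), 2))
```

A statistic more negative than the critical value rejects the unit-root hypothesis; the statistic does not follow a standard t-distribution, which is why special critical values are needed.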
7. Making a Time Series Stationary
7.1 Differencing
- Subtract the previous value from the current value to remove a trend in the mean.
df['Differenced'] = df['Value'].diff()
df['Differenced'].dropna().plot()
plt.title("First Order Differencing")
plt.show()
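A small worked example shows the effect of differencing on a purely linear series and how the transform can be undone. The values are hypothetical.

```python
import pandas as pd

s = pd.Series([3.0, 5.0, 7.0, 9.0, 11.0])   # linear trend with slope 2

first_diff = s.diff()           # first entry is NaN, the rest equal the slope
second_diff = s.diff().diff()   # useful for quadratic trends; zero here

# Undo first differencing: cumulative sum plus the first original value.
restored = first_diff.fillna(0).cumsum() + s.iloc[0]

print(first_diff.tolist())
print(restored.tolist())
```

Keeping the first original value is what makes the inversion possible, which matters when forecasts made on the differenced scale need to be converted back.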
7.2 Log Transformation
- Stabilizes variance that grows with the level of the series. It requires strictly positive values.
df['Log'] = np.log(df['Value'])
df['Log'].plot()
plt.title("Log Transformation")
plt.show()
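The following sketch illustrates why the log transform is often paired with differencing: a series that grows by a constant percentage becomes linear on the log scale, so its log differences are constant. The 5% growth rate is a hypothetical value.

```python
import numpy as np
import pandas as pd

t = np.arange(10)
growth = pd.Series(100 * 1.05 ** t)      # hypothetical 5% growth per period

log_growth = np.log(growth)              # requires strictly positive values
log_diff = log_growth.diff().dropna()    # constant, equal to log(1.05)

restored = np.exp(log_growth)            # invert the transform
print(log_diff.round(4).unique())
```

Forecasts produced on the log scale are mapped back to the original scale with np.exp.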
8. Time Series Forecasting Methods
Forecasting approaches range from simple smoothing methods to statistical models such as ARIMA.
8.1 Moving Averages
A simple method that smooths short-term fluctuations by averaging the most recent observations.
df['Moving_Avg'] = df['Value'].rolling(window=12).mean()
df[['Value', 'Moving_Avg']].plot(figsize=(12,6))
plt.title("Moving Average Smoothing")
plt.show()
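A rolling mean smooths the observed data; to turn it into a forecast, one simple option is to use the mean of the last few observations as the prediction for the next step. The sketch below uses hypothetical values and a window of 3.

```python
import pandas as pd

values = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
window = 3

# Trailing window: uses only past and current values (NaN until it fills).
trailing = values.rolling(window=window).mean()
# Centred window: uses future values too, so it suits smoothing, not forecasting.
centred = values.rolling(window=window, center=True).mean()

# Naive moving-average forecast for the next, unseen step.
next_forecast = values.iloc[-window:].mean()
print(next_forecast)
```

On a trending series the forecast lags behind the data, because the average of past values is always below the latest one when the series is rising.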
8.2 Exponential Smoothing
Assigns more weight to recent observations, with weights that decay exponentially for older ones.
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
model = SimpleExpSmoothing(df['Value']).fit(smoothing_level=0.2)
df['Exp_Smooth'] = model.fittedvalues
df[['Value', 'Exp_Smooth']].plot(figsize=(12,6))
plt.title("Exponential Smoothing")
plt.show()
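The recursion behind simple exponential smoothing can be written out directly: each smoothed value is a weighted average of the new observation and the previous smoothed value. The sketch below assumes the first smoothed value equals the first observation, and the function name simple_exp_smooth is hypothetical. Library implementations may initialise or align their fitted values differently.

```python
import numpy as np
import pandas as pd

def simple_exp_smooth(values, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return np.array(smoothed)

data = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])
manual = simple_exp_smooth(data.to_numpy(), alpha=0.2)

# pandas applies the same recursion when adjust=False.
pandas_version = data.ewm(alpha=0.2, adjust=False).mean().to_numpy()

print(manual.round(3))
```

A larger alpha tracks recent changes more closely, while a smaller alpha produces a smoother curve.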
8.3 ARIMA Model
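The AutoRegressive Integrated Moving Average (ARIMA) model is a widely used classical forecasting technique. It combines autoregression on past values (p), differencing to remove trends (d), and a moving average of past forecast errors (q).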
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(df['Value'], order=(5,1,0)) # p=5 AR lags, d=1 difference, q=0 MA terms
model_fit = model.fit()
# fittedvalues are in-sample one-step-ahead predictions, not out-of-sample forecasts
df['Forecast'] = model_fit.fittedvalues
df[['Value', 'Forecast']].plot(figsize=(12,6))
plt.title("ARIMA Model: In-Sample Fit")
plt.show()
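To make the roles of the "AR" and "I" parts concrete, the simplest case, ARIMA(1,1,0), can be sketched by hand: difference the series once, then fit a one-lag autoregression to the differences by least squares. This is an illustration on simulated data with a hypothetical coefficient of 0.6, not a replacement for the statsmodels estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
true_phi = 0.6   # hypothetical AR coefficient used to simulate the data
n = 500

# Simulate changes that follow an AR(1) process, then integrate them.
diffs = np.zeros(n)
for i in range(1, n):
    diffs[i] = true_phi * diffs[i - 1] + rng.normal()
y = 100 + np.cumsum(diffs)

# "I" step: difference once. "AR" step: regress each change on the previous one.
dy = np.diff(y)
phi_hat = (dy[1:] @ dy[:-1]) / (dy[:-1] @ dy[:-1])

# One-step-ahead forecast: last value plus the predicted next change.
next_value = y[-1] + phi_hat * dy[-1]

print("Estimated phi:", round(phi_hat, 3))
print("Next value forecast:", round(next_value, 2))
```

With enough data the estimate should land close to the coefficient used in the simulation.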
9. Evaluating Forecasting Models
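Model performance can be evaluated using error metrics such as MAE, MSE, and RMSE; lower values indicate a closer fit. The comparison below is in-sample, so it measures fit to the training data rather than accuracy on unseen data.
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Skip the first observation: with d=1 its fitted value has no prior point to build on
y_true = df['Value'].iloc[1:]
y_pred = df['Forecast'].iloc[1:]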
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
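For an estimate of accuracy on unseen data, a common approach is to hold out the most recent observations, forecast them using only the earlier ones, and compute the same metrics on the held-out part. The sketch below uses a hypothetical series and a naive "last value" forecast as a baseline.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(24, dtype=float))   # hypothetical data: 0, 1, ..., 23

# Chronological split: never shuffle time series data.
train, test = series.iloc[:18], series.iloc[18:]

# Naive baseline: repeat the last training value for every future step.
naive_forecast = np.full(len(test), train.iloc[-1])

errors = test.to_numpy() - naive_forecast
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

print("Hold-out MAE :", mae)
print("Hold-out RMSE:", rmse)
```

A forecasting model is only worth its added complexity if it beats a simple baseline like this one on the held-out data.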