Autoregressive (AR) Models: A Comprehensive Guide
1. Introduction to Autoregressive (AR) Models
Autoregressive (AR) models are among the fundamental models used in time series forecasting. An AR model predicts a value from the series' own past values, which makes it a natural tool for analyzing time-dependent data.
📌 Definition:
An Autoregressive (AR) model is a type of regression model in which the dependent variable (the current value of the time series) is regressed on its own past values (lags).
📌 Examples of Use:
- Stock market forecasting 📈
- Economic indicators prediction 💰
- Weather forecasting ⛅
- Sales and demand forecasting 🛒
- Energy consumption forecasting 🔋
✅ Why Use AR Models?
✔ Captures temporal dependencies (relationship between past and future values).
✔ Simple and interpretable compared to more complex models.
✔ Works well with stationary time series.
✔ Can be extended to ARIMA, SARIMA, and other advanced models.
2. Understanding the Autoregressive Model (Mathematical Formulation)
An AR(p) model (autoregressive model of order p) is defined as:
$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t$$
Where:
- $Y_t$ → The value of the time series at time $t$.
- $p$ → The order of the AR model (number of past values used).
- $c$ → Constant term (intercept).
- $\phi_1, \phi_2, \dots, \phi_p$ → Coefficients for past values.
- $\epsilon_t$ → Error term (random noise).
Example of an AR(1) model (first-order autoregression):
$$Y_t = c + \phi_1 Y_{t-1} + \epsilon_t$$
This means the current value depends only on its immediately previous value.
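To make the AR(1) recursion concrete, here is a minimal simulation sketch using NumPy. The helper name `simulate_ar1`, the parameter values (c = 2, ϕ1 = 0.6), the Gaussian noise, and the seed are all illustrative assumptions, not taken from any dataset in this guide. For a stationary AR(1) process the long-run mean is c / (1 − ϕ1) and the lag-1 autocorrelation equals ϕ1, so the script compares both against their sample estimates.

```python
import numpy as np

def simulate_ar1(c, phi, n, sigma=1.0, seed=0):
    """Simulate n values of Y_t = c + phi * Y_{t-1} + eps_t (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    y = np.empty(n)
    # Start near the long-run mean c / (1 - phi); only meaningful when |phi| < 1
    y[0] = c / (1.0 - phi) + eps[0]
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + eps[t]
    return y

y = simulate_ar1(c=2.0, phi=0.6, n=5000)
theoretical_mean = 2.0 / (1.0 - 0.6)  # c / (1 - phi) = 5.0

# For an AR(1) process the lag-1 autocorrelation should be close to phi
lag1_corr = np.corrcoef(y[:-1], y[1:])[0, 1]

print(f"sample mean: {y.mean():.2f} (theory: {theoretical_mean:.2f})")
print(f"lag-1 autocorrelation: {lag1_corr:.2f} (theory: 0.60)")
```

The sample values will not match the theory exactly; the gap shrinks as `n` grows.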
3. Assumptions of AR Models
For an AR model to work effectively, certain assumptions must hold:
1️⃣ Stationarity: The time series must have a constant mean and variance, and an autocorrelation that depends only on the lag between two observations, not on when they occur. If the data is non-stationary, it must be transformed first (e.g., by differencing).
2️⃣ Linearity: The relationship between past values and future values is linear.
3️⃣ No Autocorrelation in Residuals: After fitting, the residuals should behave like random noise, with no remaining patterns.
4️⃣ Finite Predictability: The effect of past values diminishes as we go further back in time. For an AR(1) model this requires |ϕ1| < 1.
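The stationarity assumption can be checked directly on a set of AR coefficients: an AR(p) process is stationary when every root of the polynomial 1 − ϕ1 z − … − ϕp z^p lies outside the unit circle. The sketch below applies that rule with `np.roots`; the function name `is_stationary_ar` and the example coefficients are made up for illustration.

```python
import numpy as np

def is_stationary_ar(phis):
    """Return True if phi_1..phi_p define a stationary AR(p) process.

    Checks that all roots of 1 - phi_1 z - ... - phi_p z^p have modulus > 1.
    """
    phis = np.asarray(phis, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant
    poly = np.concatenate((-phis[::-1], [1.0]))
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary_ar([0.6]))       # True:  AR(1) with |phi_1| < 1
print(is_stationary_ar([1.0]))       # False: random walk (unit root)
print(is_stationary_ar([0.5, 0.3]))  # True:  AR(2), both roots outside the unit circle
print(is_stationary_ar([0.7, 0.5]))  # False: AR(2) with phi_1 + phi_2 > 1
```

In practice the coefficients are estimated from data, so roots very close to 1 in modulus should be treated with caution rather than read as a clear pass.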
4. Checking for Stationarity in Time Series
Since AR models require stationary time series, let’s check for stationarity using plots and statistical tests.
Step 1: Load the Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
# Load time series dataset
df = pd.read_csv("time_series_data.csv")
df['Date'] = pd.to_datetime(df['Date']) # Convert to datetime format
df.set_index('Date', inplace=True)
# Plot the time series
plt.figure(figsize=(12,5))
plt.plot(df, label="Time Series Data")
plt.xlabel("Time")
plt.ylabel("Value")
plt.title("Time Series Data Visualization")
plt.legend()
plt.show()
Step 2: Check for Stationarity Using Augmented Dickey-Fuller (ADF) Test
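The ADF test checks for a unit root. Its null hypothesis is that the series has a unit root (is non-stationary), so a small p-value is evidence that the series is stationary.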
def adf_test(series):
    result = adfuller(series)
    print("ADF Statistic:", result[0])
    print("p-value:", result[1])
    if result[1] < 0.05:
        print("The time series is stationary.")
    else:
        print("The time series is non-stationary.")

adf_test(df['Value']) # Replace 'Value' with your column name
✅ If p-value < 0.05, reject the unit-root null hypothesis and treat the data as stationary.
❌ If p-value ≥ 0.05, the test cannot rule out a unit root; apply differencing and test again:
df['Value_diff'] = df['Value'].diff()
df.dropna(inplace=True)
adf_test(df['Value_diff'])
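To see why differencing helps, the sketch below builds a random walk (a textbook unit-root series), differences it, and runs a simplified Dickey-Fuller regression on both. This is not the full ADF test from `statsmodels`: it has no lag augmentation and reports only a t-statistic, compared against −2.86, the approximate large-sample 5% critical value for the constant-only case. The helper name `dickey_fuller_t`, the seed, and the series length are illustrative assumptions.

```python
import numpy as np

def dickey_fuller_t(y):
    """t-statistic on gamma in diff(y)_t = alpha + gamma * y_{t-1} + e_t.

    Simplified Dickey-Fuller regression with a constant and no lagged differences.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack((np.ones(len(dy)), y[:-1]))
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
steps = rng.normal(size=500)
random_walk = np.cumsum(steps)      # non-stationary: has a unit root
differenced = np.diff(random_walk)  # first difference, like df['Value'].diff()

# Differencing undoes the cumulative sum, recovering the white-noise steps
assert np.allclose(differenced, steps[1:])

CRITICAL_5PCT = -2.86  # approximate; assumes a large sample and no trend term
for name, series in [("random walk", random_walk), ("differenced", differenced)]:
    t = dickey_fuller_t(series)
    print(f"{name}: t = {t:.2f}, reject unit root at 5%: {t < CRITICAL_5PCT}")
```

The differenced series should produce a strongly negative t-statistic. For real data, prefer `adfuller`, which also handles lag selection and reports a p-value.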
5. Fitting an Autoregressive (AR) Model
Step 1: Import Required Libraries and Split the Data
from statsmodels.tsa.ar_model import AutoReg
# Split chronologically (no shuffling): first 80% for training, last 20% for testing
train_size = int(len(df) * 0.8)
train, test = df.iloc[:train_size], df.iloc[train_size:]
Step 2: Train AR Model
# Fit an AR(p) model on the training data
# (use 'Value_diff' instead of 'Value' if the series needed differencing)
p = 3 # Lag order; choose it from the PACF plot (see Section 6)
ar_model = AutoReg(train['Value'], lags=p).fit()
print(ar_model.summary()) # Summary of model coefficients
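Under the hood, fitting an AR(p) model amounts to a linear regression of each value on its p predecessors. The sketch below does this with ordinary least squares in NumPy on simulated data. To my understanding `AutoReg` uses a conditional least-squares fit of this kind by default, but treat that as an assumption and check the `statsmodels` documentation. The helper `fit_ar_ols` and the true parameters (c = 1, ϕ1 = 0.5, ϕ2 = 0.3) are hypothetical.

```python
import numpy as np

def fit_ar_ols(y, p):
    """Estimate c and phi_1..phi_p by ordinary least squares (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # The row for time t holds [1, Y_{t-1}, ..., Y_{t-p}]
    lag_columns = [y[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lag_columns)
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[0], coef[1:]

# Simulate an AR(2) series with known parameters so the fit can be checked
rng = np.random.default_rng(1)
n, c_true, phi_true = 5000, 1.0, np.array([0.5, 0.3])
y = np.full(n, c_true / (1.0 - phi_true.sum()))  # start at the process mean
for t in range(2, n):
    y[t] = c_true + phi_true[0] * y[t - 1] + phi_true[1] * y[t - 2] + rng.normal()

c_hat, phi_hat = fit_ar_ols(y, p=2)
print("estimated c:", round(float(c_hat), 3))
print("estimated phi:", np.round(phi_hat, 3))
```

The estimates should land close to the true values, and the coefficients in `ar_model.summary()` can be read the same way: one intercept plus one weight per lag.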
Step 3: Make Predictions
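# Forecast the whole test period. The model only sees the training data,
# so these are multi-step-ahead forecasts, not one-step-ahead predictions.
predictions = ar_model.predict(start=len(train), end=len(df)-1)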
# Plot actual vs predicted
plt.figure(figsize=(12,6))
plt.plot(train.index, train['Value'], label="Training Data")
plt.plot(test.index, test['Value'], label="Actual Data", color="blue")
plt.plot(test.index, predictions, label="Predicted Data", color="red", linestyle="dashed")
plt.xlabel("Time")
plt.ylabel("Value")
plt.title("Autoregressive Model Forecast")
plt.legend()
plt.show()
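Because the model is fitted on the training data only, forecasts beyond the end of the training period have to be built recursively: each forecast is fed back in as a lag for the next step. The sketch below shows that recursion for given coefficients. The function `forecast_ar` and the numbers (c = 2, ϕ1 = 0.5, last observed value 10) are made up for illustration.

```python
import numpy as np

def forecast_ar(history, c, phis, steps):
    """Iterate the AR recursion forward, reusing each forecast as a lag."""
    phis = np.asarray(phis, dtype=float)
    p = len(phis)
    window = list(history[-p:])   # last p observed values, oldest first
    out = []
    for _ in range(steps):
        lags = window[::-1]       # newest first: Y_{t-1}, ..., Y_{t-p}
        nxt = c + float(np.dot(phis, lags))
        out.append(nxt)
        window = window[1:] + [nxt]
    return np.array(out)

forecast = forecast_ar([10.0], c=2.0, phis=[0.5], steps=4)
print(forecast.tolist())  # [7.0, 5.5, 4.75, 4.375]
```

The forecasts drift toward the long-run mean c / (1 − ϕ1) = 4. This is why a long-horizon AR forecast often looks like a curve flattening into a straight line, even when the actual test data keeps fluctuating.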
6. Selecting the Optimal Lag (p) Using ACF & PACF
Step 1: Plot Autocorrelation (ACF) & Partial Autocorrelation (PACF)
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Plot ACF and PACF to determine lag value (p)
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
plot_acf(df['Value'], lags=20, ax=ax[0])
plot_pacf(df['Value'], lags=20, ax=ax[1])
plt.show()
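📌 Choose p as the last lag at which the PACF is clearly outside the confidence band; for an AR(p) process, the PACF drops to approximately zero after lag p.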
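The PACF cut-off rule can also be checked numerically. One way to estimate the partial autocorrelation at lag k is to regress Y_t on lags 1 through k and take the coefficient on Y_{t−k}. The sketch below does that on a simulated AR(2) series and flags lags outside the approximate 95% band ±1.96/√n. The helper `pacf_ols` and the simulation settings are assumptions for illustration, and `plot_pacf` may use a different estimator, so its values can differ slightly.

```python
import numpy as np

def pacf_ols(y, max_lag):
    """PACF at lag k = coefficient on Y_{t-k} in an OLS regression on lags 1..k."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    values = []
    for k in range(1, max_lag + 1):
        lag_columns = [y[k - j : n - j] for j in range(1, k + 1)]
        X = np.column_stack([np.ones(n - k)] + lag_columns)
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(coef[-1])
    return np.array(values)

# Simulated AR(2) series: the PACF should be large at lags 1-2, near zero after
rng = np.random.default_rng(7)
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

pacf = pacf_ols(y, max_lag=8)
band = 1.96 / np.sqrt(n)  # approximate 95% band under the white-noise null
for lag, value in enumerate(pacf, start=1):
    flag = "significant" if abs(value) > band else ""
    print(f"lag {lag}: {value:+.3f} {flag}")
```

With 95% bands, roughly one in twenty lags will appear significant by chance, so an isolated spike at a high lag is not by itself a reason to increase p.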
7. Evaluating AR Model Performance
Step 1: Calculate MAE and RMSE
from sklearn.metrics import mean_absolute_error, mean_squared_error
mae = mean_absolute_error(test['Value'], predictions)
rmse = np.sqrt(mean_squared_error(test['Value'], predictions))
print(f"MAE: {mae}, RMSE: {rmse}")
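✅ Lower MAE and RMSE = Better Model Performance. Both are expressed in the units of the series, so compare them across models evaluated on the same test set.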
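For intuition about what these two metrics measure, here is the same calculation done by hand with NumPy on four made-up values (not from any dataset in this guide).

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.0, 7.0])     # made-up values for illustration
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = actual - predicted                 # [0.5, 0.0, -2.0, -1.0]
mae = np.mean(np.abs(errors))               # (0.5 + 0 + 2 + 1) / 4 = 0.875
rmse = np.sqrt(np.mean(errors ** 2))        # sqrt(5.25 / 4), about 1.1456

print(f"MAE: {mae:.4f}, RMSE: {rmse:.4f}")  # MAE: 0.8750, RMSE: 1.1456
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest, because squaring gives large errors more weight.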
8. Extending AR Models: ARMA, ARIMA, and SARIMA
- ARMA (Autoregressive Moving Average): Combines AR (past values) with MA (past errors).
- ARIMA (Autoregressive Integrated Moving Average): Adds differencing (the "integrated" part) to ARMA to handle non-stationary data.
- SARIMA (Seasonal ARIMA): Adds seasonal AR, differencing, and MA terms to ARIMA for periodic time series.
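For reference, ARMA and ARIMA can be written in the same notation as the AR(p) equation in Section 2. The symbols below are standard but do not appear elsewhere in this guide, so treat them as introduced here: $q$ is the MA order, $\theta_j$ are the MA coefficients, $d$ is the number of differences, $B$ is the backshift operator ($B Y_t = Y_{t-1}$), and $W_t$ is the differenced series.

```latex
% ARMA(p, q): the AR(p) terms plus q lagged error terms
Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}

% ARIMA(p, d, q): the same ARMA structure applied to the d-times differenced series
W_t = (1 - B)^d Y_t, \qquad
W_t = c + \sum_{i=1}^{p} \phi_i W_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}
```

Setting q = 0 and d = 0 recovers the plain AR(p) model, which is why AR is usually described as the building block of this family.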
9. Conclusion
✔ Autoregressive models are simple yet powerful tools for time series forecasting.
✔ AR models work best for stationary time series with strong autocorrelation.
✔ Choosing the correct lag (p) is crucial for model accuracy.
✔ AR is the building block of ARMA, ARIMA, and SARIMA; for more complex forecasting problems, deep learning methods are an alternative.