Time series analysis studies data points collected over time in order to describe patterns and forecast future values. It is widely used in finance, economics, weather forecasting, anomaly detection, and IoT applications. In Python, pandas handles time-indexed data and statsmodels provides statistical tests and forecasting models.
1. Understanding Time Series Data
Time series data consists of observations recorded in time order, usually at regular intervals. It can be:
- Univariate: a single variable per timestamp (e.g., daily temperature recordings)
- Multivariate: several variables per timestamp (e.g., temperature, humidity, and pressure recorded together)
Example of Time Series Data (Stock Prices)
| Date | Price ($) |
|---|---|
| 2023-01-01 | 150.5 |
| 2023-01-02 | 152.0 |
| 2023-01-03 | 151.3 |
| 2023-01-04 | 153.7 |
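As a minimal sketch, the sample table can be built directly in pandas. The `Price` values come from the table; the `Volume` column and its numbers are invented here purely to illustrate the multivariate case.

```python
import pandas as pd

# Dates and prices copied from the sample table
dates = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"])
prices = [150.5, 152.0, 151.3, 153.7]

# Univariate: one value per timestamp
univariate = pd.Series(prices, index=dates, name="Price")

# Multivariate: several values per timestamp.
# The Volume figures are made up for illustration only.
multivariate = pd.DataFrame(
    {"Price": prices, "Volume": [1000, 1200, 900, 1500]},
    index=dates,
)

print(univariate)
print(multivariate.shape)
```

In both cases the dates live in the index rather than in a regular column, which is what the time-based operations in later sections rely on.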
2. Importing Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
3. Loading Time Series Data
Let’s use Pandas to load a dataset with a time-based index.
df = pd.read_csv("stock_prices.csv", parse_dates=["Date"], index_col="Date")
print(df.head())
parse_dates=["Date"] → Parses the “Date” column into datetime values.
index_col="Date" → Sets the “Date” column as the index, giving the DataFrame a DatetimeIndex.
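To try the loading step without a file on disk, here is a small sketch that reads an inline CSV instead. The CSV text is a stand-in for `stock_prices.csv`, with rows deliberately out of order to show why sorting the index is a sensible first check.

```python
import io
import pandas as pd

# Inline CSV standing in for stock_prices.csv (rows deliberately out of order)
csv_text = """Date,Price
2023-01-03,151.3
2023-01-01,150.5
2023-01-02,152.0
2023-01-04,153.7
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Sort chronologically; rolling windows and differencing assume time order
df = df.sort_index()

print(type(df.index))
print(df.index.is_monotonic_increasing)
print(df["Price"].isna().sum())  # count of missing prices
```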
4. Visualizing Time Series Data
Line Plot
plt.figure(figsize=(10,5))
plt.plot(df.index, df["Price"], label="Stock Price")
plt.xlabel("Date")
plt.ylabel("Price ($)")
plt.title("Stock Price Over Time")
plt.legend()
plt.show()
Rolling Mean and Standard Deviation
To analyze trends and volatility, we can compute the rolling mean and standard deviation.
df["Rolling_Mean"] = df["Price"].rolling(window=30).mean()
df["Rolling_Std"] = df["Price"].rolling(window=30).std()
plt.figure(figsize=(10,5))
plt.plot(df["Price"], label="Original")
plt.plot(df["Rolling_Mean"], label="Rolling Mean", linestyle="dashed")
plt.plot(df["Rolling_Std"], label="Rolling Std", linestyle="dotted")
plt.legend()
plt.show()
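A tiny worked example makes the window mechanics easier to see. This sketch uses the four sample prices and a window of 2 (an arbitrary choice for readability) instead of 30.

```python
import pandas as pd

prices = pd.Series(
    [150.5, 152.0, 151.3, 153.7],
    index=pd.date_range("2023-01-01", periods=4, freq="D"),
)

# window=2: each value averages the current and previous observation,
# so the first row has no complete window and is NaN
rolling_mean = prices.rolling(window=2).mean()

# min_periods=1 fills the warm-up rows using whatever data is available
rolling_mean_partial = prices.rolling(window=2, min_periods=1).mean()

print(rolling_mean)
print(rolling_mean_partial)
```

With `window=30`, the first 29 rows of the rolling columns are NaN for the same reason, which is why the rolling lines in the plot start later than the price line.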
5. Checking Stationarity
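A time series is (weakly) stationary if its mean, variance, and autocorrelation structure do not change over time. The Augmented Dickey-Fuller (ADF) test helps check this: its null hypothesis is that the series has a unit root, i.e. that it is non-stationary.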
def adf_test(series):
    # adfuller returns a tuple: test statistic first, p-value second
    result = adfuller(series)
    print(f"ADF Statistic: {result[0]}")
    print(f"p-value: {result[1]}")
    print("Stationary" if result[1] < 0.05 else "Non-Stationary")

adf_test(df["Price"])
If p-value < 0.05 → Reject the null hypothesis; the data is likely stationary
If p-value ≥ 0.05 → Cannot reject the null hypothesis; treat the data as non-stationary
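For intuition about what the test is looking for, here is an informal sketch (not a replacement for the ADF test) that compares the mean of the first and second half of a series. The helper name `half_mean_gap` and the synthetic data are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Stationary: constant mean and variance
noise = rng.normal(size=n)
# Non-stationary: the mean drifts upward over time
trended = 0.01 * np.arange(n) + rng.normal(size=n)

def half_mean_gap(x):
    """Absolute difference between the means of the first and second half."""
    mid = len(x) // 2
    return abs(x[:mid].mean() - x[mid:].mean())

noise_gap = half_mean_gap(noise)
trend_gap = half_mean_gap(trended)
print(noise_gap, trend_gap)
```

The gap stays small for the noise series and is large for the trended one, which is the kind of drift that makes a series non-stationary.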
If the series is non-stationary, we can make it stationary by differencing.
df["Price_Diff"] = df["Price"].diff()
# The first difference is NaN, so drop it before running the test
adf_test(df["Price_Diff"].dropna())
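The sketch below shows what differencing does on a synthetic random walk, and why the NaN has to be dropped at test time rather than at assignment. The data and the `Reassigned` column are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
steps = rng.normal(size=200)

# A random walk is the running sum of its steps, so it is non-stationary
index = pd.date_range("2023-01-01", periods=200, freq="D")
df = pd.DataFrame({"Price": steps.cumsum()}, index=index)

# First difference: value today minus value yesterday
df["Price_Diff"] = df["Price"].diff()

# The first row has no predecessor, so it is NaN; drop it before testing
diffed = df["Price_Diff"].dropna()

# Assigning a dropna() result back to a column realigns on the index,
# so the NaN in the first row reappears
df["Reassigned"] = df["Price"].diff().dropna()

print(df["Price_Diff"].isna().sum(), len(diffed), df["Reassigned"].isna().sum())
```

Differencing the walk gives back the original random steps, which are stationary by construction.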
6. Decomposing Time Series
Time series can be broken down into:
- Trend → Long-term direction
- Seasonality → Repeating patterns
- Residuals → Noise
decomposition = seasonal_decompose(df["Price"], model="additive", period=30)
decomposition.plot()
plt.show()
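To see roughly what happens inside an additive decomposition, here is a simplified sketch in plain pandas. It illustrates the idea only and is not statsmodels' exact algorithm. The synthetic series, the period of 7, and the weekly pattern are all assumptions chosen so the result is easy to verify.

```python
import numpy as np
import pandas as pd

period = 7
n = 10 * period
t = np.arange(n)

# Synthetic series: linear trend plus a weekly pattern that sums to zero
pattern = np.array([3.0, 1.0, -2.0, -4.0, -1.0, 1.0, 2.0])
series = pd.Series(
    0.5 * t + pattern[t % period],
    index=pd.date_range("2023-01-01", periods=n, freq="D"),
)

# Trend: centered moving average over one full period
trend = series.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position in the cycle,
# shifted so the seasonal component averages to zero
detrended = series - trend
seasonal_means = detrended.groupby(t % period).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.to_numpy()[t % period], index=series.index)

# Residual: whatever trend and seasonality do not explain
resid = series - trend - seasonal

print(seasonal_means.round(3).tolist())
print(resid.abs().max())
```

Because this synthetic series has no noise, the residual is essentially zero. On real data such as stock prices, the residual carries whatever the trend and the seasonal pattern fail to capture.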
7. Forecasting with ARIMA
ARIMA (AutoRegressive Integrated Moving Average) is a popular time series forecasting model.
Step 1: Find Optimal ARIMA Parameters
We need to choose three orders: p (number of autoregressive lags), d (number of times the series is differenced), and q (number of moving-average lags).
import pmdarima as pm
auto_arima_model = pm.auto_arima(df["Price"], seasonal=False, trace=True)
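This searches over candidate (p, d, q) orders and reports the one with the lowest information criterion (AIC by default). With trace=True, each candidate fit is printed as it is tried.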
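To make the roles of d and p concrete, the sketch below simulates a series with an ARIMA(1, 1, 0) structure and recovers its single AR coefficient by hand. It uses plain least squares rather than the maximum-likelihood fit a library would perform, and the coefficient 0.7 and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi = 2000, 0.7

# Simulate an AR(1) process: x[t] = phi * x[t-1] + noise
x = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

# Integrate once, so the observed series needs d=1 to undo it
observed = x.cumsum()

# d=1: the first difference recovers the AR(1) series (minus its first point)
diffed = np.diff(observed)

# p=1: regress each value on the previous one (least squares, no intercept)
prev, curr = diffed[:-1], diffed[1:]
phi_hat = (prev @ curr) / (prev @ prev)

print(round(float(phi_hat), 3))
```

The estimate should land close to 0.7. Adding a moving-average term (q > 0) would mean also modelling the dependence on past noise terms, which has no closed-form fit like this one.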
Step 2: Build ARIMA Model
model = ARIMA(df["Price"], order=(2,1,2)) # Example order
model_fit = model.fit()
print(model_fit.summary())
Step 3: Forecasting
forecast = model_fit.forecast(steps=30)
plt.plot(df.index, df["Price"], label="Actual")
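# Start the forecast dates one day after the last observation
future_dates = pd.date_range(df.index[-1] + pd.Timedelta(days=1), periods=30, freq="D")
plt.plot(future_dates, forecast, label="Forecast")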
plt.legend()
plt.show()
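One detail that is easy to get wrong when plotting is the date axis for the forecast: it should begin after the last observed date, not on it. A minimal sketch, using the four sample dates as a stand-in for `df.index` and assuming daily data:

```python
import pandas as pd

# Stand-in for df.index (the four sample dates)
observed_index = pd.date_range("2023-01-01", periods=4, freq="D")

steps = 30
# Start one day after the last observation so forecasts do not overlap it
future_index = pd.date_range(
    start=observed_index[-1] + pd.Timedelta(days=1), periods=steps, freq="D"
)

print(future_index[0], future_index[-1])
```

For data recorded only on trading days, a business-day frequency (`freq="B"`) with a matching offset would likely be a better fit than calendar days.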
8. Advanced Techniques
a) LSTM for Time Series Prediction
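LSTM (Long Short-Term Memory) networks are recurrent neural networks that can learn dependencies across many time steps, which makes them a common deep learning choice for time series forecasting.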
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Use each price as the input and the following price as the target
prices = df["Price"].values.astype("float32")
X_train = prices[:-1].reshape(-1, 1, 1)  # (samples, timesteps, features)
y_train = prices[1:]
# Build LSTM Model
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(1, 1)),
    LSTM(50, return_sequences=False),
    Dense(25),
    Dense(1)
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X_train, y_train, epochs=50, batch_size=16)
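An LSTM usually benefits from seeing several past values per sample rather than one. The helper below is a sketch of the usual sliding-window preparation; the name `make_windows` and the parameter `n_lags` are hypothetical, introduced only for this example.

```python
import numpy as np

def make_windows(values, n_lags):
    """Split a 1-D array into (samples, n_lags, 1) inputs and next-step targets."""
    values = np.asarray(values, dtype="float32")
    # Each row holds n_lags consecutive values; the target is the value after them
    X = np.stack([values[i : i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    return X[..., np.newaxis], y

X, y = make_windows(np.arange(6), n_lags=3)
print(X.shape)
print(X[0, :, 0], y[0])
```

With windows like these, the first LSTM layer would take `input_shape=(n_lags, 1)`. In practice the prices are often scaled before training as well.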
9. Anomaly Detection in Time Series
A simple rule flags any day whose percentage price change is more than three standard deviations away from zero.
df["Price_Change"] = df["Price"].pct_change()
threshold = df["Price_Change"].std() * 3 # Define anomaly threshold
df["Anomaly"] = (df["Price_Change"] > threshold) | (df["Price_Change"] < -threshold)
plt.figure(figsize=(10,5))
plt.plot(df.index, df["Price"], label="Price")
anomalies = df.loc[df["Anomaly"], "Price"]
plt.scatter(anomalies.index, anomalies, color="red", label="Anomalies")
plt.legend()
plt.show()
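To check that the three-sigma rule behaves as expected, here is a sketch on synthetic prices with one injected spike. The smooth sine-based series and the 20% jump on day 30 are assumptions made for this test, not real market data.

```python
import numpy as np
import pandas as pd

t = np.arange(60)
values = 100 + 0.5 * np.sin(t / 3)  # smooth synthetic prices
values[30] *= 1.2                   # inject a one-day 20% spike

prices = pd.Series(values, index=pd.date_range("2023-01-01", periods=60, freq="D"))

# Same rule as above: flag changes beyond three standard deviations
change = prices.pct_change()
threshold = 3 * change.std()
anomaly = change.abs() > threshold

flagged = np.flatnonzero(anomaly.to_numpy())
print(flagged)
```

Two days are flagged: the jump up and the drop back down the next day. Note that large outliers inflate the standard deviation themselves, so on data with many spikes a fixed three-sigma threshold may miss some of them.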
