Time Series Analysis - Rishan Solutions

Time series analysis studies data points collected over time in order to describe their structure and forecast future values. It is widely used in finance, economics, weather forecasting, anomaly detection, and IoT applications. In Python, pandas handles time-indexed data and statsmodels provides statistical tests and forecasting models.

1. Understanding Time Series Data

Time series data consists of observations recorded in time order, usually at regular intervals. It can be:
Univariate (e.g., daily temperature recordings)
Multivariate (e.g., temperature, humidity, and pressure recorded together)

Example of Time Series Data (Stock Prices)

Date	Price ($)
2023-01-01	150.5
2023-01-02	152.0
2023-01-03	151.3
2023-01-04	153.7
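
The table above can be held in pandas as a DataFrame indexed by date. The following is a minimal sketch; the variable name prices is only illustrative.

```python
import pandas as pd

# The four rows from the example table, keyed by date.
prices = pd.DataFrame(
    {"Price": [150.5, 152.0, 151.3, 153.7]},
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"]),
)
prices.index.name = "Date"

# A DatetimeIndex allows selecting rows by date range.
subset = prices.loc["2023-01-02":"2023-01-03"]
print(subset)
```

Date-based slicing like this includes both endpoints, unlike positional slicing.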

2. Importing Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA

3. Loading Time Series Data

Let’s use pandas to load a dataset with a time-based index. The examples assume a file named stock_prices.csv with a Date column and a Price column.

df = pd.read_csv("stock_prices.csv", parse_dates=["Date"], index_col="Date")
print(df.head())

parse_dates=["Date"] → Parses the “Date” column into datetime values.
index_col="Date" → Sets the “Date” column as the index.
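
To see what these two parameters do without needing a file on disk, the sketch below reads a small in-memory CSV. The CSV contents are a hypothetical stand-in for stock_prices.csv.

```python
import io
import pandas as pd

# Stand-in for stock_prices.csv (hypothetical contents).
csv_text = """Date,Price
2023-01-01,150.5
2023-01-02,152.0
2023-01-03,151.3
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# The index is now a DatetimeIndex rather than a column of strings.
print(type(sample.index).__name__)
print(sample.loc[pd.Timestamp("2023-01-02"), "Price"])
```

Without parse_dates, the index would hold plain strings, and date-based slicing or resampling would not work as expected.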

4. Visualizing Time Series Data

Line Plot

plt.figure(figsize=(10,5))
plt.plot(df.index, df["Price"], label="Stock Price")
plt.xlabel("Date")
plt.ylabel("Price ($)")
plt.title("Stock Price Over Time")
plt.legend()
plt.show()

Rolling Mean and Standard Deviation

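To analyze trends and volatility, we can compute the rolling mean and standard deviation over a 30-observation window. The first 29 values of each are NaN because the window is not yet full.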

df["Rolling_Mean"] = df["Price"].rolling(window=30).mean()
df["Rolling_Std"] = df["Price"].rolling(window=30).std()

plt.figure(figsize=(10,5))
plt.plot(df["Price"], label="Original")
plt.plot(df["Rolling_Mean"], label="Rolling Mean", linestyle="dashed")
plt.plot(df["Rolling_Std"], label="Rolling Std", linestyle="dotted")
plt.legend()
plt.show()
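
The mechanics of a rolling window are easier to see on a tiny series. This sketch uses the four example prices from section 1 and a window of 2 instead of 30; the variable names are illustrative.

```python
import pandas as pd

prices = pd.Series(
    [150.5, 152.0, 151.3, 153.7],
    index=pd.date_range("2023-01-01", periods=4, freq="D"),
)

# window=2: each value is the mean of the current and previous observation.
# The first window-1 entries are NaN because the window is not yet full.
rolling_mean = prices.rolling(window=2).mean()
print(rolling_mean)

# min_periods=1 starts averaging as soon as one observation is available.
partial_mean = prices.rolling(window=2, min_periods=1).mean()
print(partial_mean)
```

A larger window smooths more but reacts more slowly to changes, so the window size is a trade-off rather than a fixed rule.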

5. Checking Stationarity

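A time series is (weakly) stationary if its mean, variance, and autocorrelation structure do not change over time. The Augmented Dickey-Fuller (ADF) test helps check this: its null hypothesis is that the series has a unit root, that is, that it is non-stationary.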

def adf_test(series):
    result = adfuller(series)
    print(f"ADF Statistic: {result[0]}")
    print(f"p-value: {result[1]}")
    print("Stationary" if result[1] < 0.05 else "Non-Stationary")

adf_test(df["Price"])

If p-value < 0.05 → Reject the null hypothesis of a unit root; treat the data as stationary
If p-value ≥ 0.05 → Cannot reject the null hypothesis; treat the data as non-stationary

If the series is non-stationary, we can make it stationary by differencing.

df["Price_Diff"] = df["Price"].diff()  # first row is NaN (no previous value)
adf_test(df["Price_Diff"].dropna())    # adfuller does not accept NaN values
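
One detail worth checking when differencing inside a DataFrame is how missing values are handled. The sketch below, using the four example prices, shows that calling dropna() before assigning to a column does not remove the leading NaN, because pandas realigns the result on the index.

```python
import pandas as pd

prices = pd.Series([150.5, 152.0, 151.3, 153.7])
frame = pd.DataFrame({"Price": prices})

# Assigning a shortened Series to a column realigns it on the index,
# so the dropped first row comes back as NaN.
frame["Price_Diff"] = prices.diff().dropna()
print(frame["Price_Diff"].isna().sum())   # 1

# Dropping NaN at the point of use gives a clean input for a test such as ADF.
clean = frame["Price_Diff"].dropna()
print(len(clean))   # 3
```

The differenced series is always one observation shorter than the original, since the first value has nothing to be subtracted from.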

6. Decomposing Time Series

Time series can be broken down into:

Trend → Long-term direction
Seasonality → Repeating patterns
Residuals → Noise

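# period=30 assumes a roughly monthly cycle in daily data;
# the series needs at least 60 observations (two full cycles) and no missing values
decomposition = seasonal_decompose(df["Price"], model="additive", period=30)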
decomposition.plot()
plt.show()

7. Forecasting with ARIMA

ARIMA (AutoRegressive Integrated Moving Average) is a popular time series forecasting model.
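
For reference, an ARIMA(p, d, q) model is conventionally written with the lag operator L, where L y_t = y_{t-1}. The symbols below (phi, theta, c, epsilon) are standard textbook notation and do not come from this article's code.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t
```

Here p is the number of autoregressive lags, d is the number of times the series is differenced, q is the number of moving-average lags, and epsilon_t is white noise.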

Step 1: Find Optimal ARIMA Parameters

We need to choose p (AR), d (Differencing), and q (MA) values.

import pmdarima as pm
auto_arima_model = pm.auto_arima(df["Price"], seasonal=False, trace=True)

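With trace=True, the search prints each candidate model and its AIC. The returned model is the one with the lowest AIC, and its (p, d, q) values are available as auto_arima_model.order. Note that pmdarima is a separate package and must be installed in addition to statsmodels.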

Step 2: Build ARIMA Model

model = ARIMA(df["Price"], order=(2,1,2))  # Example order
model_fit = model.fit()
print(model_fit.summary())

Step 3: Forecasting

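forecast = model_fit.forecast(steps=30)
# The forecast covers the 30 days after the last observation (assumes daily data)
forecast_dates = pd.date_range(df.index[-1] + pd.Timedelta(days=1), periods=30, freq="D")
plt.plot(df.index, df["Price"], label="Actual")
plt.plot(forecast_dates, forecast, label="Forecast")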
plt.legend()
plt.show()

8. Advanced Techniques

a) LSTM for Time Series Prediction

LSTM (Long Short-Term Memory) networks are recurrent neural networks that can learn dependencies across time steps, which makes them a common deep learning choice for time series forecasting.

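from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build (input, target) pairs: predict each price from the previous one.
# In practice, scale the prices (e.g., to [0, 1]) before training.
prices = df["Price"].values.astype("float32")
X_train = prices[:-1].reshape(-1, 1, 1)  # (samples, timesteps, features)
y_train = prices[1:]                     # next-step price for each input

# Build LSTM Model
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(1,1)),
    LSTM(50, return_sequences=False),
    Dense(25),
    Dense(1)
])

model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X_train, y_train, epochs=50, batch_size=16)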
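
An LSTM usually sees more than one past value per sample. A minimal sketch of building such lookback windows with NumPy follows; make_windows is a hypothetical helper written for this example, not a Keras function.

```python
import numpy as np

def make_windows(values, lookback):
    """Turn a 1-D array into (X, y) pairs for one-step-ahead prediction.

    X[i] holds `lookback` consecutive values; y[i] is the value that follows.
    """
    values = np.asarray(values, dtype="float32")
    X = np.stack([values[i:i + lookback] for i in range(len(values) - lookback)])
    y = values[lookback:]
    # Keras LSTM layers expect input shaped (samples, timesteps, features).
    return X[..., np.newaxis], y

X, y = make_windows([1, 2, 3, 4, 5, 6], lookback=3)
print(X.shape, y.shape)    # (3, 3, 1) (3,)
print(X[0].ravel(), y[0])  # window [1, 2, 3] is paired with target 4
```

With windows like these, the first LSTM layer would take input_shape=(lookback, 1) instead of (1, 1).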

9. Anomaly Detection in Time Series

# Flag observations whose percentage change is more than
# 3 standard deviations away from zero
df["Price_Change"] = df["Price"].pct_change()
threshold = df["Price_Change"].std() * 3  # Define anomaly threshold

df["Anomaly"] = df["Price_Change"].abs() > threshold

plt.figure(figsize=(10,5))
plt.plot(df.index, df["Price"], label="Price")
anomalies = df[df["Anomaly"]]
plt.scatter(anomalies.index, anomalies["Price"], color="red", label="Anomalies")
plt.legend()
plt.show()
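
To check that the threshold rule behaves as intended, it can be run on data where the anomaly is known in advance. The sketch below uses a synthetic series with one injected spike; the values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic prices alternating between 100.0 and 100.5, with one injected spike.
values = np.where(np.arange(50) % 2 == 0, 100.0, 100.5)
values[25] = 120.0
prices = pd.Series(values, index=pd.date_range("2023-01-01", periods=50, freq="D"))

returns = prices.pct_change()
threshold = 3 * returns.std()
anomalies = returns.abs() > threshold

# Both the jump into the spike and the drop back out exceed the threshold.
print(prices[anomalies])
```

Because the rule works on changes rather than levels, a single spike produces two flagged observations: the spike itself and the return to normal on the following day.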