Feature Engineering for Time Series Data

1. Introduction to Feature Engineering in Time Series

Feature engineering is a crucial step in time series forecasting and machine learning. It involves transforming raw time series data into meaningful features that enhance model accuracy. Proper feature extraction helps machine learning models capture patterns, trends, and seasonality more effectively.

Why is Feature Engineering Important?

✅ Improves Model Performance – Extracting useful information enhances model predictions.
✅ Reduces Noise – Helps remove irrelevant fluctuations in the data.
✅ Captures Temporal Dependencies – Enables models to recognize patterns across time.
✅ Enhances Interpretability – Provides insights into what drives predictions.
✅ Reduces Computational Cost – Helps simplify models by using fewer, more relevant features.

Common Applications of Time Series Feature Engineering

📌 Stock Market Prediction 📈
📌 Weather Forecasting ⛅
📌 Sales and Demand Forecasting 🛒
📌 Energy Load Forecasting 🔋
📌 Anomaly Detection 🚨

2. Data Preparation for Feature Engineering

Step 1: Load Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from statsmodels.tsa.stattools import adfuller
from scipy.stats import skew, kurtosis

Step 2: Load and Visualize the Time Series Data

# Load dataset
df = pd.read_csv("time_series_data.csv")
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Plot the time series data
plt.figure(figsize=(12, 5))
plt.plot(df['Value'], label="Time Series Data")  # plot the target column only
plt.xlabel("Time")
plt.ylabel("Value")
plt.title("Time Series Data Visualization")
plt.legend()
plt.show()

✅ Check for Missing Values

print(df.isnull().sum())

Fill missing values using forward-fill (shown here) or interpolation:

# fillna(method='ffill') is deprecated in recent pandas versions; ffill() is the direct replacement
df.ffill(inplace=True)
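
Forward-fill repeats the last known value, which can flatten a trending series across long gaps. As an alternative, here is a minimal sketch of time-weighted interpolation on a small synthetic series (the dates and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a two-day gap (illustrative values only)
idx = pd.date_range("2024-01-01", periods=5, freq="D")
s = pd.Series([10.0, np.nan, np.nan, 16.0, 18.0], index=idx)

filled_ffill = s.ffill()                      # repeats the last known value across the gap
filled_interp = s.interpolate(method="time")  # draws a straight line across the gap

print(pd.DataFrame({"ffill": filled_ffill, "interpolate": filled_interp}))
```

Interpolation uses the value that follows the gap, so in a forecasting setting it is safest to apply it only inside the training period.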

✅ Check for Stationarity (Augmented Dickey-Fuller Test)

result = adfuller(df['Value'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')

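If the p-value is above 0.05, the test fails to reject the null hypothesis of a unit root, so the series is treated as non-stationary and differenced:

df['Differenced'] = df['Value'].diff()
df.dropna(subset=['Differenced'], inplace=True)  # drops rows with no previous value, normally just the first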
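
If a model is later trained on the differenced series, its output is on the scale of changes, not levels. The sketch below, on a small made-up series, shows one way to undo first-order differencing by adding the cumulative sum of the differences to the first observed value:

```python
import pandas as pd

# Illustrative values only
s = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

diffed = s.diff().dropna()               # first-order differences

# Invert the differencing: starting level + running sum of the changes
restored_tail = s.iloc[0] + diffed.cumsum()
restored = pd.concat([s.iloc[:1], restored_tail])

print(restored.tolist())
```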

3. Feature Engineering Techniques for Time Series

1️⃣ Time-Based Features (Date & Time Components)

Extracting date-based features allows the model to recognize seasonal patterns.

df['Year'] = df.index.year
df['Month'] = df.index.month
df['Day'] = df.index.day
df['DayOfWeek'] = df.index.dayofweek
df['IsWeekend'] = (df.index.dayofweek >= 5).astype(int)
df['Hour'] = df.index.hour  # Useful for hourly data
df['Quarter'] = df.index.quarter
df['WeekOfYear'] = df.index.isocalendar().week.astype(int)  # isocalendar() returns UInt32; cast to a plain integer

✅ Use Cyclical Encoding for Periodic Features

df['Month_sin'] = np.sin(2 * np.pi * df['Month'] / 12)
df['Month_cos'] = np.cos(2 * np.pi * df['Month'] / 12)

df['DayOfWeek_sin'] = np.sin(2 * np.pi * df['DayOfWeek'] / 7)
df['DayOfWeek_cos'] = np.cos(2 * np.pi * df['DayOfWeek'] / 7)

➡ Sine and cosine encoding places the end of a cycle next to its start (December next to January, Sunday next to Monday), which raw integer values do not.
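
To make the benefit concrete, the sketch below compares distances between months before and after encoding. The helper name encode_cyclical is hypothetical and used here only for illustration:

```python
import numpy as np

def encode_cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * value / period
    return np.array([np.sin(angle), np.cos(angle)])

dec, jan, jun = (encode_cyclical(m, 12) for m in (12, 1, 6))

raw_gap = abs(12 - 1)                  # raw month numbers put December and January 11 apart
dec_jan = np.linalg.norm(dec - jan)    # neighbours on the circle
dec_jun = np.linalg.norm(dec - jun)    # opposite sides of the circle

print(raw_gap, round(dec_jan, 3), round(dec_jun, 3))
```

After encoding, December is closer to January than to June, which matches how the calendar actually behaves.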

2️⃣ Lag Features (Past Values as Inputs)

Lag features allow models to learn from past observations.

df['Lag_1'] = df['Value'].shift(1)  # One-step lag
df['Lag_3'] = df['Value'].shift(3)  # Three-step lag
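df['Lag_7'] = df['Value'].shift(7)  # One-week lag (assuming daily data)
df['Lag_30'] = df['Value'].shift(30)  # Roughly one-month lag (assuming daily data)

➡ Explicit lag columns are most useful for tabular models such as Random Forest and XGBoost; ARIMA builds its own lag terms internally.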
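
Shifting leaves the first rows without a complete history. The following sketch wraps lag creation in a small helper (add_lag_features is a hypothetical name, not part of pandas) and shows how many rows remain usable on a toy series:

```python
import pandas as pd

def add_lag_features(frame, column, lags):
    """Return a copy of `frame` with one Lag_<k> column per lag in `lags`."""
    out = frame.copy()
    for k in lags:
        out[f"Lag_{k}"] = out[column].shift(k)
    return out

# Toy daily series (illustrative values only)
demo = pd.DataFrame(
    {"Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

lagged = add_lag_features(demo, "Value", lags=[1, 3])
usable = lagged.dropna()  # the first max(lags) rows lack a complete history

print(usable)
```

The longest lag sets how many rows are lost, so a 30-step lag on a short series can remove a large share of the training data.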

3️⃣ Rolling Window Features (Moving Averages & Statistics)

Rolling statistics smooth out fluctuations and capture trends.

# shift(1) keeps every window strictly in the past, so the current target value does not leak into its own features
past = df['Value'].shift(1)
df['Rolling_Mean_7'] = past.rolling(window=7).mean()
df['Rolling_Mean_30'] = past.rolling(window=30).mean()
df['Rolling_Std_7'] = past.rolling(window=7).std()
df['Rolling_Std_30'] = past.rolling(window=30).std()
df['Exp_Moving_Avg_7'] = past.ewm(span=7).mean()

➡ Helps capture the recent level and volatility of the series.
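
When the value being predicted is the same column the window is computed on, a plain rolling window includes the current observation. The sketch below, on a toy series, contrasts that with a window shifted by one step so that it only sees the past:

```python
import pandas as pd

s = pd.Series([float(v) for v in range(1, 11)])  # toy series: 1, 2, ..., 10

window = 3
includes_current = s.rolling(window).mean()     # window ends at the current row
past_only = s.shift(1).rolling(window).mean()   # window ends one row earlier

print(pd.DataFrame({"value": s,
                    "includes_current": includes_current,
                    "past_only": past_only}))
```

The shifted version costs one extra row of missing values at the start, but every feature is available at prediction time.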

4️⃣ Exponential Weighted Features (EMA – Exponential Moving Average)

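An exponentially weighted average gives more weight to recent observations, with weights that decay geometrically into the past.

# shift(1) excludes the current value so the feature is known at prediction time
df['EWMA_7'] = df['Value'].shift(1).ewm(span=7, adjust=False).mean()
df['EWMA_30'] = df['Value'].shift(1).ewm(span=30, adjust=False).mean()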
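
With adjust=False, pandas computes the average recursively as y[t] = alpha * x[t] + (1 - alpha) * y[t-1], where alpha = 2 / (span + 1). The sketch below reimplements that recursion by hand (ewma_recursive is a hypothetical helper) and compares it with the pandas result on made-up values:

```python
import numpy as np
import pandas as pd

def ewma_recursive(values, span):
    """EWMA via y[t] = alpha * x[t] + (1 - alpha) * y[t-1], alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(values))
    out[0] = values[0]                    # the first output equals the first input
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1 - alpha) * out[t - 1]
    return out

x = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])  # illustrative values only
manual = ewma_recursive(x, span=3)
from_pandas = pd.Series(x).ewm(span=3, adjust=False).mean().to_numpy()

print(np.allclose(manual, from_pandas))
```

A larger span means a smaller alpha, so the average reacts more slowly to new observations.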

5️⃣ Difference Features (Change Over Time)

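Measures how much the series changed over a given number of steps.

# shift(1) makes each feature describe the latest change already observed at prediction time
df['Diff_1'] = df['Value'].diff(1).shift(1)
df['Diff_7'] = df['Value'].diff(7).shift(1)
df['Diff_30'] = df['Value'].diff(30).shift(1)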
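
A difference is simply the value minus its lag, so it carries the scale of the series. When the level of the series drifts a lot, a relative change can be easier to compare across time. The sketch below shows both on made-up values; pct_change is an addition for illustration and is not used elsewhere in this article:

```python
import pandas as pd

s = pd.Series([100.0, 110.0, 99.0, 120.0])  # illustrative values only

diff_1 = s.diff(1)             # absolute change from the previous step
same_as_diff = s - s.shift(1)  # identical by definition
pct_1 = s.pct_change(1)        # change relative to the previous value

print(pd.DataFrame({"value": s, "diff_1": diff_1, "pct_1": pct_1}))
```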

6️⃣ Fourier Transforms for Seasonality Detection

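The Fourier transform reveals cyclical patterns by showing how strongly each frequency contributes to the series.

from scipy.fft import rfft, rfftfreq

# The spectrum is indexed by frequency, not by time, so it is kept out of df
spectrum = np.abs(rfft(df['Value'].to_numpy() - df['Value'].mean()))
freqs = rfftfreq(len(df), d=1.0)  # cycles per time step
dominant_period = 1 / freqs[1:][np.argmax(spectrum[1:])]

➡ Useful for detecting hidden seasonal cycles; the dominant period can then drive sine and cosine features.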
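
As an illustration of turning a detected cycle into features, the sketch below builds a synthetic signal with a known 7-step period, locates the strongest frequency with NumPy's FFT, and derives one sine and one cosine column from it. The signal, noise level, and variable names are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 210                                   # 30 full cycles of length 7
t = np.arange(n)
signal = np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(n)

# Magnitude spectrum of the de-meaned signal
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)         # cycles per time step

peak = np.argmax(spectrum[1:]) + 1        # skip the zero-frequency bin
dominant_period = 1.0 / freqs[peak]

# Fourier terms: per-row features that follow the detected cycle
fourier_sin = np.sin(2 * np.pi * t / dominant_period)
fourier_cos = np.cos(2 * np.pi * t / dominant_period)

print(dominant_period)
```

On real data the spectrum is rarely this clean, so it is worth inspecting the top few peaks and checking them against known calendar cycles before adding features.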

7️⃣ Autocorrelation and Partial Autocorrelation Features

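from statsmodels.tsa.stattools import acf, pacf

# acf and pacf return one value per lag, so they are kept as arrays rather than broadcast into constant columns
acf_values = acf(df['Value'], nlags=30)
pacf_values = pacf(df['Value'], nlags=30)

➡ Helps identify which lags influence future values, and therefore which lag features are worth building.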
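
One way to use autocorrelation in practice is to rank candidate lags and keep the strongest ones. The sketch below does this on a synthetic weekly signal with pandas' Series.autocorr, which computes the Pearson correlation between the series and a shifted copy; its values can differ slightly from the statsmodels estimate:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
t = np.arange(140)
# Synthetic series with a 7-step cycle plus mild noise (illustrative only)
s = pd.Series(np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(len(t)))

# Autocorrelation for candidate lags 1..10
autocorrs = {lag: s.autocorr(lag=lag) for lag in range(1, 11)}
best_lag = max(autocorrs, key=autocorrs.get)

print(best_lag)
```

To avoid using information from the test period, this ranking is best computed on the training portion of the data only.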

8️⃣ Skewness and Kurtosis Features

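Measures the shape of the recent value distribution: asymmetry (skewness) and tail weight (kurtosis).

# Both statistics need a window of values; applying skew() to one number at a time is not meaningful
df['Skewness_30'] = df['Value'].shift(1).rolling(window=30).skew()
df['Kurtosis_30'] = df['Value'].shift(1).rolling(window=30).kurt()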

4. Model Training with Engineered Features

Once features are generated, split the data into train and test sets:

train_size = int(len(df) * 0.8)
train, test = df.iloc[:train_size], df.iloc[train_size:]
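
A single split gives one estimate of performance. A common extension is walk-forward validation, where the training window grows and each fold is tested on the block that follows it. The sketch below is a minimal pure-Python version (walk_forward_splits is a hypothetical helper); scikit-learn's TimeSeriesSplit offers similar behaviour:

```python
def walk_forward_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) with the training window growing over time."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        test_end = test_start + test_size
        # Training uses everything before the test block, never anything after it
        yield list(range(test_start)), list(range(test_start, test_end))

splits = list(walk_forward_splits(n_samples=10, n_folds=3, test_size=2))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

The index lists can be passed to df.iloc to select the rows for each fold.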

Use machine learning models like:
✅ Random Forest
✅ XGBoost
✅ LSTMs

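from xgboost import XGBRegressor

# Exclude the target and 'Differenced', which is computed from the current value.
# Any other feature built from the current value must be dropped as well.
target_derived = ['Value', 'Differenced']

# Reuse the chronological split above; a shuffled train_test_split would train on future rows
X_train, y_train = train.drop(columns=target_derived, errors='ignore'), train['Value']
X_test, y_test = test.drop(columns=target_derived, errors='ignore'), test['Value']

model = XGBRegressor(n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)

predictions = model.predict(X_test)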
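
Predictions are easier to judge against a simple reference. The sketch below compares mean absolute error for a set of model predictions with a persistence baseline that predicts each value with the previous observation. All numbers and helper names (mae, naive_forecast) are made up for illustration and are not results from the model above:

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def naive_forecast(history_last, actual):
    """Persistence baseline: predict each value with the previous observed value."""
    actual = np.asarray(actual, dtype=float)
    return np.concatenate([[history_last], actual[:-1]])

# Illustrative numbers only
y_test_demo = np.array([12.0, 13.0, 15.0, 14.0])
model_preds_demo = np.array([12.5, 13.5, 14.0, 14.5])
baseline_preds = naive_forecast(history_last=11.0, actual=y_test_demo)

model_mae = mae(y_test_demo, model_preds_demo)
baseline_mae = mae(y_test_demo, baseline_preds)

print(model_mae, baseline_mae)
```

If the engineered features do not beat the persistence baseline, that is often a sign of leakage fixed elsewhere or of features that add little beyond the most recent value.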