Linear Regression in Machine Learning

1. Introduction to Linear Regression

Linear Regression is one of the most fundamental and widely used supervised learning algorithms in machine learning. It is primarily used for predicting continuous values based on input features.

Linear Regression establishes a linear relationship between the dependent variable (target/output) and one or more independent variables (features/input variables).

💡 Example Use Cases:
Predicting house prices based on square footage, number of bedrooms, and location.
Estimating salary based on years of experience and education level.
Forecasting sales revenue based on marketing expenditure.


2. Types of Linear Regression

There are two main types of Linear Regression:

1️⃣ Simple Linear Regression → Only one independent variable (feature).
2️⃣ Multiple Linear Regression → More than one independent variable (feature).


3. Simple Linear Regression

Simple Linear Regression models the relationship between one independent variable (X) and one dependent variable (Y) using a straight-line equation:

Y = mX + b

Where:

  • Y = Dependent variable (predicted value)
  • X = Independent variable (input feature)
  • m = Slope of the line (how much Y changes for a unit change in X)
  • b = Intercept (value of Y when X = 0)

📌 Example Scenario:

  • Suppose you want to predict a person’s salary based on years of experience.
  • X = Years of Experience, Y = Salary
  • The regression model will find the best-fit straight line that describes the relationship.
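
To make this concrete, here is a minimal sketch (with made-up salary figures) that estimates m and b using the closed-form least-squares formulas:

import numpy as np

# Hypothetical data: years of experience vs. salary (made-up numbers for illustration)
X = np.array([1, 2, 3, 4, 5], dtype=float)                      # years of experience
Y = np.array([45000, 50000, 60000, 65000, 75000], dtype=float)  # salary

# Closed-form least-squares estimates for one feature:
# m = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²,  b = Ȳ − m·X̄
m = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b = Y.mean() - m * X.mean()

print(f"Best-fit line: Y = {m:.1f}*X + {b:.1f}")       # Y = 7500.0*X + 36500.0
print(f"Predicted salary at 6 years: {m * 6 + b:.0f}")  # 81500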

4. Multiple Linear Regression

When there are multiple independent variables, the equation extends to:

Y = b₀ + b₁X₁ + b₂X₂ + … + bₙXₙ

Where:

  • Y = Dependent variable (predicted output)
  • X₁, X₂, …, Xₙ = Independent variables (features)
  • b₀ = Intercept (bias term)
  • b₁, b₂, …, bₙ = Coefficients (weights)

📌 Example Scenario:

  • Predicting house prices based on multiple factors:
    • X₁ = Square footage
    • X₂ = Number of bedrooms
    • X₃ = Location rating
    • Y = House Price
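
As a minimal sketch (with made-up house data), the intercept and coefficients can be estimated with NumPy's least-squares solver by prepending a column of ones for b₀:

import numpy as np

# Hypothetical houses: [square footage, bedrooms, location rating] (made-up values)
X = np.array([[1400, 3, 7],
              [1600, 3, 8],
              [1700, 4, 6],
              [1875, 4, 9],
              [2100, 5, 8]], dtype=float)
y = np.array([245000, 312000, 279000, 308000, 349000], dtype=float)

# A column of ones lets the solver learn the intercept b0 alongside b1..b3
X_design = np.column_stack([np.ones(len(X)), X])

coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2, b3 = coeffs
print(f"Price ≈ {b0:.0f} + {b1:.0f}*sqft + {b2:.0f}*bedrooms + {b3:.0f}*location")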

5. Assumptions of Linear Regression

For Linear Regression to work effectively, certain assumptions must hold:

1️⃣ Linearity

  • There must be a linear relationship between the independent (X) and dependent (Y) variables.

2️⃣ Independence

  • Observations should be independent of each other (no correlation between residuals).

3️⃣ Homoscedasticity

  • The variance of residual errors should be constant across all values of X.

4️⃣ Normality of Residuals

  • The residuals (errors) should follow a normal distribution.

5️⃣ No Multicollinearity (for Multiple Regression)

  • Independent variables should not be highly correlated with each other.
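
One quick, informal screen for multicollinearity is the pairwise correlation between features; here is a minimal sketch with made-up data (variance inflation factors are a more formal check):

import numpy as np

# Made-up feature matrix: columns are [square footage, bedrooms, location rating]
X = np.array([[1400, 3, 7],
              [1600, 3, 8],
              [1700, 4, 6],
              [1875, 4, 9],
              [2100, 5, 8]], dtype=float)

# rowvar=False → treat each column as one variable
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Rule of thumb: |correlation| above roughly 0.8-0.9 between two features
# suggests multicollinearity worth investigating (drop or combine features).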

6. Cost Function in Linear Regression

Linear Regression uses the Mean Squared Error (MSE) as the cost function to measure how well the line fits the data:

MSE = (1/n) Σ (Yᵢ − Ŷᵢ)²

Where:

  • Yᵢ = Actual values
  • Ŷᵢ = Predicted values

The goal of Linear Regression is to minimize this error to get the best-fitting line.
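
A minimal sketch of computing MSE on made-up actual and predicted values:

import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])   # actual values Yᵢ
y_pred   = np.array([2.8, 5.3, 6.9, 9.4])   # predicted values Ŷᵢ

# MSE = (1/n) Σ (Yᵢ − Ŷᵢ)²
mse = np.mean((y_actual - y_pred) ** 2)
print(f"MSE: {mse:.4f}")   # 0.0750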


7. Gradient Descent Optimization

To find the best parameters (m and b) that minimize the cost function, we use Gradient Descent.

Gradient Descent Steps:
1️⃣ Start with random values of m and b.
2️⃣ Calculate the gradient (slope of cost function).
3️⃣ Update m and b iteratively:

m = m − α · d(MSE)/dm
b = b − α · d(MSE)/db

Where α (learning rate) controls the step size of updates.

4️⃣ Repeat until convergence (error stops decreasing).
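
Here is a minimal from-scratch sketch of these steps on synthetic data (the same y = 4 + 3X + noise setup used in Section 9):

import numpy as np

np.random.seed(42)
X = 2 * np.random.rand(100)            # feature
y = 4 + 3 * X + np.random.randn(100)   # target: y = 4 + 3X + noise

m, b = 0.0, 0.0   # step 1: start from arbitrary values
alpha = 0.1       # learning rate α
n = len(X)

for _ in range(1000):
    y_hat = m * X + b
    # step 2: gradients of MSE with respect to m and b
    dm = (-2 / n) * np.sum(X * (y - y_hat))
    db = (-2 / n) * np.sum(y - y_hat)
    # step 3: update parameters
    m -= alpha * dm
    b -= alpha * db

print(f"Learned m = {m:.2f}, b = {b:.2f} (true values: 3 and 4)")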


8. Evaluating a Linear Regression Model

After training, we evaluate the model’s performance using:

1️⃣ R-squared (R² Score)

R² = 1 − (SS_residual / SS_total)

  • Measures how well the model explains the variance in Y.
  • R² = 1 → perfect fit; R² = 0 → no better than always predicting the mean of Y.

2️⃣ Mean Absolute Error (MAE)

MAE = (1/n) Σ |Yᵢ − Ŷᵢ|

  • Measures the absolute differences between actual and predicted values.

3️⃣ Mean Squared Error (MSE)

MSE = (1/n) Σ (Yᵢ − Ŷᵢ)²

  • Penalizes larger errors more than MAE.

4️⃣ Root Mean Squared Error (RMSE)

RMSE = √MSE

  • Provides an error measure in the same unit as Y.
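
All four metrics take only a few lines with scikit-learn; a minimal sketch on made-up values:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

mae  = mean_absolute_error(y_true, y_pred)
mse  = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                # RMSE is in the same units as Y
r2   = r2_score(y_true, y_pred)

print(f"MAE: {mae:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}  R²: {r2:.3f}")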

9. Implementing Linear Regression in Python (Sklearn)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)  # y = 4 + 3X + noise

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R² Score: {r2:.2f}')

# Plot results
plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test, y_pred, color="red", linewidth=2, label="Predicted Line")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()

10. Advantages and Disadvantages of Linear Regression

Advantages:

✔ Simple and easy to interpret.
✔ Works well for linearly related data.
✔ Computationally efficient.

Disadvantages:

❌ Assumes a linear relationship (not suitable for complex patterns).
❌ Sensitive to outliers.
❌ Cannot capture non-linear relationships.


11. When to Use Linear Regression?

When the relationship between variables is approximately linear.
When you need an interpretable and simple model.
When computational efficiency is important.


12. Summary

✔ Linear Regression models the relationship between input and output using a straight line.
✔ Simple Linear Regression → One feature, Multiple Linear Regression → Multiple features.
✔ Uses Mean Squared Error (MSE) as a cost function and Gradient Descent for optimization.
✔ Evaluated using R² Score, MSE, RMSE, MAE.
✔ Implemented easily using Sklearn in Python.

Mastering Linear Regression is a crucial step in machine learning and data science!
