Decision Trees in Machine Learning
1. Introduction to Decision Trees
A Decision Tree is a Supervised Learning algorithm used for both classification and regression problems. It mimics human decision-making by splitting data into branches to make predictions.
Key Features of Decision Trees:
- Easy to understand and interpret
- Works with both categorical & numerical data
- Can model non-linear relationships
- Requires minimal data preprocessing
Real-world Applications:
- Spam Detection (Spam or Not Spam)
- Credit Risk Analysis (Loan Default or Not)
- Medical Diagnosis (Disease Present or Not)
- Customer Segmentation (High-Value vs Low-Value Customers)
2. How Do Decision Trees Work?
A Decision Tree follows a hierarchical, tree-like structure consisting of:
- Root Node – the starting point (the entire dataset)
- Decision Nodes – intermediate nodes where the data is split
- Leaf Nodes – final nodes that hold the class labels (output)
Example:
Imagine you need to classify whether a customer will buy a product.
1. Start at the root node: “Is income > $50K?”
2. If Yes, move to the next decision: “Is age > 30?”
3. If No, predict: “Will not buy.”
4. Keep splitting the data until a final decision (leaf node) is reached (see the sketch below).
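The same decision path can be written out as plain nested conditionals. This is only an illustrative sketch of the example above (it assumes the Yes/Yes branch ends in “Will buy”, which the example leaves implicit), not code a library generates:

```python
def will_buy(income: float, age: float) -> str:
    """Toy decision path from the example: income is checked first, then age.
    Thresholds and outcomes are illustrative only."""
    if income > 50_000:          # root node: "Is income > $50K?"
        if age > 30:             # decision node: "Is age > 30?"
            return "Will buy"    # assumed outcome for the Yes/Yes path
        return "Will not buy"
    return "Will not buy"        # leaf reached directly from the root

print(will_buy(income=60_000, age=35))  # -> "Will buy"
```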
3. Splitting Criteria in Decision Trees
To determine the best feature for splitting, Decision Trees use impurity measures like:
1. Gini Impurity
$$Gini = 1 - \sum (p_i)^2$$
- Measures impurity in a dataset
- Lower Gini = Better purity
- Default criterion in Scikit-Learn
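As a quick illustration, here is a minimal NumPy sketch (the helper name is my own, not part of Scikit-Learn's API) that computes the Gini impurity of a list of class labels:

```python
import numpy as np

def gini_impurity(labels) -> float:
    """Gini = 1 - sum(p_i^2) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 1, 1]))  # 0.5 -> maximally mixed two-class node
print(gini_impurity([1, 1, 1, 1]))  # 0.0 -> pure node
```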
2. Entropy & Information Gain
$$Entropy = -\sum p_i \log_2 p_i$$
$$Information\ Gain = Entropy_{parent} - \sum \text{(weighted child entropy)}$$
- Entropy measures uncertainty in data
- Information Gain selects the feature (and split) that yields the largest entropy reduction
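The same idea in code: a small sketch, again with hypothetical helper names, computing entropy and the information gain of a candidate split:

```python
import numpy as np

def entropy(labels) -> float:
    """Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right) -> float:
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]
print(information_gain(parent, left=[0, 0], right=[1, 1]))  # 1.0 -> perfect split
```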
3. Mean Squared Error (MSE) for Regression Trees
$$MSE = \frac{1}{n} \sum (y_i - \hat{y})^2$$
- Used for continuous output predictions
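For intuition, here is a minimal sketch of how a regression split is scored under this criterion: each child predicts the mean of its targets, and splits that most reduce the size-weighted MSE are preferred (function names are illustrative):

```python
import numpy as np

def node_mse(y) -> float:
    """MSE of a node that predicts the mean of its targets."""
    y = np.asarray(y, dtype=float)
    return np.mean((y - y.mean()) ** 2)

def split_mse(y_left, y_right) -> float:
    """Size-weighted MSE of the two children produced by a split."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * node_mse(y_left) + (len(y_right) / n) * node_mse(y_right)

y = [10, 12, 30, 32]
print(node_mse(y))                    # 101.0 -> MSE before splitting
print(split_mse([10, 12], [30, 32]))  # 1.0   -> much lower after a good split
```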
Gini vs Entropy
- Gini is faster to compute (no logarithms)
- Entropy is slightly more sensitive to changes in class probabilities but costs more to compute; in practice the two criteria usually produce very similar trees
4. Overfitting and Pruning in Decision Trees
A deep Decision Tree may lead to overfitting (high accuracy on training data, poor generalization).
Solution: Pruning (reducing tree complexity)
- Pre-Pruning – stop growing the tree early
- Post-Pruning – trim branches after the tree is fully grown
Techniques: setting a maximum depth, requiring a minimum number of samples per leaf, and pruning weak branches (see the sketch below)
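A minimal Scikit-Learn sketch of both ideas, using the built-in Iris dataset purely as a stand-in: pre-pruning via max_depth / min_samples_leaf, and post-pruning via cost-complexity pruning (ccp_alpha):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Pre-pruning: keep the tree from growing too deep in the first place
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre_pruned.fit(X_tr, y_tr)

# Post-pruning: compute the cost-complexity pruning path, then refit with a chosen alpha.
# In practice alpha would be selected by cross-validation; a mid-sized value is used
# here purely for illustration.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_tr, y_tr)

print("Pre-pruned test accuracy: ", pre_pruned.score(X_te, y_te))
print("Post-pruned test accuracy:", post_pruned.score(X_te, y_te))
```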
5. Advantages & Disadvantages of Decision Trees
Advantages
- Simple & easy to understand
- No need for feature scaling
- Handles categorical & numerical data
- Can handle missing values (depending on the implementation)
Disadvantages
- Prone to overfitting
- Biased towards dominant classes
- Unstable (small changes in the data can change the tree structure)
- Greedy algorithm (locally optimal splits may not be globally best)
6. Implementing Decision Trees in Python (Sklearn)
Let’s build a Decision Tree classifier with Scikit-Learn on a small toy dataset!
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn import tree
# Sample toy dataset
data = {'Age': [25, 30, 35, 40, 45, 50, 55, 60],
        'Salary': [30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000],
        'Buys_Product': [0, 0, 1, 1, 1, 1, 0, 0]}
df = pd.DataFrame(data)
# Features & Target
X = df[['Age', 'Salary']]
y = df['Buys_Product']
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Decision Tree
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Model Evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(report)
# Visualize the Decision Tree
plt.figure(figsize=(10, 6))
tree.plot_tree(model, feature_names=['Age', 'Salary'], class_names=['No', 'Yes'], filled=True)
plt.show()
```
7. Hyperparameters of Decision Trees
- max_depth – limits the tree's depth to prevent overfitting
- min_samples_split – minimum number of samples needed to split a node
- min_samples_leaf – minimum number of samples required in a leaf
- criterion – impurity measure to use ("gini" or "entropy")
- max_features – number of features to consider when looking for the best split
Hyperparameter tuning helps improve model performance and reduce overfitting!
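One common way to tune these is a grid search with cross-validation. A minimal sketch on the built-in Iris dataset (the grid values are arbitrary examples, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for the hyperparameters listed above
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```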
8. Decision Trees for Regression (Regression Trees)
Decision Trees can also predict continuous values (e.g., house prices). Instead of classification impurity measures, splits are chosen with criteria such as Mean Squared Error (MSE), and each leaf predicts the mean of the training samples that reach it.
```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Train a Regression Tree (reusing the earlier train/test split just to show the API)
regressor = DecisionTreeRegressor(max_depth=3, random_state=42)
regressor.fit(X_train, y_train)

# Predict and Evaluate
y_pred = regressor.predict(X_test)
print(f'MSE: {mean_squared_error(y_test, y_pred):.3f}')
```
- Works well with non-linear data
- Captures interactions between features
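To see the regression behaviour on genuinely continuous data (the toy frame above has a 0/1 target), here is a short sketch that fits a noisy sine curve; the data is synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(42)
X = np.sort(5 * rng.rand(200, 1), axis=0)        # single feature in [0, 5)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)     # noisy non-linear target

reg = DecisionTreeRegressor(max_depth=4, random_state=42)
reg.fit(X, y)

print("Training MSE:", mean_squared_error(y, reg.predict(X)))
```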
9. Decision Trees vs Other Algorithms
| Algorithm | Strengths | Weaknesses |
|---|---|---|
| Decision Trees | Easy to interpret, no feature scaling needed | Prone to overfitting |
| Random Forest | More accurate, reduces overfitting | Computationally expensive |
| SVM | Works well with high-dimensional data | Needs proper tuning |
| Neural Networks | Handles complex patterns | Requires large data and tuning |
Ensemble methods like Random Forest and Gradient Boosting improve Decision Tree performance!
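For comparison, swapping a single tree for a Random Forest is a one-line change in Scikit-Learn. A minimal sketch on the built-in Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for a single tree vs. an ensemble of 100 trees
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5).mean()
forest_acc = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=5).mean()

print(f"Single tree CV accuracy:   {tree_acc:.3f}")
print(f"Random forest CV accuracy: {forest_acc:.3f}")
```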
10. Summary
- Decision Trees split data based on conditions to make predictions.
- They use Gini Impurity, Entropy, or MSE to choose splits.
- They can be used for both classification & regression tasks.
- Pruning & hyperparameter tuning prevent overfitting.
- They are used in Finance, Healthcare, Marketing, and many other fields.
Mastering Decision Trees is crucial for building robust Machine Learning models!