Reinforcement Learning in Python

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment to maximize rewards.

Key Terms in RL

  • Agent – The learner or decision-maker.
  • Environment – The system the agent interacts with.
  • State (S) – The current situation of the agent.
  • Action (A) – The moves the agent can make.
  • Reward (R) – Feedback from the environment (positive or negative).
  • Policy (π) – The strategy the agent follows to choose actions.
  • Q-value (Q-function) – The expected cumulative reward for taking an action in a state and following the policy afterwards.
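
To see how these terms fit together, here is a minimal sketch of the agent-environment loop; env, policy, and max_steps are placeholders for illustration, not a specific library API.

# Minimal sketch of one episode (env and policy are hypothetical placeholders)
state = env.reset()                          # initial state (S)
for t in range(max_steps):
    action = policy(state)                   # policy (π) picks an action (A)
    next_state, reward, done = env.step(action)  # environment returns reward (R) and next state
    state = next_state
    if done:                                 # episode is over
        break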

Installing Required Libraries

pip install numpy gym

Step 1: Understanding OpenAI Gym

OpenAI Gym provides ready-made environments for developing and testing RL algorithms.

import gym

# Load the CartPole environment
env = gym.make("CartPole-v1")

# Reset the environment
state = env.reset()

# Take 10 random steps
# Note: this uses the classic Gym API; newer Gym/Gymnasium versions return
# (obs, info) from reset() and five values from step().
for _ in range(10):
    env.render()                        # Render the environment
    action = env.action_space.sample()  # Take a random action
    state, reward, done, _ = env.step(action)  # Apply action
    if done:
        state = env.reset()             # Restart if the episode is over

env.close()

Step 2: Building a Simple Q-Learning Algorithm

Q-learning is a value-based RL algorithm where we learn the Q-values (action-value function).
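
The core update rule, applied after each step, is the Bellman update used in the code below:

Q(s, a) ← Q(s, a) + α · [ r + γ · max_a' Q(s', a') − Q(s, a) ]

where α is the learning rate, γ the discount factor, r the observed reward, and s' the next state.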

import numpy as np
import random

# Create the Q-table
state_size = 10 # Example state size
action_size = 2 # Example action size
Q_table = np.zeros((state_size, action_size))

# Q-learning parameters
learning_rate = 0.1
discount_factor = 0.9
epsilon = 1.0 # Exploration rate
epsilon_decay = 0.99

# Training loop
for episode in range(1000):
    state = random.randint(0, state_size - 1)  # Random initial state
    done = False

    while not done:
        # Choose action (exploration-exploitation trade-off)
        if np.random.rand() < epsilon:
            action = random.choice([0, 1])         # Explore: random action
        else:
            action = np.argmax(Q_table[state, :])  # Exploit: best known action

        # Simulate environment (random next state and reward)
        next_state = random.randint(0, state_size - 1)
        reward = np.random.choice([-1, 1])  # Random reward (+1 or -1)

        # Update Q-value using the Bellman equation
        Q_table[state, action] = Q_table[state, action] + learning_rate * (
            reward + discount_factor * np.max(Q_table[next_state, :]) - Q_table[state, action]
        )

        state = next_state  # Move to the next state
        done = np.random.choice([True, False], p=[0.1, 0.9])  # End the episode randomly

    # Reduce epsilon (less exploration over time)
    epsilon *= epsilon_decay

print("Q-table after training:")
print(Q_table)
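
Once training finishes, the greedy policy simply picks the highest-valued action in each state, for example:

policy = np.argmax(Q_table, axis=1)  # best action for each state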

Step 3: Deep Q-Networks (DQN) with PyTorch

Tabular Q-learning struggles with large or continuous state spaces, so instead we approximate the Q-function with a neural network, known as a Deep Q-Network (DQN).

import torch
import torch.nn as nn
import torch.optim as optim

# Define a neural network model for DQN
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(state_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

# Initialize model, optimizer, and loss function
model = DQN(state_size=4, action_size=2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
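
As a rough sketch of how these pieces fit together, one gradient update on a single transition could look like the following. The transition values below are made up for illustration, and a complete DQN would also use an experience replay buffer and a separate target network:

# Hypothetical single transition (values are placeholders for illustration)
state      = torch.rand(1, 4)     # CartPole-like observation
action     = torch.tensor([0])    # action taken
reward     = torch.tensor([1.0])  # reward received
next_state = torch.rand(1, 4)     # resulting observation
done       = torch.tensor([0.0])  # 1.0 if the episode ended

gamma = 0.99  # discount factor

# Q-value predicted for the action that was taken
q_value = model(state).gather(1, action.unsqueeze(1)).squeeze(1)

# Bellman target based on the best Q-value in the next state
with torch.no_grad():
    target = reward + gamma * model(next_state).max(1)[0] * (1 - done)

# Gradient step on the mean-squared TD error
loss = loss_fn(q_value, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()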

Reinforcement Learning Applications

  • Robotics – Self-learning robots.
  • Gaming – AI in chess, Go, and video games.
  • Self-Driving Cars – Decision-making in autonomous vehicles.
  • Stock Trading – AI-based trading strategies.
