Reinforcement Learning in Python

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment to maximize rewards.

Key Terms in RL

  • Agent – The learner or decision-maker.
  • Environment – The system the agent interacts with.
  • State (S) – The current situation of the agent.
  • Action (A) – The moves the agent can make.
  • Reward (R) – Feedback from the environment (positive or negative).
  • Policy (π) – The strategy the agent follows to choose actions.
  • Q-value (Q-function) – The expected cumulative reward for taking an action in a state and following the policy afterwards.
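
To see how these terms fit together, here is a minimal sketch of the agent-environment loop; env, policy, and max_steps are placeholders for illustration, not a specific library API.

# Minimal sketch of one episode (env and policy are hypothetical placeholders)
state = env.reset()                          # initial state (S)
for t in range(max_steps):
    action = policy(state)                   # policy (π) picks an action (A)
    next_state, reward, done = env.step(action)  # environment returns reward (R) and next state
    state = next_state
    if done:                                 # episode is over
        break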

Installing Required Libraries

pip install numpy gym

Step 1: Understanding OpenAI Gym

OpenAI Gym provides ready-made environments for developing and testing RL algorithms.

import gym

# Load the CartPole environment
env = gym.make("CartPole-v1")

# Reset the environment
state = env.reset()

# Take 10 random steps
# Note: this uses the classic Gym API; newer Gym/Gymnasium versions return
# (obs, info) from reset() and five values from step().
for _ in range(10):
    env.render()                        # Render the environment
    action = env.action_space.sample()  # Take a random action
    state, reward, done, _ = env.step(action)  # Apply action
    if done:
        state = env.reset()             # Restart if the episode is over

env.close()

Step 2: Building a Simple Q-Learning Algorithm

Q-learning is a value-based RL algorithm where we learn the Q-values (action-value function).
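
The core update rule, applied after each step, is the Bellman update used in the code below:

Q(s, a) ← Q(s, a) + α · [ r + γ · max_a' Q(s', a') − Q(s, a) ]

where α is the learning rate, γ the discount factor, r the observed reward, and s' the next state.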

import numpy as np
import random

# Create the Q-table
state_size = 10 # Example state size
action_size = 2 # Example action size
Q_table = np.zeros((state_size, action_size))

# Q-learning parameters
learning_rate = 0.1
discount_factor = 0.9
epsilon = 1.0 # Exploration rate
epsilon_decay = 0.99

# Training loop
for episode in range(1000):
    state = random.randint(0, state_size - 1)  # Random initial state
    done = False

    while not done:
        # Choose action (exploration-exploitation trade-off)
        if np.random.rand() < epsilon:
            action = random.choice([0, 1])         # Explore: random action
        else:
            action = np.argmax(Q_table[state, :])  # Exploit: best known action

        # Simulate environment (random next state and reward)
        next_state = random.randint(0, state_size - 1)
        reward = np.random.choice([-1, 1])  # Random reward (+1 or -1)

        # Update Q-value using the Bellman equation
        Q_table[state, action] = Q_table[state, action] + learning_rate * (
            reward + discount_factor * np.max(Q_table[next_state, :]) - Q_table[state, action]
        )

        state = next_state  # Move to the next state
        done = np.random.choice([True, False], p=[0.1, 0.9])  # End the episode randomly

    # Reduce epsilon (less exploration over time)
    epsilon *= epsilon_decay

print("Q-table after training:")
print(Q_table)
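
Once training finishes, the greedy policy simply picks the highest-valued action in each state, for example:

policy = np.argmax(Q_table, axis=1)  # best action for each state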

Step 3: Deep Q-Networks (DQN) with PyTorch

Tabular Q-learning struggles with large or continuous state spaces, so instead we approximate the Q-function with a neural network, known as a Deep Q-Network (DQN).

import torch
import torch.nn as nn
import torch.optim as optim

# Define a neural network model for DQN
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(state_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

# Initialize model, optimizer, and loss function
model = DQN(state_size=4, action_size=2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
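
As a rough sketch of how these pieces fit together, one gradient update on a single transition could look like the following. The transition values below are made up for illustration, and a complete DQN would also use an experience replay buffer and a separate target network:

# Hypothetical single transition (values are placeholders for illustration)
state      = torch.rand(1, 4)     # CartPole-like observation
action     = torch.tensor([0])    # action taken
reward     = torch.tensor([1.0])  # reward received
next_state = torch.rand(1, 4)     # resulting observation
done       = torch.tensor([0.0])  # 1.0 if the episode ended

gamma = 0.99  # discount factor

# Q-value predicted for the action that was taken
q_value = model(state).gather(1, action.unsqueeze(1)).squeeze(1)

# Bellman target based on the best Q-value in the next state
with torch.no_grad():
    target = reward + gamma * model(next_state).max(1)[0] * (1 - done)

# Gradient step on the mean-squared TD error
loss = loss_fn(q_value, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()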

Reinforcement Learning Applications

  • Robotics – Self-learning robots.
  • Gaming – AI in chess, Go, and video games.
  • Self-Driving Cars – Decision-making in autonomous vehicles.
  • Stock Trading – AI-based trading strategies.
