Reinforcement Learning (RL) is a type of machine learning in which an agent learns to make decisions by interacting with an environment and receiving rewards, with the goal of maximizing cumulative reward over time.
Key Terms in RL
- Agent – The learner or decision-maker.
- Environment – The system in which the agent interacts.
- State (S) – The current situation of the agent.
- Action (A) – The moves the agent can make.
- Reward (R) – Feedback from the environment (positive or negative).
- Policy (π) – The strategy the agent follows to choose actions.
- Q-value (Q-function) – The expected cumulative reward of taking an action in a state and then following the policy afterwards.
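To make these terms concrete, here is a tiny, purely illustrative agent-environment loop. Everything in it (the two-state environment, the reward rule, the random policy) is made up for this sketch and is not part of the Gym code later in the tutorial.

import random

def environment_step(state, action):
    """The Environment: a made-up rule that rewards matching the state."""
    reward = 1 if action == state else -1  # Reward (R)
    next_state = random.randint(0, 1)      # New State (S)
    return next_state, reward

def policy(state):
    """The Policy (π): here just a random choice between two actions."""
    return random.randint(0, 1)            # Action (A)

state = 0  # Initial State (S)
for _ in range(5):
    action = policy(state)                            # The Agent chooses an action
    state, reward = environment_step(state, action)   # The Environment responds
    print(f"action={action}, reward={reward}, next state={state}")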
Installing Required Libraries
pip install numpy gym torch
Step 1: Understanding OpenAI Gym
OpenAI Gym provides ready-made environments for developing and testing RL algorithms.
import gym

# Load the CartPole environment (Gym >= 0.26 API)
env = gym.make("CartPole-v1", render_mode="human")

# Reset the environment
state, info = env.reset()

# Run 10 random steps
for _ in range(10):
    action = env.action_space.sample()  # Take a random action
    state, reward, terminated, truncated, info = env.step(action)  # Apply action
    if terminated or truncated:
        state, info = env.reset()  # Restart if the episode ended
env.close()
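It also helps to inspect the environment's spaces before writing an agent. CartPole-v1 exposes a 4-dimensional continuous observation and 2 discrete actions, which are the same sizes reused in the DQN section below.

import gym

env = gym.make("CartPole-v1")
print(env.observation_space.shape)  # (4,): cart position, cart velocity, pole angle, pole angular velocity
print(env.action_space.n)           # 2: push the cart left or right
env.close()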
Step 2: Building a Simple Q-Learning Algorithm
Q-learning is a value-based RL algorithm in which the agent learns the action-value function (the Q-values) directly from experience.
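At each step, the agent nudges its current estimate toward the observed reward plus the discounted value of the best next action: Q(s, a) ← Q(s, a) + α · [r + γ · max Q(s′, ·) − Q(s, a)], where α is the learning rate, γ the discount factor, r the received reward, and s′ the next state. The code below applies this update rule directly, with a simulated environment standing in for a real one.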
import numpy as np
import random
# Create the Q-table
state_size = 10 # Example state size
action_size = 2 # Example action size
Q_table = np.zeros((state_size, action_size))
# Q-learning parameters
learning_rate = 0.1
discount_factor = 0.9
epsilon = 1.0 # Exploration rate
epsilon_decay = 0.99
# Training loop
for episode in range(1000):
    state = random.randint(0, state_size - 1)  # Random initial state
    done = False
    while not done:
        # Choose action (exploration-exploitation trade-off)
        if np.random.rand() < epsilon:
            action = random.choice([0, 1])  # Random action
        else:
            action = np.argmax(Q_table[state, :])  # Best action from Q-table
        # Simulate environment (random next state and reward)
        next_state = random.randint(0, state_size - 1)
        reward = np.random.choice([-1, 1])  # Random reward (+1 or -1)
        # Update Q-value using the Bellman equation
        Q_table[state, action] = Q_table[state, action] + learning_rate * (
            reward + discount_factor * np.max(Q_table[next_state, :]) - Q_table[state, action]
        )
        state = next_state  # Move to next state
        done = np.random.choice([True, False], p=[0.1, 0.9])  # End the episode randomly
    # Reduce epsilon (less exploration over time)
    epsilon *= epsilon_decay
print("Q-table after training:")
print(Q_table)
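Once training is done, a greedy policy can be read straight off the table by picking the highest-valued action in each state. This one-liner assumes the Q_table and the numpy import from the snippet above.

# Greedy policy: the best action for each of the 10 states
greedy_policy = np.argmax(Q_table, axis=1)
print("Greedy action per state:", greedy_policy)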
Step 3: Deep Q-Networks (DQN) with PyTorch
Tabular Q-learning struggles with large or continuous state spaces, so instead we approximate the Q-function with a neural network, which gives us Deep Q-Networks (DQN).
import torch
import torch.nn as nn
import torch.optim as optim
# Define a neural network model for DQN
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(state_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)
# Initialize model, optimizer, and loss function
model = DQN(state_size=4, action_size=2)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
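The snippet above only defines the network, optimizer, and loss; a complete DQN agent would also use experience replay and a separate target network. As a purely illustrative sketch of how these pieces fit together, here is a single update on one made-up transition (the random state tensors, the reward, and the discount value are invented for the example and reuse the model, optimizer, and loss_fn defined above).

# One DQN update on a single hypothetical transition
state = torch.rand(1, 4)        # example 4-dimensional observation
next_state = torch.rand(1, 4)
action, reward, done = 0, 1.0, False
gamma = 0.99                    # discount factor

q_value = model(state)[0, action]  # Q(s, a) for the action actually taken
with torch.no_grad():
    max_next_q = model(next_state).max(dim=1).values         # max_a' Q(s', a')
    target = reward + gamma * max_next_q * (1 - int(done))   # Bellman target

loss = loss_fn(q_value, target.squeeze())  # squared TD error
optimizer.zero_grad()
loss.backward()
optimizer.step()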
Reinforcement Learning Applications
✔ Robotics – Self-learning robots.
✔ Gaming – AI in chess, Go, and video games.
✔ Self-Driving Cars – Decision-making in autonomous vehicles.
✔ Stock Trading – AI-based trading strategies.