Log analysis is essential for monitoring system activity, detecting security threats, and troubleshooting issues. Python's standard library (re, collections) combined with pandas and matplotlib covers parsing, aggregating, and visualizing logs.
Key Uses of Log Analysis:
- Monitor system activity & errors
- Detect security threats (failed logins, brute force attacks, anomalies)
- Analyze web server logs (Apache, Nginx)
- Generate insights & reports
Installing Required Libraries
pip install pandas matplotlib
The re and collections modules used below ship with Python and do not need to be installed.
1. Reading Log Files in Python
def read_logs(file_path):
    # Read every line of the log file into a list
    with open(file_path, "r") as file:
        logs = file.readlines()
    return logs
# Example usage
log_data = read_logs("server.log")
print(log_data[:5]) # Print first 5 log entries
Loads logs from a file for analysis.
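Because readlines() holds the entire file in memory, very large logs may be better read one line at a time. The following is a minimal sketch of that idea, assuming a UTF-8 text log; the names iter_logs and head are illustrative and not part of the original example.

```python
from itertools import islice


def iter_logs(file_path):
    """Yield log lines one at a time instead of loading the whole file."""
    # errors="replace" keeps the loop alive if the log contains stray bytes
    with open(file_path, "r", encoding="utf-8", errors="replace") as file:
        for line in file:
            yield line.rstrip("\n")


def head(file_path, n=5):
    """Return the first n lines without reading the rest of the file."""
    return list(islice(iter_logs(file_path), n))
```

Calling head("server.log") would give the same preview as log_data[:5] above, but stops reading after the fifth line.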
2. Parsing Log Entries with Regular Expressions (re Module)
Extract useful information (IP address, timestamp, status codes).
import re
log_entry = '192.168.1.10 - - [10/Mar/2025:12:00:45] "GET /index.html HTTP/1.1" 200 512'
# Named groups capture each field; the dot in the HTTP version is escaped so it only matches a literal "."
pattern = r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<timestamp>.+?)\] "(?P<method>\w+) (?P<url>\S+) (?P<protocol>HTTP/\d\.\d)" (?P<status>\d+) (?P<size>\d+)'
match = re.match(pattern, log_entry)
if match:
    print(match.groupdict())  # Extracted log details
Extracts IP address, request type, URL, status codes, etc.
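Real access logs often differ slightly from the sample line: the timestamp usually carries a timezone offset, the user fields are not always "-", and the size is logged as "-" when no body is sent. The sketch below shows one way to tolerate these cases, assuming the Common Log Format; LOG_PATTERN and parse_line are hypothetical names introduced here for illustration.

```python
import re

# Assumed layout: Common Log Format, e.g.
# 192.168.1.10 - - [10/Mar/2025:12:00:45 +0000] "GET /index.html HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<protocol>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields, or None when the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no response body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields
```

Returning None for unmatched lines lets the caller decide whether to skip them or count them as malformed.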
3. Analyzing Web Server Logs with Pandas
Count HTTP status codes from Apache/Nginx logs.
import re
import pandas as pd
# Sample logs
log_data = [
    '192.168.1.10 - - [10/Mar/2025] "GET /index.html HTTP/1.1" 200 512',
    '192.168.1.12 - - [10/Mar/2025] "GET /login HTTP/1.1" 404 1024',
    '192.168.1.15 - - [10/Mar/2025] "POST /api HTTP/1.1" 500 2048',
]
# Reuses `pattern` from section 2; lines that do not match are skipped
matches = (re.match(pattern, log) for log in log_data)
df = pd.DataFrame([m.groupdict() for m in matches if m])
df["status"] = df["status"].astype(int)  # Convert status to integer
# Count occurrences of each status code
status_counts = df["status"].value_counts()
print(status_counts)
Finds error patterns (404, 500) in logs.
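To make error patterns easier to read, individual codes can be grouped into classes (2xx, 4xx, 5xx) and turned into an overall error rate. This is a small sketch using only the standard library; summarize_statuses is a hypothetical helper, and it assumes the status codes are already integers.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Group integer status codes by class and compute the error rate."""
    # 404 -> "4xx", 500 -> "5xx", and so on
    classes = Counter(f"{code // 100}xx" for code in statuses)
    total = sum(classes.values())
    errors = classes["4xx"] + classes["5xx"]
    error_rate = errors / total if total else 0.0
    return dict(classes), error_rate


classes, error_rate = summarize_statuses([200, 404, 500])
print(classes)
print(f"Error rate: {error_rate:.0%}")
```

The same function should accept the df["status"] column from the pandas example, since it only needs an iterable of integers.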
4. Detecting Suspicious Login Attempts
Track multiple failed login attempts from the same IP.
from collections import defaultdict
failed_logins = defaultdict(int)
log_entries = [
    "Failed login from 192.168.1.50",
    "Failed login from 192.168.1.50",
    "Failed login from 192.168.1.100",
    "Failed login from 192.168.1.50",
]
for log in log_entries:
    ip = log.split()[-1]  # The IP is the last word of each entry
    failed_logins[ip] += 1
# Detect suspicious IPs
for ip, count in failed_logins.items():
    if count > 2:
        print(f"Suspicious activity detected from {ip} (Failed logins: {count})")
Detects potential brute force attacks.
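A plain count treats three failures spread over a month the same as three within a minute. One common refinement is a sliding time window, sketched below under the assumption that each failed login comes with a timestamp. The (timestamp, ip) event format, the find_bursts name, and the default threshold and window are all illustrative choices, not part of the original example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_bursts(events, threshold=3, window=timedelta(minutes=1)):
    """Return IPs with at least `threshold` failures inside `window`.

    `events` is an iterable of (timestamp, ip) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for timestamp, ip in events:
        attempts = recent[ip]
        attempts.append(timestamp)
        # Drop attempts that have fallen out of the window
        while timestamp - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= threshold:
            flagged.add(ip)
    return flagged
```

The default threshold of 3 mirrors the count > 2 check above; in practice both values would be tuned to the system being monitored.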
5. Visualizing Log Data with Matplotlib
Plot HTTP status codes using Matplotlib.
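import matplotlib.pyplot as plt
# Sample status counts
status_counts = {200: 50, 404: 10, 500: 5}
# Use string labels so the codes are plotted as evenly spaced categories, not positions on a numeric axis
codes = [str(code) for code in status_counts]
counts = list(status_counts.values())
plt.bar(codes, counts, color=["green", "red", "orange"])
plt.xlabel("Status Code")
plt.ylabel("Count")
plt.title("HTTP Status Code Distribution")
plt.show()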
Identifies trends & errors visually.
When to Use Log Analysis in Python?
✔ Monitor server performance & detect anomalies
✔ Identify security threats & failed login attempts
✔ Analyze web traffic & error trends
✔ Automate reporting for IT & security teams