Analyzing Logs with Python

Log analysis is essential for monitoring system activity, detecting security threats, and troubleshooting issues. Python's standard library (re, collections) together with pandas and matplotlib covers parsing, aggregating, and visualizing log data.

Key Uses of Log Analysis:

  • Monitor system activity & errors
  • Detect security threats (failed logins, brute force attacks, anomalies)
  • Analyze web server logs (Apache, Nginx)
  • Generate insights & reports

Installing Required Libraries

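pip install pandas matplotlib

The re and collections modules used below are part of Python's standard library and do not need to be installed.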

1. Reading Log Files in Python

def read_logs(file_path):
    """Return all lines of the log file as a list of strings."""
    with open(file_path, "r") as file:
        logs = file.readlines()
    return logs

# Example usage
log_data = read_logs("server.log")
print(log_data[:5])  # Print first 5 log entries

Loads logs from a file for analysis.
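
readlines() holds the entire file in memory, which can become a problem for very large logs. A minimal sketch of a streaming alternative follows. The name iter_logs and the errors="replace" setting are assumptions made for illustration, not part of the example above.

```python
def iter_logs(file_path, encoding="utf-8"):
    """Yield log lines one at a time, without the trailing newline."""
    # errors="replace" keeps reading if the log contains undecodable bytes
    with open(file_path, "r", encoding=encoding, errors="replace") as file:
        for line in file:
            yield line.rstrip("\n")
```

A loop such as for line in iter_logs("server.log") then processes one entry at a time, so memory use stays roughly constant regardless of file size.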


2. Parsing Log Entries with Regular Expressions (re Module)

Extract useful information (IP address, timestamp, status codes).

import re

log_entry = '192.168.1.10 - - [10/Mar/2025:12:00:45] "GET /index.html HTTP/1.1" 200 512'

# Named groups capture each field; the dot in the HTTP version is escaped so it matches literally
pattern = r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<timestamp>.+?)\] "(?P<method>\w+) (?P<url>\S+) (?P<protocol>HTTP/\d\.\d)" (?P<status>\d+) (?P<size>\d+)'
match = re.match(pattern, log_entry)

if match:
    print(match.groupdict())  # Extracted log details

Extracts IP address, request type, URL, status codes, etc.
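
Real log files usually contain some lines that do not match the pattern, and the timestamp is easier to work with as a datetime than as a string. The following is a minimal sketch, assuming the bracketed timestamp has the layout of the sample entry (no time zone offset). parse_line is a hypothetical helper name, and strptime will raise ValueError if the timestamp uses a different layout.

```python
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<timestamp>.+?)\] '
    r'"(?P<method>\w+) (?P<url>\S+) (?P<protocol>HTTP/\d\.\d)" '
    r'(?P<status>\d+) (?P<size>\d+)'
)

def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumed timestamp layout: 10/Mar/2025:12:00:45 (no time zone offset)
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S")
    fields["status"] = int(fields["status"])
    fields["size"] = int(fields["size"])
    return fields

entry = parse_line('192.168.1.10 - - [10/Mar/2025:12:00:45] "GET /index.html HTTP/1.1" 200 512')
print(entry)
print(parse_line("not a log line"))  # None
```

Returning None for unmatched lines lets the caller decide whether to skip them or log them for inspection.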


3. Analyzing Web Server Logs with Pandas

Count HTTP status codes from Apache/Nginx logs.

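import re
import pandas as pd

# Pattern from section 2, repeated so this snippet runs on its own
pattern = r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<timestamp>.+?)\] "(?P<method>\w+) (?P<url>\S+) (?P<protocol>HTTP/\d\.\d)" (?P<status>\d+) (?P<size>\d+)'

# Sample logs
log_data = [
    "192.168.1.10 - - [10/Mar/2025] \"GET /index.html HTTP/1.1\" 200 512",
    "192.168.1.12 - - [10/Mar/2025] \"GET /login HTTP/1.1\" 404 1024",
    "192.168.1.15 - - [10/Mar/2025] \"POST /api HTTP/1.1\" 500 2048",
]

# Parse each line, skipping any that do not match the pattern
matches = (re.match(pattern, log) for log in log_data)
df = pd.DataFrame([m.groupdict() for m in matches if m])
df["status"] = df["status"].astype(int)  # Convert status to integer

# Count occurrences of each status code
status_counts = df["status"].value_counts()
print(status_counts)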

Finds error patterns (404, 500) in logs.
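
Individual codes are often less telling than their class (2xx success, 4xx client error, 5xx server error). As a dependency-free sketch of that grouping, the following uses collections.Counter from the standard library instead of pandas. The status_class helper and the sample list are illustrative assumptions.

```python
from collections import Counter

def status_class(code):
    """Map an HTTP status code to its class label, e.g. 404 -> '4xx'."""
    return f"{code // 100}xx"

# Illustrative status codes, matching the three sample log lines above
statuses = [200, 404, 500]

class_counts = Counter(status_class(code) for code in statuses)
error_count = sum(n for label, n in class_counts.items() if label in ("4xx", "5xx"))
error_rate = error_count / len(statuses)

print(class_counts)
print(f"Error rate: {error_rate:.0%}")
```

With real data, the statuses list would come from the parsed log entries rather than being written by hand.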


4. Detecting Suspicious Login Attempts

Track multiple failed login attempts from the same IP.

from collections import defaultdict

failed_logins = defaultdict(int)

log_entries = [
    "Failed login from 192.168.1.50",
    "Failed login from 192.168.1.50",
    "Failed login from 192.168.1.100",
    "Failed login from 192.168.1.50",
]

for log in log_entries:
    ip = log.split()[-1]
    failed_logins[ip] += 1

# Detect suspicious IPs
for ip, count in failed_logins.items():
    if count > 2:
        print(f"Suspicious activity detected from {ip} (Failed logins: {count})")

Detects potential brute force attacks.
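
A plain count flags any IP that fails three times, even when the failures are spread over weeks. Brute force attempts tend to arrive in bursts, so counting within a time window is a common refinement. The sketch below assumes each failed login is available as a (timestamp, ip) pair sorted by time, which the sample entries above do not provide. The find_bursts name, the threshold, and the window size are all hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def find_bursts(events, threshold=3, window=timedelta(minutes=1)):
    """Return the set of IPs with at least `threshold` failures within `window`.

    `events` is an iterable of (timestamp, ip) pairs sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps of failures still inside the window
    flagged = set()
    for ts, ip in events:
        times = recent[ip]
        times.append(ts)
        # Drop failures that fall outside the window relative to this one
        while ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(ip)
    return flagged

# Made-up events for illustration
t0 = datetime(2025, 3, 10, 12, 0, 0)
events = [
    (t0, "192.168.1.50"),
    (t0 + timedelta(seconds=10), "192.168.1.50"),
    (t0 + timedelta(seconds=15), "192.168.1.100"),
    (t0 + timedelta(seconds=20), "192.168.1.50"),
    (t0 + timedelta(minutes=30), "192.168.1.100"),
]
print(find_bursts(events))
```

Here 192.168.1.50 fails three times within 20 seconds and is flagged, while 192.168.1.100 fails twice, 30 minutes apart, and is not.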


5. Visualizing Log Data with Matplotlib

Plot HTTP status codes using Matplotlib.

import matplotlib.pyplot as plt

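# Sample status counts
status_counts = {200: 50, 404: 10, 500: 5}

# Use string labels so the codes are drawn as categories rather than positions on a numeric axis
labels = [str(code) for code in status_counts]
plt.bar(labels, list(status_counts.values()), color=["green", "red", "orange"])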
plt.xlabel("Status Code")
plt.ylabel("Count")
plt.title("HTTP Status Code Distribution")
plt.show()

Identifies trends & errors visually.


When to Use Log Analysis in Python?

  • Monitor server performance & detect anomalies
  • Identify security threats & failed login attempts
  • Analyze web traffic & error trends
  • Automate reporting for IT & security teams
