Malware analysis is the process of examining malicious software to understand its behavior, capabilities, and impact. Python is widely used in malware analysis due to its powerful libraries, automation capabilities, and ease of scripting.
This guide covers:
✔ Static and dynamic malware analysis
✔ Extracting indicators of compromise (IOCs)
✔ Reverse engineering techniques
✔ Sandboxing malware in a controlled environment
🔹 1. Setting Up a Malware Analysis Environment
Tools Required
- Python 3
- Virtual Machines (VMware/VirtualBox)
- Sandboxing tools (Cuckoo Sandbox, Any.Run)
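- Python Libraries:
  - pefile (for PE file analysis)
  - pydeep (for fuzzy hashing)
  - yara-python (for malware pattern detection)
  - scapy (for network traffic analysis)
  - pyshark (to analyze PCAP files)
  - psutil (for process monitoring)
  - capstone (for disassembly)
  - scikit-learn (for machine learning)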
Isolate Your Environment
Never analyze malware on your main machine!
- Use a dedicated VM (Windows/Linux)
- Disable network access, or route traffic only through an isolated network you control
- Take VM snapshots before executing malware
2. Static Malware Analysis with Python
Static analysis involves examining the malware file without executing it.
Extracting Metadata from PE Files
Portable Executable (PE) files are common in Windows malware.
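import pefile
pe = pefile.PE("malware.exe")
print(f"Entry Point: {hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint)}")
# Section names are 8-byte fields padded with null bytes, so strip those before decoding
sections = [section.Name.rstrip(b"\x00").decode(errors="replace") for section in pe.sections]
print(f"Sections: {sections}")
# Samples without an import table (common in packed malware) lack this attribute
imports = [entry.dll.decode() for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", [])]
print(f"Imported DLLs: {imports}")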
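Look for packed files (unusual section names such as UPX0, or very few imports, which can mean the real imports are hidden and resolved at runtime) and suspicious DLLs.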
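The packing hint above can be quantified with byte entropy: compressed or encrypted data looks close to random, so its entropy approaches 8 bits per byte. The sketch below is a plain-Python Shannon entropy calculation; the `looks_packed` helper and its 7.0 threshold are illustrative assumptions rather than a standard.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0-255
    # H = sum(p * log2(1/p)) over every byte value that occurs
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Heuristic: compressed or encrypted data approaches 8 bits/byte.

    The 7.0 threshold is an assumed rule of thumb, not a fixed standard.
    """
    return shannon_entropy(data) > threshold


print(shannon_entropy(b"AAAAAAAA"))        # 0.0: a single repeated byte value
print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value exactly once
```

With pefile, `section.get_data()` returns a section's raw bytes, so `shannon_entropy(section.get_data())` gives a per-section score to compare across `pe.sections`.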
Computing File Hashes for Malware Detection
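import hashlib
def get_hash(file_path):
    # Read in chunks so large samples don't have to fit in memory at once
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest()
print(get_hash("malware.exe"))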
Compare against known malware hashes from VirusTotal.
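VirusTotal and many threat-intelligence feeds index samples by MD5, SHA-1 and SHA-256, so it can help to compute all three at once. This is a minimal sketch that reads the file once and checks the digests against a local set; `file_hashes`, `is_known_bad` and the `known_bad` set are hypothetical names you would adapt to your own hash lists.

```python
import hashlib


def file_hashes(file_path, chunk_size=65536):
    """Compute MD5, SHA-1 and SHA-256 of a file in a single pass over its bytes."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def is_known_bad(hashes, known_bad):
    """Return True if any computed digest appears in `known_bad` (case-insensitive)."""
    known = {h.lower() for h in known_bad}
    return any(digest in known for digest in hashes.values())
```

For example, `file_hashes("malware.exe")["sha256"]` is the value to search for on VirusTotal.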
3. Dynamic Malware Analysis with Python
Dynamic analysis involves running the malware in a controlled environment and monitoring its behavior.
Monitoring Running Processes with Python
import psutil
# Snapshot of every running process: PID, name and owning user
for proc in psutil.process_iter(['pid', 'name', 'username']):
    print(proc.info)
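A single listing shows everything that is running; compare listings taken before and after executing the sample to spot processes the malware spawned.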
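One way to see what a sample started is to diff two process snapshots. The sketch below assumes psutil is installed; `snapshot_processes` and `new_processes` are illustrative helper names. Because the OS can reuse PIDs, a new PID is a lead rather than proof.

```python
def snapshot_processes():
    """Return {pid: (name, cmdline)} for every running process."""
    import psutil  # third-party; imported here so new_processes() works without it

    snapshot = {}
    for proc in psutil.process_iter(["pid", "name", "cmdline"]):
        info = proc.info  # fields psutil cannot read are set to None
        snapshot[info["pid"]] = (info["name"], info["cmdline"])
    return snapshot


def new_processes(before, after):
    """Return entries whose PID appears in `after` but not in `before`."""
    return {pid: after[pid] for pid in after.keys() - before.keys()}
```

Call `snapshot_processes()` before running the sample, again afterwards, and print `new_processes(before, after)`.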
Capturing Network Traffic with Scapy
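from scapy.all import sniff
def packet_callback(packet):
    # One-line summary per packet (protocols, source and destination)
    print(packet.summary())
# Capture 10 packets on the default interface; requires root/administrator privileges
sniff(prn=packet_callback, count=10)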
Check whether the malware is contacting command-and-control (C2) servers.
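C2 implants often "beacon", checking in with their server at fixed intervals. A rough heuristic, sketched below, measures how regular the gaps between connections to one destination are. The 0.1 cut-off is an assumption to tune, and malware that adds random jitter to its timing will score higher.

```python
import statistics


def beacon_score(timestamps):
    """Return the coefficient of variation of the gaps between connections.

    Values near 0 mean evenly spaced traffic, typical of automated check-ins.
    Returns None when there are too few events to judge.
    """
    if len(timestamps) < 3:
        return None
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


def looks_like_beacon(timestamps, max_cv=0.1):
    """Heuristic check; the 0.1 cut-off is an assumption, not a standard."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_cv
```

With Scapy, you could record `float(packet.time)` for each packet sent to a given destination IP inside `packet_callback` and pass that list to `looks_like_beacon`.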
4. Detecting Malware with YARA Rules
YARA is a rule-based tool to detect malware patterns.
Example YARA Rule
rule Trojan_Dropper {
    strings:
        $a = "malicious_code_here"
        $b = { E8 83 EC 18 68 }
    condition:
        any of them
}
Using Python to Scan Files with YARA
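import yara
rules = yara.compile(filepath="rules.yara")  # compile the rule file shown above
matches = rules.match(filepath="malware.exe")
if matches:
    # Each match reports the name of the rule that fired
    print("Malware detected:", [match.rule for match in matches])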
The names of the matching rules help identify malware families and known signatures.
5. Extracting Malware Configuration Data
Some malware hides its configuration in encrypted files, registry keys, or encoded scripts.
Extracting Strings from Malware
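import re
# Match runs of 4 or more printable ASCII bytes, similar to the Unix strings tool
with open("malware.exe", "rb") as f:
    data = f.read()
for match in re.finditer(rb"[\x20-\x7e]{4,}", data):
    print(match.group().decode("ascii"))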
Review the output for possible IP addresses, URLs, or suspicious commands.
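Windows malware often stores text as UTF-16LE ("wide") strings, which an ASCII-only scan misses. This sketch extracts both kinds and then pulls out URLs; the 4-character minimum and the URL pattern are illustrative choices, not a complete IOC parser.

```python
import re

# Runs of 4+ printable ASCII bytes, as in the classic strings tool
ASCII_RE = re.compile(rb"[\x20-\x7e]{4,}")
# UTF-16LE ("wide") strings: each printable ASCII byte followed by a null byte
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_strings(data):
    """Return ASCII and UTF-16LE strings of at least 4 characters."""
    found = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in WIDE_RE.finditer(data)]
    return found


def extract_urls(strings):
    """Collect http(s) URLs from a list of extracted strings."""
    return {url for s in strings for url in URL_RE.findall(s)}
```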
6. Automating Malware Analysis with Python
Running a Malware Sample with a Timeout
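import subprocess
# WARNING: this really executes the sample. Use it only inside an isolated, snapshotted VM.
malware_path = "malware.exe"
try:
    # Pass the path as a list (no shell=True) and stop waiting after 10 seconds
    result = subprocess.run([malware_path], capture_output=True, timeout=10)
    print("Exit code:", result.returncode)
    print("Output:", result.stdout)
except subprocess.TimeoutExpired:
    # run() kills the launched process before raising; processes it spawned may survive
    print("Execution timed out: the sample kept running (it may be waiting, sleeping, or persistent)")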
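This script provides no isolation by itself: it only launches the sample and limits how long it waits. Run it inside the isolated VM from section 1 and observe the sample with the monitoring tools from section 3.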
7. Reverse Engineering Malware
Use tools like Ghidra, IDA Pro, or Radare2 for deep analysis. Python can automate parts of the reverse engineering process.
Extracting Opcode Sequences
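from capstone import Cs, CS_ARCH_X86, CS_MODE_64
# x86-64 bytes: push rbp; mov rax, qword ptr [rip + 0x13b8]
code = b"\x55\x48\x8b\x05\xb8\x13\x00\x00"
md = Cs(CS_ARCH_X86, CS_MODE_64)
# Disassemble as if the code were loaded at address 0x1000
for i in md.disasm(code, 0x1000):
    print("0x%x:\t%s\t%s" % (i.address, i.mnemonic, i.op_str))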
Useful for analyzing shellcode and obfuscated binaries.
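For classification (see section 10), disassembled code is often summarized as counts of opcode n-grams, that is, short runs of consecutive mnemonics. A minimal sketch follows; `opcode_ngrams` is an illustrative name, and the mnemonic list is hard-coded so the example runs without Capstone.

```python
from collections import Counter


def opcode_ngrams(mnemonics, n=2):
    """Count every run of `n` consecutive instruction mnemonics."""
    return Counter(
        tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)
    )


# Mnemonics written out by hand; Capstone's disasm() would normally supply them
mnemonics = ["push", "mov", "push", "mov", "call"]
print(opcode_ngrams(mnemonics).most_common(1))  # [(('push', 'mov'), 2)]
```

With Capstone, the input would be `[insn.mnemonic for insn in md.disasm(code, 0x1000)]`.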
8. Detecting Malware Persistence Mechanisms
Malware often creates registry keys or startup entries to persist after reboot.
Checking Windows Startup Entries
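import winreg
RUN_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run"
# Check both the machine-wide and the per-user Run keys
for hive in (winreg.HKEY_LOCAL_MACHINE, winreg.HKEY_CURRENT_USER):
    with winreg.OpenKey(hive, RUN_KEY) as key:
        # QueryInfoKey returns (subkey count, value count, last-modified time)
        for i in range(winreg.QueryInfoKey(key)[1]):
            # Each value is a (name, data, type) tuple
            name, data, _type = winreg.EnumValue(key, i)
            print(name, data)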
Review each entry: unfamiliar names, or programs launched from temporary or user-writable folders, can indicate malware persistence.
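Reviewing entries by hand scales poorly, so a few pattern checks can flag the ones worth a closer look. The patterns in this sketch (temporary and public folders, encoded PowerShell, script files) are assumptions about common persistence tricks, not an exhaustive list; `flag_run_entries` is an illustrative name.

```python
import re

# Assumed heuristics: locations and launchers commonly abused for persistence.
# Tune this list for your environment; a match is a lead, not a verdict.
SUSPICIOUS_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\\AppData\\Local\\Temp\\",
        r"\\Users\\Public\\",
        r"%TEMP%",
        r"powershell(\.exe)?\s.*-enc",
        r"\.(vbs|js|hta)\b",
    )
]


def flag_run_entries(entries):
    """Return the (name, command) pairs whose command matches a suspicious pattern."""
    return [
        (name, command)
        for name, command in entries
        if any(p.search(command) for p in SUSPICIOUS_PATTERNS)
    ]
```

It takes the `(name, data)` pairs read with `winreg.EnumValue`.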
9. Analyzing Malware Communication
Extracting C2 Server Information from Memory Dumps
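import re
with open("memory.dmp", "rb") as f:
    data = f.read()
# Dotted-quad candidates; this pattern also matches invalid values such as 999.1.1.1
ips = re.findall(rb"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b", data)
print({ip.decode() for ip in ips})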
These are candidate addresses embedded in the memory dump; check each one before treating it as a C2 server.
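The regex alone accepts impossible addresses and also returns private ones such as 192.168.x.x. This sketch validates each candidate with the standard `ipaddress` module and keeps only globally routable addresses; `candidate_c2_ips` is an illustrative name.

```python
import ipaddress
import re

# Dotted-quad candidates; the regex alone also accepts invalid values like 999.1.1.1
IP_RE = re.compile(rb"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def candidate_c2_ips(data):
    """Return valid, publicly routable IPv4 addresses found in raw bytes."""
    found = set()
    for raw in IP_RE.findall(data):
        try:
            ip = ipaddress.IPv4Address(raw.decode("ascii"))
        except ValueError:
            continue  # e.g. an octet above 255
        if ip.is_global:  # drops private, loopback and other reserved ranges
            found.add(str(ip))
    return found
```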
10. Detecting Malware Using Machine Learning
Python can train models to detect malware based on static and dynamic features.
Example: Machine Learning for Malware Classification
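from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Toy feature vectors: [placeholder score, file size, entropy] (a raw hash is not a useful numeric feature)
X = [[1.2, 500, 7.8], [0.9, 200, 3.5], [1.5, 1000, 6.4]]
y = [1, 0, 1]  # 1 = Malware, 0 = Benign
# random_state makes the split and the forest reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
# With three samples the test set holds a single file, so this score is not meaningful
print("Malware detection accuracy:", clf.score(X_test, y_test))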
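The classifier labels files as malware or benign from features such as file size and entropy; a meaningful accuracy figure requires a much larger labeled dataset.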
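The toy matrix above hard-codes its features. A sketch of computing comparable features from raw file bytes follows; the three features (size, byte entropy, count of printable strings) are illustrative choices, and `extract_features` is a hypothetical helper. Real classifiers typically use many more features.

```python
import math
import re
from collections import Counter

PRINTABLE_RE = re.compile(rb"[\x20-\x7e]{4,}")


def byte_entropy(data):
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def extract_features(data):
    """Return [size in bytes, byte entropy, count of printable strings] for one file."""
    return [len(data), byte_entropy(data), len(PRINTABLE_RE.findall(data))]
```

Each row of `X` would then be `extract_features(data)` for one sample, where `data` is the file's contents read in binary mode.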