Performance drop due to excessive thread switching

Excessive thread switching, also known as context-switching overhead, occurs when threads are switched in and out so often that the cost of saving and restoring their state outweighs the benefit of running them concurrently. Performance then drops instead of improving.

1. Why Does Excessive Thread Switching Occur?

GIL (Global Interpreter Lock): CPython’s GIL lets only one thread execute Python bytecode at a time, so CPU-bound threads take turns holding it instead of running in parallel, and every handoff adds switching overhead.
Too Many Threads: If more threads than CPU cores are created, the OS keeps switching between them, leading to delays.
Frequent Lock Contention: Threads competing for locks get blocked, triggering unnecessary context switches.
Short Execution Bursts: When each thread does only a tiny amount of work before blocking or yielding, the cost of switching can exceed the cost of the work itself.
Improper Thread Scheduling: Unoptimized scheduling may switch threads even when unnecessary.
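
As a small illustration of the GIL point above: CPython (3.2 and later) asks the running thread to release the GIL after a fixed interval, which can be read with sys.getswitchinterval() and changed with sys.setswitchinterval(). The sketch below only inspects the value and raises it temporarily; the 0.05 figure is an arbitrary value chosen for illustration, not a recommendation.

```python
import sys

# Interval (in seconds) after which CPython asks the running thread
# to give up the GIL so another thread can run. The default is 0.005.
original = sys.getswitchinterval()
print("Switch interval:", original)

# A larger interval means fewer forced GIL handoffs between CPU-bound
# threads, at the cost of less responsive threads. 0.05 is an arbitrary
# value used only for illustration.
sys.setswitchinterval(0.05)
raised = sys.getswitchinterval()

# Restore the original value so the rest of the program is unaffected.
sys.setswitchinterval(original)
```

Whether a different interval helps depends on the workload, so it is worth measuring before and after any change.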

2. Identifying Thread Switching Overhead

A. Measuring Thread Context Switches

You can track context switches with the third-party psutil package (installed with pip install psutil):

import psutil
import os

pid = os.getpid()
proc = psutil.Process(pid)

before = proc.num_ctx_switches()
# Run multithreading workload
after = proc.num_ctx_switches()

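# Voluntary: the thread blocked or yielded; involuntary: the scheduler preempted it
print("Voluntary switches:", after.voluntary - before.voluntary)
print("Involuntary switches:", after.involuntary - before.involuntary)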

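If the counts are far higher than for a single-threaded run of the same workload, excessive switching is likely hurting performance. Voluntary switches point to threads blocking (for example on locks or I/O), while involuntary switches point to threads being preempted by the scheduler.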
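
If installing psutil is not an option, a similar measurement can be sketched with the standard-library resource module, which is available on Unix-like systems only. The workload() function here is a hypothetical stand-in for your own multithreaded code.

```python
import resource
import threading

def ctx_switches():
    """Return (voluntary, involuntary) context switches for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

def workload():
    # Hypothetical stand-in workload: four threads doing a little arithmetic.
    threads = [threading.Thread(target=sum, args=(range(100_000),)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

vol_before, invol_before = ctx_switches()
workload()
vol_after, invol_after = ctx_switches()

print("Voluntary:", vol_after - vol_before)
print("Involuntary:", invol_after - invol_before)
```

The absolute numbers vary by machine and load, so they are most useful when compared against a baseline run of the same workload.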

3. Common Causes and Fixes

A. Too Many Threads

Issue: Creating too many threads results in the OS frequently switching between them, wasting CPU cycles.

Example (Inefficient)

import threading

def worker():
    while True:
        pass  # Does nothing but keeps running

threads = [threading.Thread(target=worker) for _ in range(1000)]

for t in threads:
    t.start()

Problem: Spawning 1000 threads when you have only 4 or 8 CPU cores leads to excessive thread switching.

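Fix: Cap the number of threads with a pool. os.cpu_count() is a reasonable starting size; I/O-bound workloads can often use more, because threads waiting on I/O do not compete for the CPU.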

import concurrent.futures
import os

def worker(item):  # executor.map passes each item as an argument
    pass  # Do something meaningful

with concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
    executor.map(worker, range(1000))
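
The right pool size depends on whether the work is CPU-bound or I/O-bound. The sketch below uses a hypothetical helper, choose_max_workers(), with an assumed sizing rule (cores plus a few extra threads for I/O-bound work, capped at 32); treat the numbers as a starting point to tune, not a rule. It also consumes the results of executor.map, which is what surfaces exceptions raised inside the worker.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def choose_max_workers(io_bound, cap=32):
    """Hypothetical sizing rule: a few extra threads for I/O-bound work."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if io_bound:
        # Threads blocked on I/O release the GIL, so more threads than cores can help.
        return min(cap, cores + 4)
    return cores

def worker(item):
    return item * 2  # Placeholder for real work

with ThreadPoolExecutor(max_workers=choose_max_workers(io_bound=True)) as executor:
    # Consuming the iterator surfaces any exception raised inside worker().
    results = list(executor.map(worker, range(1000)))

print(len(results))
```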

B. GIL Limiting CPU-Bound Threads

Issue: Python’s Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, making CPU-bound multithreading inefficient.

Example (Inefficient CPU-bound Task in Threads)

import threading

def cpu_task():
    sum(x*x for x in range(10**6))  # Heavy computation

threads = [threading.Thread(target=cpu_task) for _ in range(5)]

for t in threads:
    t.start()
for t in threads:
    t.join()

Problem: Despite using threads, only one thread runs at a time due to the GIL.
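
One way to check whether the GIL is the bottleneck is to time the same work run sequentially and run in threads. This is a minimal sketch; the helper names run_sequential() and run_threaded() are illustrative, and the task is smaller than the one above to keep the demo quick.

```python
import threading
import time

def cpu_task():
    return sum(x * x for x in range(10**5))  # Smaller than the example above

def run_sequential(n):
    """Run cpu_task n times in the calling thread and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        cpu_task()
    return time.perf_counter() - start

def run_threaded(n):
    """Run cpu_task in n threads and return the elapsed time."""
    threads = [threading.Thread(target=cpu_task) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential = run_sequential(5)
threaded = run_threaded(5)
print(f"Sequential: {sequential:.3f}s  Threaded: {threaded:.3f}s")
```

If the threaded time is not clearly lower than the sequential time, the threads are not running in parallel and are only adding switching overhead.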

Fix: Use multiprocessing instead of threading for CPU-bound tasks

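import multiprocessing
import os

if __name__ == "__main__":  # Guard needed on platforms that spawn worker processes
    with multiprocessing.Pool(processes=os.cpu_count()) as pool:
        # cpu_task (defined above) takes no arguments, so pass five empty argument tuples
        pool.starmap(cpu_task, [()] * 5)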

Now, tasks run truly in parallel using multiple processes instead of inefficient thread switching.

C. Lock Contention Causing Frequent Context Switching

Issue: If multiple threads compete for the same lock, some threads are frequently blocked and switched out.

Example (Lock Contention Causing Overhead)

import threading

lock = threading.Lock()

def worker():
    for _ in range(1000):
        with lock:  # Multiple threads waiting for the lock
            pass

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

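Problem: Every time a thread finds the lock held, it blocks and is switched out until the lock is released, so the threads spend more time waiting and switching than working.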

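Fix: Keep critical sections short and acquire the lock less often. lock.acquire(timeout=...) does not reduce contention by itself, but it bounds how long a thread waits:

def worker():
    for _ in range(1000):
        if lock.acquire(timeout=0.01):  # Give up after 10 ms instead of blocking indefinitely
            try:
                pass  # Critical section: keep it as short as possible
            finally:
                lock.release()  # Always release, even if the critical section raises
        # else: the lock was not acquired; skip or retry the work as appropriate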
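
A complementary way to cut contention is to take the lock less often. In this sketch, which assumes the shared state is a simple counter (the total variable is illustrative), each thread accumulates its result locally and acquires the lock once at the end instead of once per iteration.

```python
import threading

lock = threading.Lock()
total = 0

def worker():
    global total
    local = 0
    for _ in range(1000):
        local += 1          # Work done without holding the lock
    with lock:              # One short critical section per thread
        total += local

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 5 threads x 1000 increments
```

Each thread now acquires the lock once rather than 1000 times, so there are far fewer opportunities for threads to block on one another.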

D. Frequent Thread Creation and Destruction

Issue: Creating and destroying threads frequently adds overhead.

Example (Inefficient Repeated Thread Creation)

import threading

def worker():
    pass  # Do some work

for _ in range(1000):
    t = threading.Thread(target=worker)
    t.start()
    t.join()

Problem: Each iteration pays the cost of creating, starting, and tearing down a thread, 1000 times over.

Fix: Use a thread pool (ThreadPoolExecutor) instead

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    for _ in range(1000):
        executor.submit(worker)  # worker() takes no arguments, so submit() fits better than map()

Now, threads are reused instead of being created and destroyed repeatedly.

4. Summary of Fixes

Issue	Fix
Too many threads	Cap the pool size, e.g. `max_workers=os.cpu_count()`
CPU-bound tasks running in threads	Use `multiprocessing` instead of `threading`
Lock contention slowing execution	Shorten critical sections and lock less often; `lock.acquire(timeout=...)` only bounds the wait
Frequent thread creation/destruction	Use `ThreadPoolExecutor` to reuse threads