Multiprocessing in Python

1. Introduction

Multiprocessing is the ability to run multiple processes concurrently, utilizing multiple CPU cores. In contrast to multithreading, which runs threads in the same process and shares memory space, multiprocessing creates separate processes with their own memory space. This allows parallel execution, especially beneficial for CPU-bound tasks, like data processing, mathematical calculations, and simulations.

Python's standard library provides the multiprocessing module for this purpose. It can speed up CPU-bound work by spreading it across cores, at the cost of process start-up time and of serializing the data passed between processes.


2. The Difference Between Multithreading and Multiprocessing

  • Multithreading: Threads run within a single process and share the same memory space. They are lightweight and well suited to I/O-bound tasks. In CPython, however, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads rarely speed up CPU-bound tasks.
  • Multiprocessing: Creates multiple processes, each with its own memory space and its own interpreter. Because each process has its own GIL, processes can run on separate CPU cores at the same time, which makes multiprocessing a good fit for CPU-bound tasks.
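To make the memory-space difference concrete, here is a minimal sketch that runs the same function once in a thread and once in a child process. The names shared_items, append_item, run_in_thread, and run_in_process are illustrative choices, not part of any library.

```python
import multiprocessing
import threading

# Module-level state used to show which kind of worker can change it.
shared_items = []

def append_item(value):
    """Append to the module-level list in whichever worker runs this."""
    shared_items.append(value)

def run_in_thread(value):
    # A thread shares the parent's memory, so its append is visible here.
    t = threading.Thread(target=append_item, args=(value,))
    t.start()
    t.join()
    return list(shared_items)

def run_in_process(value):
    # A child process works on its own copy; the parent's list is unchanged.
    p = multiprocessing.Process(target=append_item, args=(value,))
    p.start()
    p.join()
    return list(shared_items)

if __name__ == "__main__":
    print(run_in_thread("from thread"))    # the thread's item appears
    print(run_in_process("from process"))  # the child's item does not
```

The second call returns the same list as the first, because the child appended to its own copy of shared_items. This is why processes need explicit channels such as queues, pipes, or shared values to exchange data.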

3. The multiprocessing Module

The multiprocessing module provides the ability to create processes, manage communication between them, and synchronize them. It contains several classes and functions that enable parallel execution.

Core Components of the multiprocessing Module:

  • Process: Represents a separate process that runs independently.
  • Queue: A process- and thread-safe FIFO queue for passing data between processes.
  • Pool: Provides a pool of worker processes for parallel execution.
  • Pipe: A connection between two endpoints, two-way by default.
  • Lock: Prevents race conditions when multiple processes access shared resources.

4. Basic Example: Creating Processes with multiprocessing.Process

The most straightforward way to use multiprocessing is by creating Process objects. Each process runs a target function, and you can manage multiple processes concurrently.

Example: Basic Usage of Process

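import multiprocessing
import time

def print_numbers():
    # Print 0 through 4, pausing one second between numbers.
    for i in range(5):
        print(i)
        time.sleep(1)

if __name__ == "__main__":
    # This guard keeps child processes from re-running the block below
    # when they import the module (the "spawn" start method on Windows and macOS).

    # Create two processes
    process1 = multiprocessing.Process(target=print_numbers)
    process2 = multiprocessing.Process(target=print_numbers)

    # Start the processes
    process1.start()
    process2.start()

    # Wait for processes to finish
    process1.join()
    process2.join()

    print("Both processes have finished.")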

Explanation:

  • multiprocessing.Process(target=print_numbers) creates a new process that runs the print_numbers function.
  • start() begins the execution of the process.
  • join() ensures that the main program waits for the completion of the processes before continuing.
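A Process can also receive arguments and report how it ended. The sketch below is one way to do that, using the args and name parameters and the exitcode attribute of Process. The functions check_positive and run_check are hypothetical helpers made up for this illustration.

```python
import multiprocessing
import sys

def check_positive(value):
    """Finish normally if value is positive, otherwise exit with status 1."""
    if value <= 0:
        sys.exit(1)

def run_check(value):
    # args passes positional arguments to the target; name labels the process.
    p = multiprocessing.Process(
        target=check_positive, args=(value,), name=f"check-{value}"
    )
    p.start()
    p.join()
    # After join(), exitcode is 0 for a normal return, or the status
    # given to sys.exit() in the child.
    return p.name, p.exitcode

if __name__ == "__main__":
    print(run_check(5))
    print(run_check(0))
```

Checking exitcode after join() is a simple way for the parent to notice that a child failed, since an exception in a child process does not propagate to the parent on its own.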

5. Using Pool for Parallel Execution

The Pool class is a powerful abstraction for parallel execution. It allows you to define a pool of worker processes and distribute tasks among them.

Example: Using Pool for Parallel Execution

import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # Create a pool of 4 worker processes; the with block shuts it down on exit
    with multiprocessing.Pool(4) as pool:
        result = pool.map(square, [1, 2, 3, 4, 5])

    print(result)  # Output: [1, 4, 9, 16, 25]

Explanation:

  • multiprocessing.Pool(4) creates a pool of 4 worker processes.
  • pool.map(square, [1, 2, 3, 4, 5]) splits the input list into chunks, sends them to the worker processes, and applies the square() function to each element.

The map() method of Pool behaves like the built-in map(), except that it distributes the work across multiple processes, blocks until every result is ready, and returns a list in the same order as the input.
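Pool offers a few other submission methods besides map(). The following sketch, with made-up helper functions power, cube, and run_pool_variants, shows starmap() for functions that take several arguments, apply_async() for a single non-blocking call, and imap_unordered() for results that arrive in completion order.

```python
import multiprocessing

def power(base, exponent):
    return base ** exponent

def cube(x):
    return x ** 3

def run_pool_variants():
    with multiprocessing.Pool(processes=2) as pool:
        # starmap unpacks each tuple into the function's arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])

        # apply_async returns immediately; get() waits for the result.
        async_result = pool.apply_async(cube, (4,))

        # imap_unordered yields results as workers finish, so the order
        # is not guaranteed; sorting makes the output deterministic.
        cubes = sorted(pool.imap_unordered(cube, range(5)))

        single = async_result.get(timeout=10)
    return powers, single, cubes

if __name__ == "__main__":
    print(run_pool_variants())
```

All results are collected inside the with block, because leaving the block terminates the pool.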


6. Communication Between Processes

Multiprocessing provides several ways to enable communication between processes, such as Queues and Pipes. These allow data to be shared or passed between processes.

Example: Using a Queue for Communication

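import multiprocessing

NUM_ITEMS = 5

def worker(queue):
    # Put a fixed number of items on the queue for the parent to read.
    for i in range(NUM_ITEMS):
        queue.put(i)

if __name__ == "__main__":
    queue = multiprocessing.Queue()

    process = multiprocessing.Process(target=worker, args=(queue,))
    process.start()

    # Read the items before joining. queue.empty() is not reliable
    # across processes, so read a known number of items instead.
    for _ in range(NUM_ITEMS):
        print(queue.get())

    process.join()

Explanation:

  • queue.put(i) adds items to the queue.
  • The main process retrieves each item with queue.get(), which blocks until an item is available.
  • join() is called only after the queue has been drained. Joining first can deadlock, because a process that has put items on a queue waits until its buffered data has been flushed to the underlying pipe before it terminates.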
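When the consumer does not know in advance how many items to expect, a common pattern is for the producer to finish with a sentinel value. The sketch below assumes None is never a real item; SENTINEL, producer, and collect_squares are illustrative names.

```python
import multiprocessing

# Assumption for this sketch: None never occurs as a real data item.
SENTINEL = None

def producer(queue, count):
    """Put count squares on the queue, then signal that no more will come."""
    for i in range(count):
        queue.put(i * i)
    queue.put(SENTINEL)

def collect_squares(count):
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=producer, args=(queue, count))
    process.start()

    results = []
    while True:
        item = queue.get()  # blocks until the producer sends something
        if item is SENTINEL:
            break
        results.append(item)

    # The queue is empty at this point, so joining is safe.
    process.join()
    return results

if __name__ == "__main__":
    print(collect_squares(5))
```

With several producers, each one would send its own sentinel and the consumer would count them before stopping.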

7. Synchronization Using Lock

When multiple processes access shared resources, it is important to use locks to prevent race conditions. Locks ensure that only one process can access the shared resource at a time.

Example: Using Lock for Synchronization

import multiprocessing

def increment(counter, lock):
    for _ in range(5):
        # Hold the lock for the whole read-modify-write and the print.
        with lock:
            counter.value += 1
            print(f"Counter: {counter.value}")

if __name__ == "__main__":
    counter = multiprocessing.Value('i', 0)  # Shared counter
    lock = multiprocessing.Lock()

    processes = [multiprocessing.Process(target=increment, args=(counter, lock)) for _ in range(3)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    print(f"Final Counter: {counter.value}")

Explanation:

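  • multiprocessing.Value('i', 0) creates a shared integer (counter) in shared memory, so every process reads and writes the same value.
  • counter.value += 1 is a read followed by a write, not a single atomic step. Wrapping it in "with lock:" ensures that only one process performs the update at a time, so no increments are lost.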
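A Value created with the default settings also carries its own lock, available through get_lock(), which can stand in for a separate Lock object. Here is a minimal sketch of that variant; add_many and count_with_builtin_lock are hypothetical names chosen for this example.

```python
import multiprocessing

def add_many(counter, times):
    for _ in range(times):
        # get_lock() returns the lock that protects this Value.
        with counter.get_lock():
            counter.value += 1

def count_with_builtin_lock(workers, times):
    counter = multiprocessing.Value("i", 0)  # lock=True is the default
    processes = [
        multiprocessing.Process(target=add_many, args=(counter, times))
        for _ in range(workers)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_builtin_lock(workers=3, times=1000))
```

Because every increment happens while the lock is held, the final value equals workers multiplied by times.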

8. Using Pipe for Communication

A Pipe is a two-way communication channel between two processes, allowing them to send and receive data.

Example: Using Pipe for Two-Way Communication

import multiprocessing

def sender(conn):
    conn.send("Hello from sender!")
    conn.close()

def receiver(conn):
    message = conn.recv()
    print(f"Receiver received: {message}")
    conn.close()

if __name__ == "__main__":
    # Pipe() returns a pair of connection objects, one for each end.
    conn_a, conn_b = multiprocessing.Pipe()

    process1 = multiprocessing.Process(target=sender, args=(conn_a,))
    process2 = multiprocessing.Process(target=receiver, args=(conn_b,))

    process1.start()
    process2.start()

    process1.join()
    process2.join()

Explanation:

  • multiprocessing.Pipe() returns two connection objects, one for each end of the pipe. The pipe is duplex by default, so either end can both send and receive. Passing duplex=False makes the first connection receive-only and the second send-only.
  • conn.send() sends a picklable object through the pipe, and conn.recv() blocks until an object arrives.
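Since a pipe carries data in both directions by default, one connection can handle both a request and its reply. The sketch below shows a parent sending strings to a child and reading back uppercased replies; echo_upper and ask_child are made-up names, and using None as a stop signal is an assumption of this example.

```python
import multiprocessing

def echo_upper(conn):
    """Reply to each received string with its uppercase form until None arrives."""
    while True:
        message = conn.recv()
        if message is None:  # stop signal chosen for this sketch
            break
        conn.send(message.upper())
    conn.close()

def ask_child(messages):
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    process = multiprocessing.Process(target=echo_upper, args=(child_conn,))
    process.start()

    replies = []
    for message in messages:
        parent_conn.send(message)           # request
        replies.append(parent_conn.recv())  # reply on the same connection

    parent_conn.send(None)  # tell the child to stop
    process.join()
    parent_conn.close()
    return replies

if __name__ == "__main__":
    print(ask_child(["hello", "pipe"]))
```

Each end of a pipe is meant to be used by one process at a time; for many producers or consumers, a Queue is usually the simpler choice.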
