Multiprocessing is the ability to run multiple processes concurrently, utilizing multiple CPU cores. In contrast to multithreading, which runs threads in the same process and shares memory space, multiprocessing creates separate processes with their own memory space. This allows parallel execution, especially beneficial for CPU-bound tasks, like data processing, mathematical calculations, and simulations.
Python provides the multiprocessing module to allow easy parallelization of tasks, which can improve performance by leveraging multiple CPU cores, especially in CPU-bound operations.
2. The Difference Between Multithreading and Multiprocessing
- Multithreading: Threads run within a single process and share the same memory space. Threads are lightweight and are suitable for I/O-bound tasks, but due to Python’s Global Interpreter Lock (GIL), they may not provide significant performance improvements for CPU-bound tasks.
- Multiprocessing: Involves creating multiple processes, each with its own memory space. Each process runs independently, allowing full use of multiple CPU cores. Multiprocessing is ideal for CPU-bound tasks, as it bypasses the GIL and takes full advantage of multi-core processors.
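The memory-space difference described above can be observed directly. The following is a minimal sketch, not taken from the original text; the names `items` and `append_item` are hypothetical. A thread's change to a module-level list is visible in the parent, whereas a child process changes only its own copy.

```python
import multiprocessing
import threading

items = []  # module-level list, used to observe memory sharing

def append_item(value):
    """Append to the module-level list in whichever process runs this."""
    items.append(value)
    return len(items)

if __name__ == "__main__":
    # A thread shares the parent's memory, so its append is visible here.
    t = threading.Thread(target=append_item, args=("from thread",))
    t.start()
    t.join()
    print(items)  # ['from thread']

    # A child process has its own memory space; its append stays in the child.
    p = multiprocessing.Process(target=append_item, args=("from process",))
    p.start()
    p.join()
    print(items)  # still ['from thread']
```

This isolation is why processes need explicit channels such as queues, pipes, or shared values to exchange data.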
3. The multiprocessing Module
The multiprocessing module provides the ability to create processes, manage communication between them, and synchronize them. It contains several classes and functions that enable parallel execution.
Core Components of the multiprocessing Module:
- Process: Represents a separate process that runs independently.
- Queue: A thread- and process-safe queue that lets processes communicate and exchange data.
- Pool: Provides a pool of worker processes for parallel execution.
- Pipe: A two-way communication channel between processes.
- Lock: Prevents race conditions when multiple processes access shared resources.
4. Basic Example: Creating Processes with multiprocessing.Process
The most straightforward way to use multiprocessing is by creating Process objects. Each process runs a target function, and you can manage multiple processes concurrently.
Example: Basic Usage of Process
import multiprocessing
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

# The guard is needed on platforms that start child processes by
# re-importing this module (the "spawn" start method, e.g. Windows and macOS).
if __name__ == "__main__":
    # Create two processes
    process1 = multiprocessing.Process(target=print_numbers)
    process2 = multiprocessing.Process(target=print_numbers)
    # Start the processes
    process1.start()
    process2.start()
    # Wait for processes to finish
    process1.join()
    process2.join()
    print("Both processes have finished.")
Explanation:
- multiprocessing.Process(target=print_numbers) creates a new process that runs the print_numbers function.
- start() begins the execution of the process.
- join() ensures that the main program waits for the completion of the processes before continuing.
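Target functions usually need input. As a hedged sketch (the `greet` function and the process name "greeter" are hypothetical, chosen only for illustration), arguments can be supplied through the `args` and `kwargs` parameters of Process, and `exitcode` can be inspected after `join()`:

```python
import multiprocessing

def greet(name, punctuation="!"):
    """Build and print a greeting; runs in the child process."""
    message = f"Hello, {name}{punctuation}"
    print(message)
    return message

if __name__ == "__main__":
    # args supplies positional arguments, kwargs supplies keyword arguments.
    p = multiprocessing.Process(
        target=greet,
        args=("worker",),
        kwargs={"punctuation": "?"},
        name="greeter",  # optional label, handy when debugging
    )
    p.start()
    p.join()
    # exitcode is 0 when the target returned normally.
    print(p.name, p.exitcode)
```

Note that the return value of the target is discarded by Process; results have to be sent back through a Queue, a Pipe, or a Pool.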
5. Using Pool for Parallel Execution
The Pool class is a powerful abstraction for parallel execution. It allows you to define a pool of worker processes and distribute tasks among them.
Example: Using Pool for Parallel Execution
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(4) as pool:
        result = pool.map(square, [1, 2, 3, 4, 5])
    print(result)  # Output: [1, 4, 9, 16, 25]
Explanation:
- multiprocessing.Pool(4) creates a pool of 4 worker processes.
- pool.map(square, [1, 2, 3, 4, 5]) distributes the input list across the worker processes and applies the square() function to each element.
- The map() function of Pool behaves similarly to the built-in map(), but it distributes the workload across multiple processes.
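Pool offers other ways to submit work besides map(). The sketch below, which uses a hypothetical `power` function, shows two of them under the same setup: `starmap()` for functions that take several arguments, and `apply_async()` for scheduling a single call without blocking.

```python
import multiprocessing

def power(base, exponent):
    """Raise base to exponent; a stand-in for a CPU-bound task."""
    return base ** exponent

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        # starmap unpacks each tuple into positional arguments.
        print(pool.starmap(power, [(2, 3), (3, 2), (10, 0)]))  # [8, 9, 1]

        # apply_async schedules one call and returns an AsyncResult;
        # get() blocks until the value is ready (or the timeout expires).
        pending = pool.apply_async(power, (2, 10))
        print(pending.get(timeout=10))  # 1024
```

Like map(), starmap() returns results in the order of the inputs, regardless of which worker finishes first.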
6. Communication Between Processes
Multiprocessing provides several ways to enable communication between processes, such as Queues and Pipes. These allow data to be shared or passed between processes.
Example: Using a Queue for Communication
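import multiprocessing

def worker(queue):
    for i in range(5):
        queue.put(i)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(queue,))
    process.start()
    # Read the items before joining: queue.empty() is not reliable across
    # processes, and joining a process that still holds buffered queue
    # items can deadlock when the data is large.
    for _ in range(5):
        print(queue.get())
    process.join()
Explanation:
- queue.put(i) adds items to the queue.
- The main process retrieves the items using queue.get(), which blocks until an item is available, and calls join() only after all items have been read.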
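When the consumer does not know in advance how many items to expect, a common pattern is to have the producer send an end-of-stream marker. The following is a sketch under that assumption; `SENTINEL`, `producer`, and `drain` are hypothetical names, not part of the multiprocessing module.

```python
import multiprocessing

SENTINEL = None  # hypothetical end-of-stream marker

def producer(queue, count):
    """Put `count` squares on the queue, then signal completion."""
    for i in range(count):
        queue.put(i * i)
    queue.put(SENTINEL)

def drain(queue):
    """Collect items until the sentinel arrives."""
    results = []
    while True:
        item = queue.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item)
    return results

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=producer, args=(queue, 5))
    process.start()
    print(drain(queue))  # [0, 1, 4, 9, 16]
    process.join()
```

Using None as the marker only works when None is never a legitimate data item; otherwise a dedicated string or tuple can serve the same purpose.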
7. Synchronization Using Lock
When multiple processes access shared resources, it is important to use locks to prevent race conditions. Locks ensure that only one process can access the shared resource at a time.
Example: Using Lock for Synchronization
import multiprocessing

def increment(counter, lock):
    for _ in range(5):
        with lock:
            counter.value += 1
            print(f"Counter: {counter.value}")

if __name__ == "__main__":
    counter = multiprocessing.Value('i', 0)  # Shared counter
    lock = multiprocessing.Lock()
    processes = [multiprocessing.Process(target=increment, args=(counter, lock)) for _ in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f"Final Counter: {counter.value}")
Explanation:
- multiprocessing.Value creates a shared variable (counter) that can be accessed by multiple processes.
- lock = multiprocessing.Lock() ensures that only one process can increment the counter at a time.
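A Value created with the default settings also carries its own lock, available through `get_lock()`, so a separate Lock object is optional for this use case. A minimal sketch, with a hypothetical `add_many` helper:

```python
import multiprocessing

def add_many(counter, times):
    """Increment the shared counter `times` times under its own lock."""
    for _ in range(times):
        # += is a read-modify-write, so it still needs the lock.
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # lock=True by default
    workers = [
        multiprocessing.Process(target=add_many, args=(counter, 1000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4000
```

Without the `with counter.get_lock():` line, increments from different processes can overwrite each other and the final value may be lower than expected.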
8. Using Pipe for Communication
A Pipe is a two-way communication channel between two processes, allowing them to send and receive data.
Example: Using Pipe for Two-Way Communication
import multiprocessing

def sender(pipe):
    pipe.send("Hello from sender!")

def receiver(pipe):
    message = pipe.recv()
    print(f"Receiver received: {message}")

if __name__ == "__main__":
    pipe = multiprocessing.Pipe()
    process1 = multiprocessing.Process(target=sender, args=(pipe[0],))
    process2 = multiprocessing.Process(target=receiver, args=(pipe[1],))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
Explanation:
- multiprocessing.Pipe() creates a pipe and returns its two connection ends. This example uses pipe[0] for sending and pipe[1] for receiving, although by default both ends can send and receive.
- pipe.send() sends data through the pipe, and pipe.recv() receives it.
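Because Pipe() is duplex by default, a single pair of connections can carry a request and its reply. The sketch below illustrates that round trip; `echo_upper`, `parent_conn`, and `child_conn` are hypothetical names chosen for the example.

```python
import multiprocessing

def echo_upper(conn):
    """Receive one message, reply with its upper-case form, then close."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()

if __name__ == "__main__":
    # duplex=True is the default, so both ends can send and receive.
    parent_conn, child_conn = multiprocessing.Pipe()
    process = multiprocessing.Process(target=echo_upper, args=(child_conn,))
    process.start()
    parent_conn.send("hello")   # parent -> child
    print(parent_conn.recv())   # child -> parent: HELLO
    process.join()
```

Passing `duplex=False` to Pipe() instead yields a one-way channel, where the first connection can only receive and the second can only send.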