Programmer's Python: Async - The Process Pool
Written by Mike James   
Monday, 20 November 2023

The process pool is the way to make your Python program run faster even if it is CPU bound. Find out how to use the pool in this extract from Programmer's Python: Async.

Programmer's Python:
Async
Threads, processes, asyncio & more

Is now available as a print book: Amazon

Contents

1)  A Lightning Tour of Python.

2) Asynchronous Explained

3) Process-Based Parallelism
         Extract 1 - Process-Based Parallelism
4) Threads
         Extract 1 - Threads
5) Locks and Deadlock

6) Synchronization

7) Sharing Data
        Extract 1 - Pipes & Queues

8) The Process Pool
        Extract 1 - The Process Pool

9) Process Managers

10) Subprocesses

11) Futures
        Extract 1 Futures

12) Basic Asyncio
        Extract 1 Basic Asyncio

13) Using asyncio
        Extract 1 Asyncio Web Client
14) The Low-Level API
       Extract 1 - Streams & Web Clients
Appendix I Python in Visual Studio Code

 

There are overheads in creating processes and threads, and one strategy to reduce the cost of creating them is to use a pool of pre-created items. In most other languages it is the idea of a “thread pool” which is important, but in Python the GIL acts as a deterrent to using many threads. As only one thread can be running Python code at any given time, there isn’t a huge advantage in splitting a CPU-bound program into multiple threads. While Python does support a thread pool class, ThreadPool in multiprocessing.pool, it isn’t much used and the newer and more widely used thread pool features in concurrent.futures are described in Chapter 11.
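For comparison, the thread pool class in multiprocessing.pool is ThreadPool, and it offers the same methods as the process pool. The following is only a sketch, and the helper name lengths is a hypothetical choice made for illustration. Because the threads share the GIL, it would not be expected to speed up CPU-bound work:

```python
from multiprocessing.pool import ThreadPool


def lengths(words, threads=2):
    # ThreadPool has the same interface as Pool but uses threads
    with ThreadPool(threads) as pool:
        # map blocks until all of the results are available
        return pool.map(len, words)


if __name__ == "__main__":
    print(lengths(["process", "pool"]))
```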

As processes provide a way to improve performance, it is the process pool which is more important. You can create a pool of processes ready to perform jobs which you can submit later, using:

multiprocessing.pool.Pool(processes, initializer, initargs,
                          maxtasksperchild, context)

All of the parameters are optional and often all you need to do is to specify the number of processes to create. Notice that, unlike a typical thread pool, this is not a system process pool that you can assume is already constructed. The processes are created for you when you use the Pool constructor and they are destroyed when the pool is closed or terminated. If you specify maxtasksperchild, each process exits after completing that many jobs and is replaced by a fresh one.
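As a minimal sketch of the constructor in use, the following creates a pool of two processes and gives it a few jobs using the pool's map method. The names square and run_jobs are hypothetical and used only for illustration:

```python
import multiprocessing.pool


def square(x):
    # Stand-in for a CPU-bound job (hypothetical example function)
    return x * x


def run_jobs(values, processes=2):
    # The worker processes are created here, by the constructor
    pool = multiprocessing.pool.Pool(processes)
    try:
        # map shares the jobs out among the workers and blocks
        # until every result has been returned
        return pool.map(square, values)
    finally:
        pool.close()  # no more jobs will be submitted
        pool.join()   # wait for the workers to exit


if __name__ == "__main__":
    print(run_jobs([1, 2, 3, 4]))
```

The test on __name__ matters on systems that start processes using spawn, such as Windows, because there each new process imports the main module and would otherwise try to create a pool of its own.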

It isn’t a good idea to keep a process around for too long and give it lots of jobs to do because processes tend to accumulate resources which are only freed when the process ends. A good balance of re-use and re-creation of processes is desirable, and this is what maxtasksperchild lets you control. If you don’t specify the number of processes to create, then the number of CPUs as reported by os.cpu_count() is used (from Python 3.13, os.process_cpu_count()). This makes sense for CPU-bound processes, but is less suitable if the processes perform I/O that they have to wait for.
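One way to see the effect of maxtasksperchild is to record which process runs each job. This is a sketch only, using a pool of a single process so that any replacement is obvious. The helper name pids_used is hypothetical, and the pool's apply method, which runs one job and waits for its result, is used just to keep the example short:

```python
import multiprocessing.pool
import os


def pids_used(jobs, maxtasksperchild=None):
    # A pool of one process makes any replacement easy to see
    pool = multiprocessing.pool.Pool(1, maxtasksperchild=maxtasksperchild)
    try:
        # apply runs one job and blocks until its result is back;
        # here the job simply reports the worker's process id
        return [pool.apply(os.getpid) for _ in range(jobs)]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Without a limit the same process is reused for every job
    print(len(set(pids_used(3))))
    # With a limit of one job the process is replaced each time
    print(len(set(pids_used(3, maxtasksperchild=1))))
```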

You can also specify an initializer function which will be called with initargs when each process is started. Notice that this happens only once per process, even if the process is reused by multiple jobs.

As Pool sets up global resources, you cannot simply allow Python to clean up automatically when the object is garbage collected. You need to explicitly call its close or terminate method to free resources. The difference between the two is that close stops the pool accepting new jobs but allows the processes to finish the jobs already submitted before they exit, whereas terminate stops the processes immediately. After calling close or terminate you can call join to wait for all of the processes to finish. Notice that it doesn’t make sense to call join if you haven’t used close or terminate, as the processes don’t end when their jobs are complete, and in current versions of Python doing so raises a ValueError.
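The order of calls described above can be sketched as follows. The function names are hypothetical, and apply_async is used simply as a way of submitting jobs without waiting for them:

```python
import multiprocessing.pool
import time


def slow_double(x):
    # Hypothetical job that takes a noticeable time to complete
    time.sleep(0.1)
    return 2 * x


def close_then_join(values):
    pool = multiprocessing.pool.Pool(2)
    pending = [pool.apply_async(slow_double, (v,)) for v in values]
    pool.close()  # stop accepting jobs, let submitted ones finish
    pool.join()   # wait for the worker processes to exit
    # Every job finished before join returned, so get does not block
    return [r.get() for r in pending]


def join_too_early():
    pool = multiprocessing.pool.Pool(1)
    try:
        pool.join()  # the pool has not been closed or terminated
    except ValueError as err:
        return str(err)
    finally:
        pool.terminate()  # stop the idle worker immediately
        pool.join()


if __name__ == "__main__":
    print(close_then_join([1, 2, 3]))
    print(join_too_early())
```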

The safest way to use Pool is as a context manager in a with statement:

import multiprocessing.pool

with multiprocessing.pool.Pool(2) as myPool:
    ...  # use myPool here

Notice that terminate, not close, is called when the with ends, and this means you need to make sure that every job has completed before leaving the with.
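One way to make sure that everything has completed is to collect every result inside the with. In this sketch the function names are hypothetical and the jobs are submitted using apply_async, which returns at once with an object whose get method waits for the result:

```python
import multiprocessing.pool


def cube(x):
    # Stand-in for a CPU-bound job (hypothetical example function)
    return x ** 3


def cubes(values):
    with multiprocessing.pool.Pool(2) as myPool:
        # The jobs are submitted without waiting for them
        pending = [myPool.apply_async(cube, (v,)) for v in values]
        # get blocks until each result is ready, so no job is still
        # running when terminate is called at the end of the with
        return [r.get() for r in pending]


if __name__ == "__main__":
    print(cubes([1, 2, 3]))
```

If the calls to get were moved outside the with, any job that had not finished would be lost when the pool was terminated.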


