The biggest problem in async is sharing data between different processes. Why not just share memory? Find out how to do this in this extract from Programmer's Python: Async.
Programmer's Python: Async - Threads, processes, asyncio & more is now available as a print book from Amazon.
Contents
1) A Lightning Tour of Python
2) Asynchronous Explained
3) Process-Based Parallelism Extract 1 - Process-Based Parallelism
4) Threads Extract 1 - Threads
5) Locks and Deadlock Extract 1 - Locks
6) Synchronization
7) Sharing Data Extract 1 - Pipes & Queues Extract 2 - Shared Memory ***NEW!
8) The Process Pool Extract 1 - The Process Pool 1
9) Process Managers Extract 1 - Process Manager
10) Subprocesses
11) Futures Extract 1 - Futures
12) Basic Asyncio Extract 1 - Basic Asyncio
13) Using asyncio Extract 1 - Asyncio Web Client
14) The Low-Level API Extract 1 - Streams & Web Clients
Appendix I Python in Visual Studio Code
Sharing Data
Processes have access to the same range of synchronization primitives as threads. You can use Lock, RLock, Event, Semaphore, Barrier and Condition with processes in almost exactly the same way as with threads. What is very different, however, is that processes do not share global variables and thus there is very little to lock! Of course, for processes to work together towards some common objective they need to share some data, and there are a number of different ways of doing this. There are two shared data structures, the Queue and the Pipe, which are easy to use and usually powerful enough for most problems. The Queue has the advantage of being usable by both processes and threads. The Pipe is closer to the operating system.
Beyond these two data structures there are some more sophisticated and flexible options. You can use a shared area of memory to transfer data directly between any number of processes. This is made easier by the ctypes module, which lets you specify C data types from Python.
Finally we have raw shared memory, which is very close to the way the hardware allows processes to share data. The only problem with this alternative is that everything is done in terms of bytes rather than data structures.
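To give a feel for what working "in terms of bytes" means, here is a minimal sketch. It assumes that the raw shared memory referred to is the standard library's multiprocessing.shared_memory module (Python 3.8 and later), and for brevity it attaches to the block a second time within the same process rather than from a separate one. The variable names are purely illustrative.

```python
import struct
from multiprocessing import shared_memory

# Create a named block of shared memory big enough for one C double.
shm = shared_memory.SharedMemory(create=True, size=8)
try:
    # The block is just bytes, so the float has to be packed by hand.
    struct.pack_into('d', shm.buf, 0, 3.14159)

    # Another process would attach using the block's name; here the
    # same process attaches a second time to keep the sketch short.
    other = shared_memory.SharedMemory(name=shm.name)
    (result,) = struct.unpack_from('d', other.buf, 0)
    other.close()
finally:
    shm.close()    # release this handle
    shm.unlink()   # free the block itself

print(result)
```

The packing and unpacking steps are the price of raw shared memory: nothing records that the eight bytes represent a double, so every user of the block has to agree on the layout.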
In chapter but not in this extract
- The Queue
- Pipes
- Queues for Threads
Shared Memory
Using a queue or any data structure is often more than you need for communication between processes. For example, the program that calculates pi given in previous chapters only needs to share a single variable to allow the different threads to pool their calculation. At the moment such a simple arrangement isn’t possible with processes, even though they are ideal for the implementation of a CPU-bound program. To pass simple data between processes the solution is to use shared memory. That is, the system will allocate a block of memory that more than one process can access and this can be used something like a postbox to pass data. This is simple, but the downside is that the shared memory isn’t presented as a Python object but as a C data type. Fortunately Python has the ctypes module which provides Python wrappers for all of the standard C data types as explained in Programmer’s Python: Everything Is Data.
There are two easy-to-use shared data ctypes objects – Value and Array. The Value object wraps a single C variable of a specific type and the Array object wraps an array of C types. It is easier to see how things work by looking at Value first. To create a Value you have to use the constructor in the parent process:
value = multiprocessing.Value(type, *args, lock=True)
This creates a single ctypes object that wraps a shared memory variable of the specified type. The type parameter determines which of the ctypes classes is used to wrap the shared memory, and any additional args are passed to its constructor. You can also specify the type using a single-letter code:
'c': ctypes.c_char, 'u': ctypes.c_wchar,
'b': ctypes.c_byte, 'B': ctypes.c_ubyte,
'h': ctypes.c_short, 'H': ctypes.c_ushort,
'i': ctypes.c_int, 'I': ctypes.c_uint,
'l': ctypes.c_long, 'L': ctypes.c_ulong,
'q': ctypes.c_longlong, 'Q': ctypes.c_ulonglong,
'f': ctypes.c_float, 'd': ctypes.c_double
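As a small illustration of the two ways of specifying the type, the sketch below creates the same shared integer once with a ctypes class and once with a letter code. The variable names are hypothetical and everything runs in a single process.

```python
import ctypes
import multiprocessing

# Type given as a ctypes class, with 42 passed on to its constructor.
by_class = multiprocessing.Value(ctypes.c_int, 42)

# The same type given as a single-letter code.
by_code = multiprocessing.Value('i', 42)

# A shared double-precision float.
ratio = multiprocessing.Value('d', 3.14)

print(by_class.value, by_code.value, ratio.value)

# get_obj() returns the wrapped ctypes object, so both spellings
# can be seen to produce the same underlying type.
print(type(by_class.get_obj()) is type(by_code.get_obj()))
```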
By default an associated RLock is created, but you can pass an existing Lock or RLock to be used instead. The lock can be accessed using the get_lock() method, and the Value itself can be used as a context manager.
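The following sketch shows both ways of holding the lock, assuming the default RLock is in use. The name counter is illustrative, and the code runs in a single process just to show the syntax.

```python
import multiprocessing

counter = multiprocessing.Value('i', 0)

# Acquire the associated lock explicitly via get_lock().
with counter.get_lock():
    counter.value += 1

# Use the Value itself as a context manager; this acquires the same lock.
with counter:
    counter.value += 1

# The default lock is re-entrant, so nesting does not deadlock.
with counter.get_lock():
    with counter.get_lock():
        counter.value += 1

print(counter.value)  # 3
```

Re-entrancy matters here because reading or writing counter.value acquires the lock again inside the with block. If you supply a plain Lock instead of the default, holding it while accessing value in this way would block forever.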
You can access the shared data using the value attribute. The wrapper automatically locks get/set access to the variable and it is in this sense that the wrapper is “thread-safe”. However, you need to be careful when an operation uses the value more than once, because the lock is released between accesses. The only operations that are automatically protected are a single write, myVal.value = expression, and a single read, myVar = myVal.value, and the expression can’t include myVal. An update such as myVal.value = myVal.value + 1 is a read followed by a write, and another process can change the value between the two. This means that in most cases you have to explicitly lock access to Value. For example, if you want to create a shared integer Value you first have to discover what ctypes class corresponds to the exact type you want, bearing in mind that c_int is a 32-bit integer on most platforms. Then you create the Value object:
myValue = multiprocessing.Value(ctypes.c_int, 0)
and use it.
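As a rough sketch of what using it might look like, the program below has several processes increment one shared integer, holding the lock around each read-modify-write. The function names add_many and run_counter, and the process and iteration counts, are assumptions made for illustration and are not taken from the book.

```python
import ctypes
import multiprocessing


def add_many(shared, count):
    """Increment the shared Value count times, locking each update."""
    for _ in range(count):
        # += is a read followed by a write, so hold the lock across both.
        with shared.get_lock():
            shared.value += 1


def run_counter(n_procs=4, count=10_000):
    """Start n_procs processes that all update a single shared integer."""
    my_value = multiprocessing.Value(ctypes.c_int, 0)
    procs = [
        multiprocessing.Process(target=add_many, args=(my_value, count))
        for _ in range(n_procs)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return my_value.value


if __name__ == "__main__":
    print(run_counter())  # 40000
```

Without the explicit with shared.get_lock() block, updates from different processes could interleave and the final total would usually fall short of the expected figure.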