Programmer's Python: Async - Locks
Written by Mike James
Wednesday, 17 September 2025
The important point is that you cannot predict what this very simple program will produce when it is run. On a system that is organized so that a thread cannot interrupt a running CPU-bound thread you will get the “correct” answer of 200000. On a machine that allows threads to interrupt each other with less restraint you will get a lower value. The actual behavior of the program depends on its timing and on the way that the GIL interacts with the operating system’s scheduling method. The point is that this code is non-deterministic in the sense that you cannot predict what it does just by reading it. You might object that the function used to demonstrate this is contrived and would never be written in practice, but it is a simplified model of what most functions do when they access a shared resource – read the resource, do some computation and finally save the new result back to the resource. In practice the reason for a race condition is usually much harder to see.

Hardware Problem or Heisenbug?

The example of a race condition just given is optimized to increase the probability that the condition will occur. Real-world programs generally have a lower probability of creating a race condition and the result might well be what you expect even when you run the program many times. Eventually, however, the conditions will be right and the program will give the wrong result. This means that the program will most likely pass testing and only show an error very occasionally, usually when it can do the most damage. Such bugs are usually referred to as “non-deterministic” because you can run the program under the same conditions and get different results. Often the first response is to test, or even replace, the hardware and this increases the time it takes even to realize that there is a software bug waiting to be found. Such bugs are very difficult to locate because they are very difficult to reproduce.
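As a sketch of the kind of unlocked counter being described here – the exact listing isn’t in this extract, so the detail is reconstructed from the description and from the locked version given later:

```python
import threading
from math import sqrt

myCounter = 0

def count():
    # Read-compute-write on a shared global with no lock -
    # the read and the write can be interleaved between threads.
    global myCounter
    for i in range(100000):
        temp = myCounter + 1   # read the shared resource
        x = sqrt(2)            # some computation, widening the race window
        myCounter = temp       # write back - may overwrite another thread's update

t1 = threading.Thread(target=count)
t2 = threading.Thread(target=count)
t1.start(); t2.start()
t1.join(); t2.join()
print(myCounter)  # may be less than 200000 due to lost updates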
They are often labeled Heisenbugs because any attempt to find them tends to make them disappear. Running a program with a race condition under a debugger, for example, can make the probability of it occurring go to zero. Similarly, adding debugging statements can modify the timings so as to make the problem vanish – until they are removed and the program is put back into general use. The only secure and reasonable solution to the problem is to use locking.

Locks

A lock is a co-operative mechanism for restricting access to a resource. The important word here is “co-operative”. It needs to be clear right from the start that a locking mechanism only works if you implement it correctly in all of the code that makes use of the shared resource. There is nothing stopping code that does not use the lock from accessing the resource. This is a general feature of locking in most operating systems and isn’t specific to Python.

The simplest type of lock has just two states – locked and unlocked. Any code that wants access to a resource without being interrupted by another thread has to acquire the lock by changing it to the locked state. If a thread tries to acquire a lock that is already locked then it has to wait for the lock to be unlocked.

The Lock class behaves exactly as described. It is a wrapper for a lock that is implemented by the operating system. In other words, the Python Lock is an operating system construct. It corresponds to the most basic type of lock, usually called a mutex, short for “mutual exclusion”. It has an acquire method:

Lock.acquire(blocking=True, timeout=-1)

and a release method:

Lock.release()

The blocking parameter determines what happens if the lock cannot be acquired. If it is True, the default, then the thread simply waits until the lock is available. The acquire returns True when the lock is acquired and you can set a timeout for the wait. Its default is -1, which means “wait forever”.
If the acquire returns because of the timeout then it returns False. Alternatively you can set blocking to False and then acquire returns immediately with True if the lock has been acquired or False if it has not. In this case you cannot specify a timeout. If a thread has the lock then it has to release it when it has finished modifying the resource, using the release method. Any thread, not just the thread that has the lock, can release it and this can be a problem. If you try to release a lock that isn’t locked then you generate a RuntimeError. When a lock is released the thread that released it carries on running until it gives up the GIL and another thread gets a chance to run. If there are multiple threads waiting to acquire the lock then the operating system picks just one of them to run and the others have to again wait until it releases the lock that it has just acquired.
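These acquire and release behaviors can be sketched as follows; the variable name lock is arbitrary and a second thread isn’t needed because a Lock is not reentrant, so a non-blocking acquire on a lock the thread already holds simply returns False:

```python
import threading

lock = threading.Lock()

# Blocking acquire with a timeout - returns False if the lock
# is not obtained within 0.5 seconds. Here the lock is free,
# so it returns True immediately.
ok = lock.acquire(blocking=True, timeout=0.5)
print(ok)  # True

# Non-blocking acquire - returns immediately with False because
# the lock is already held. A timeout cannot be combined with
# blocking=False.
ok2 = lock.acquire(blocking=False)
print(ok2)  # False

lock.release()  # release the lock we hold

try:
    lock.release()  # releasing an unlocked lock...
except RuntimeError as e:
    print("RuntimeError:", e)  # ...raises RuntimeError
```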
Notice that which thread gets to run when a lock becomes available depends on the operating system and you cannot rely on any particular order of execution. That is, if threads A, B and C attempt to acquire the lock in that order they don’t necessarily run in that order when the lock is released.

If we add a lock to the function in the previous example then it always returns the correct result no matter how many threads are used to execute it:

myCounter = 0
countlock = threading.Lock()

def count():
    global myCounter
    for i in range(100000):
        countlock.acquire()
        temp = myCounter + 1
        x = sqrt(2)
        myCounter = temp
        countlock.release()

In this example we acquire the lock before accessing the global variable myCounter and release it after it has been completely updated. As long as all threads use the same locking, only one thread at a time can access the resource and the program is fully deterministic. It never misses an update due to overlapped access.

This works, but it slows things down. The unlocked, but incorrect, version runs two threads in about 70 ms whereas the locked version takes 150 ms. The overhead isn’t due to any loss of parallelism as, with the GIL in place, there isn’t any. The overhead is entirely due to the cost of locking and unlocking. In principle, you should always arrange for a thread to keep a lock for the shortest possible time to allow other threads to work. However, this doesn’t take the GIL into account. If you change the program so that it keeps the lock for the duration of the loop, i.e. until it has very nearly finished, then it is still deterministic, but it only takes about 70 ms with two threads:

myCounter = 0

def count():
    global myCounter
    countlock.acquire()
    for i in range(100000):
        temp = myCounter + 1
        x = sqrt(2)
        myCounter = temp
    countlock.release()

In other words, as the GIL only allows one thread to run at a time and as all of the threads are CPU-bound, there is no time advantage in releasing the lock early.
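A Lock also works as a context manager, which guarantees the release even if the locked code raises an exception. The article doesn’t use this form, but as a sketch, the whole-loop version with the same variable names could be written:

```python
import threading
from math import sqrt

myCounter = 0
countlock = threading.Lock()

def count():
    global myCounter
    with countlock:             # acquire on entry, release on exit
        for i in range(100000):
            temp = myCounter + 1
            x = sqrt(2)
            myCounter = temp

threads = [threading.Thread(target=count) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(myCounter)  # 200000 - deterministic with the lock in place
```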
The story would be different if some of the threads were I/O-bound because then releasing the lock might give them time to move on to another I/O operation and so reduce the overall runtime.

In the chapter, but not in this extract:

Summary