Last updated: December 27th, 2020
In [1]:
import time
import random
import sys


Deadlocks are yet another problematic condition that might arise as the result of poorly synchronized code.

#### A simple example¶

Let's start by analyzing a simple example: movement between two "bank accounts":

In [2]:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}
ITERATIONS = {
'a1': 0,
'a2': 0,
}

In [3]:
def move_funds(_from, _to, expected):

while True:
amount = random.randint(0, 100)
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
if total != expected:
print(f"{name} found an inconsistent balance: ${total}") break ITERATIONS[_from] += 1  Your already trained eye will probably spot the issue with the previous function. The operations ACCOUNTS[_from] -= amount and ACCOUNTS[_to] += amount can potentially introduce race conditions. Let's verify it anyways: In [4]: t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', 2000)) t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', 2000))  In [5]: t1.start() t2.start()  Thread-2 found an inconsistent balance:$-231319Thread-1 found an inconsistent balance: $-231319  We've confirmed once again the potential dangers of multithreaded code. #### Fixing it with Locks¶ We've already learned about Locks, so we can use those to try synchronizing the access to the accounts. We'll create 2 locks, on for each account: In [6]: ACCOUNTS = { 'a1': 1000, 'a2': 1000 }  In [7]: lock_1 = threading.Lock() lock_2 = threading.Lock()  In [8]: def move_funds(_from, _to, lock_from, lock_to, expected): name = threading.current_thread().name iterations = 0 while True: amount = random.randint(0, 100) with lock_from: with lock_to: ACCOUNTS[_from] -= amount ACCOUNTS[_to] += amount total = sum(ACCOUNTS.values()) if total != expected: print(f"{name} found an inconsistent balance:${total}")
break
iterations += 1
if iterations > 1_000_000:
print(f'{name} reached iteration limit. Stopping...')
return

In [9]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))

In [10]:
t1.start()
t2.start()

In [11]:
[t.join() for t in (t1, t2)];

Thread-1 reached iteration limit. Stopping...

In [12]:
sum(ACCOUNTS.values())

Out[12]:
2000
In [13]:
lock_1.locked(), lock_2.locked()

Out[13]:
(False, False)

Worked! Now the access to the accounts is protected by the locks. But there's a very dangerous potential situation hidden in our code. If we make just 1 tiny change, altering the order of the locks that are passed to our threads, we'll find ourselves deadlocked:

In [14]:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}

In [15]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))

In [16]:
t1.start()
t2.start()


We'll try to join the threads until they finish. This will probably hang forever:

In [17]:
[t.join() for t in (t1, t2)];

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-17-64b67b704c40> in <module>
----> 1 [t.join() for t in (t1, t2)];

<ipython-input-17-64b67b704c40> in <listcomp>(.0)
----> 1 [t.join() for t in (t1, t2)];

1009
1010         if timeout is None:
-> 1011             self._wait_for_tstate_lock()
1012         else:
1013             # the behavior of a negative timeout isn't documented, but

1025         if lock is None:  # already determined that the C code is done
1026             assert self._is_stopped
-> 1027         elif lock.acquire(block, timeout):
1028             lock.release()
1029             self._stop()

KeyboardInterrupt: 

And we'll see that NO thread has even started processing:

In [18]:
sum(ACCOUNTS.values())

Out[18]:
2000

Both locks are locked...

In [19]:
lock_1.locked(), lock_2.locked()

Out[19]:
(True, True)

(we'll have to restart the Kernel here).

#### What are Deadlocks and when do they happen?¶

Wikipedia's definition:

a deadlock is a state in which each member of a group is waiting for another member, including itself, to take action, such as sending a message or more commonly releasing a lock

And this is one of my favorite quotes from an Operating Systems (Avi Silberschatz) book I read more than 10 years ago while I was still at school:

Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”

In our code, the issue is the ordering in which the locks are acquired. This is a VERY difficult thing to spot.

Sorry, but I can't avoid telling you the truth: It's very hard to prevent Deadlocks. One simple technique is to always use timeouts when trying to acquire locks. If you're trying to acquire N shared locks, if you can't acquire all N, you can release them all and start over. Let's see an example:

In [20]:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}

LOCK_TIMEOUT = .001


In [21]:
def move_funds(_from, _to, lock_from, lock_to, expected):
iterations = 0
while True:
amount = random.randint(0, 100)
locked = False
while not locked:
res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
if all([res1, res2]):
# Success! We acquired both locks
locked = True
else:
# Release locks "partially" acquired
if res1:
lock_from.release()
if res2:
lock_to.release()
# locked is True, we're safe
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
lock_from.release()
lock_to.release()
if total != expected:
print(f"{name} found an inconsistent balance: \${total}")
break
iterations += 1
if iterations > 100_000:
print(f'{name} reached iteration limit. Stopping...')
return

In [22]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))

In [23]:
t1.start()
t2.start()

In [24]:
[t.join() for t in (t1, t2)];

Thread-1 reached iteration limit. Stopping...

In [25]:
sum(ACCOUNTS.values())

Out[25]:
2000
In [26]:
lock_1.locked(), lock_2.locked()

Out[26]:
(False, False)

As you can see, we've just prevented the deadlock. The key code piece that prevents the deadlock is the following:

locked = False
while not locked:
res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
if all([res1, res2]):
# Success! We acquired both locks
locked = True
else:
# Release locks "partially" acquired
if res1:
lock_from.release()
if res2:
lock_to.release()


If we successfully acquire both locks within the timeout window, we can proceed to perform our critical section. In other case, we'll release any "partially" acquired locks.

There are several other synchronization mechanisms that we're not explicitly talking about, like Semaphores, Conditions, Events, Barriers, etc. These follow the same principles as locks, but are used for other purposes.