# Concurrency-6-Deadlocks

Last updated: February 15th, 2021
In [1]:
import time
import threading
from threading import Thread
import random
import sys


## Deadlocks¶

Deadlocks are yet another problematic condition that might arise as the result of poorly synchronized code.

#### A simple example¶

Let's start by analyzing a simple example: movement between two "bank accounts":

In [2]:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}
ITERATIONS = {
'a1': 0,
'a2': 0,
}

In [3]:
def move_funds(_from, _to, expected):
name = threading.current_thread().name

while True:
amount = random.randint(0, 100)
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
if total != expected:
print(f"{name} found an inconsistent balance: ${total}") break ITERATIONS[_from] += 1  Your already trained eye will probably spot the issue with the previous function. The operations ACCOUNTS[_from] -= amount and ACCOUNTS[_to] += amount can potentially introduce race conditions. Let's verify it anyways: In [4]: t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', 2000)) t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', 2000))  In [5]: t1.start() t2.start()  Thread-2 found an inconsistent balance:$-110956Thread-1 found an inconsistent balance: $-110956  We've confirmed once again the potential dangers of multithreaded code. #### Fixing it with Locks¶ We've already learned about Locks, so we can use those to try synchronizing the access to the accounts. We'll create 2 locks, on for each account: In [6]: ACCOUNTS = { 'a1': 1000, 'a2': 1000 }  In [7]: lock_1 = threading.Lock() lock_2 = threading.Lock()  In [8]: def move_funds(_from, _to, lock_from, lock_to, expected): name = threading.current_thread().name iterations = 0 while True: amount = random.randint(0, 100) with lock_from: with lock_to: ACCOUNTS[_from] -= amount ACCOUNTS[_to] += amount total = sum(ACCOUNTS.values()) if total != expected: print(f"{name} found an inconsistent balance:${total}")
break
iterations += 1
if iterations > 1_000_000:
print(f'{name} reached iteration limit. Stopping...')
return

In [9]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_1, lock_2, 2000))

In [10]:
t1.start()
t2.start()

In [11]:
[t.join() for t in (t1, t2)];

Thread-1 reached iteration limit. Stopping...
Thread-2 reached iteration limit. Stopping...

In [12]:
sum(ACCOUNTS.values())

Out[12]:
2000
In [14]:
lock_1.locked(), lock_2.locked()

Out[14]:
(False, False)

Worked! Now the access to the accounts is protected by the locks. But there's a very dangerous potential situation hidden in our code. If we make just 1 tiny change, altering the order of the locks that are passed to our threads, we'll find ourselves deadlocked:

In [15]:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}

In [16]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_2, lock_1, 2000))

In [17]:
t1.start()
t2.start()


We'll try to join the threads until they finish. This will probably hang forever:

In [18]:
[t.join() for t in (t1, t2)];

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-18-64b67b704c40> in <module>
----> 1 [t.join() for t in (t1, t2)];

<ipython-input-18-64b67b704c40> in <listcomp>(.0)
----> 1 [t.join() for t in (t1, t2)];

/usr/local/lib/python3.8/threading.py in join(self, timeout)
1009
1010         if timeout is None:
-> 1011             self._wait_for_tstate_lock()
1012         else:
1013             # the behavior of a negative timeout isn't documented, but

/usr/local/lib/python3.8/threading.py in _wait_for_tstate_lock(self, block, timeout)
1025         if lock is None:  # already determined that the C code is done
1026             assert self._is_stopped
-> 1027         elif lock.acquire(block, timeout):
1028             lock.release()
1029             self._stop()

KeyboardInterrupt: 

And we'll see that NO thread has even started processing:

In [19]:
sum(ACCOUNTS.values())

Out[19]:
2000

Both locks are locked...

In [20]:
lock_1.locked(), lock_2.locked()

Out[20]:
(True, True)

Congratulations! You've just experienced your first: DEADLOCK!

(we'll have to restart the Kernel here).

#### What are Deadlocks and when do they happen?¶

Wikipedia's definition:

a deadlock is a state in which each member of a group is waiting for another member, including itself, to take action, such as sending a message or more commonly releasing a lock

And this is one of my favorite quotes from an Operating Systems (Avi Silberschatz) book I read more than 10 years ago while I was still at school:

Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”

In our code, the issue is the ordering in which the locks are acquired. This is a VERY difficult thing to spot.

#### How to prevent Deadlocks¶

Sorry, but I can't avoid telling you the truth: It's very hard to prevent Deadlocks. One simple technique is to always use timeouts when trying to acquire locks. If you're trying to acquire N shared locks, if you can't acquire all N, you can release them all and start over. Let's see an example:

In [53]:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}

LOCK_TIMEOUT = .001

lock_1 = threading.Lock()
lock_2 = threading.Lock()

In [54]:
def move_funds(_from, _to, lock_from, lock_to, expected):
name = threading.current_thread().name
iterations = 0
while True:
amount = random.randint(0, 100)
locked = False
while not locked:
res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
if all([res1, res2]):
# Success! We acquired both locks
locked = True
else:
# Release locks "partially" acquired
if res1:
lock_from.release()
if res2:
lock_to.release()
# locked is True, we're safe
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
lock_from.release()
lock_to.release()
if total != expected:
print(f"{name} found an inconsistent balance: \${total}")
break
iterations += 1
if iterations > 10_000:
print(f'{name} reached iteration limit. Stopping...')
return

In [55]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_2, lock_1, 2000))

In [56]:
t1.start()
t2.start()

In [58]:
[t.join() for t in (t1, t2)];

In [59]:
sum(ACCOUNTS.values())

Out[59]:
2000
In [60]:
lock_1.locked(), lock_2.locked()

Out[60]:
(False, False)

As you can see, we've just prevented the deadlock. The key code piece that prevents the deadlock is the following:

locked = False
while not locked:
res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
if all([res1, res2]):
# Success! We acquired both locks
locked = True
else:
# Release locks "partially" acquired
if res1:
lock_from.release()
if res2:
lock_to.release()


If we successfully acquire both locks within the timeout window, we can proceed to perform our critical section. In other case, we'll release any "partially" acquired locks.

## Thread Synchronization: Summary¶

There are several other synchronization mechanisms that we're not explicitly talking about, like Semaphores, Conditions, Events, Barriers, etc. These follow the same principles as locks, but are used for other purposes.

The main takeaway from this lesson is: synchronization is HARD, and error/bug prone. Even the most experience developers avoid writing synchronized code, there's always something going wrong.

Still, synchronization seems to be a necessary evil. We don't want to have race conditions in our code. In our following lessons we'll explore other alternatives to create correct code without the need of working with synchronization.