Concurrency-6-Deadlocks

Last updated: December 27th, 20202020-12-27Project preview
In [1]:
import time
import threading
from threading import Thread
import random
import sys

Deadlocks

Deadlocks are yet another problematic condition that might arise as the result of poorly synchronized code.

A simple example

Let's start by analyzing a simple example: movement between two "bank accounts":

In [2]:
ACCOUNTS = {
    'a1': 1000,
    'a2': 1000
}
ITERATIONS = {
    'a1': 0,
    'a2': 0,
}
In [3]:
def move_funds(_from, _to, expected):
    name = threading.current_thread().name

    while True:
        amount = random.randint(0, 100)
        ACCOUNTS[_from] -= amount
        ACCOUNTS[_to] += amount
        total = sum(ACCOUNTS.values())
        if total != expected:
            print(f"{name} found an inconsistent balance: ${total}")
            break
        ITERATIONS[_from] += 1

Your already trained eye will probably spot the issue with the previous function. The operations ACCOUNTS[_from] -= amount and ACCOUNTS[_to] += amount can potentially introduce race conditions. Let's verify it anyways:

In [4]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', 2000))
In [5]:
t1.start()
t2.start()
Thread-2 found an inconsistent balance: $-231319Thread-1 found an inconsistent balance: $-231319

We've confirmed once again the potential dangers of multithreaded code.

Fixing it with Locks

We've already learned about Locks, so we can use those to try synchronizing the access to the accounts. We'll create 2 locks, on for each account:

In [6]:
ACCOUNTS = {
    'a1': 1000,
    'a2': 1000
}
In [7]:
lock_1 = threading.Lock()
lock_2 = threading.Lock()
In [8]:
def move_funds(_from, _to, lock_from, lock_to, expected):
    name = threading.current_thread().name
    iterations = 0
    while True:
        amount = random.randint(0, 100)
        with lock_from:
            with lock_to:
                ACCOUNTS[_from] -= amount
                ACCOUNTS[_to] += amount
                total = sum(ACCOUNTS.values())
                if total != expected:
                    print(f"{name} found an inconsistent balance: ${total}")
                    break
        iterations += 1
        if iterations > 1_000_000:
            print(f'{name} reached iteration limit. Stopping...')
            return
In [9]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_1, lock_2, 2000))
In [10]:
t1.start()
t2.start()
In [11]:
[t.join() for t in (t1, t2)];
Thread-1 reached iteration limit. Stopping...
Thread-2 reached iteration limit. Stopping...
In [12]:
sum(ACCOUNTS.values())
Out[12]:
2000
In [13]:
lock_1.locked(), lock_2.locked()
Out[13]:
(False, False)

Worked! Now the access to the accounts is protected by the locks. But there's a very dangerous potential situation hidden in our code. If we make just 1 tiny change, altering the order of the locks that are passed to our threads, we'll find ourselves deadlocked:

In [14]:
ACCOUNTS = {
    'a1': 1000,
    'a2': 1000
}
In [15]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_2, lock_1, 2000))
In [16]:
t1.start()
t2.start()

We'll try to join the threads until they finish. This will probably hang forever:

In [17]:
[t.join() for t in (t1, t2)];
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-17-64b67b704c40> in <module>
----> 1 [t.join() for t in (t1, t2)];

<ipython-input-17-64b67b704c40> in <listcomp>(.0)
----> 1 [t.join() for t in (t1, t2)];

~/.pyenv/versions/3.8.0/lib/python3.8/threading.py in join(self, timeout)
   1009 
   1010         if timeout is None:
-> 1011             self._wait_for_tstate_lock()
   1012         else:
   1013             # the behavior of a negative timeout isn't documented, but

~/.pyenv/versions/3.8.0/lib/python3.8/threading.py in _wait_for_tstate_lock(self, block, timeout)
   1025         if lock is None:  # already determined that the C code is done
   1026             assert self._is_stopped
-> 1027         elif lock.acquire(block, timeout):
   1028             lock.release()
   1029             self._stop()

KeyboardInterrupt: 

And we'll see that NO thread has even started processing:

In [18]:
sum(ACCOUNTS.values())
Out[18]:
2000

Both locks are locked...

In [19]:
lock_1.locked(), lock_2.locked()
Out[19]:
(True, True)

Congratulations! You've just experienced your first: DEADLOCK!

(we'll have to restart the Kernel here).

What are Deadlocks and when do they happen?

Wikipedia's definition:

a deadlock is a state in which each member of a group is waiting for another member, including itself, to take action, such as sending a message or more commonly releasing a lock

Deadlock_at_a_four-way-stop

And this is one of my favorite quotes from an Operating Systems (Avi Silberschatz) book I read more than 10 years ago while I was still at school:

Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”

In our code, the issue is the ordering in which the locks are acquired. This is a VERY difficult thing to spot.

deadlocks

How to prevent Deadlocks

Sorry, but I can't avoid telling you the truth: It's very hard to prevent Deadlocks. One simple technique is to always use timeouts when trying to acquire locks. If you're trying to acquire N shared locks, if you can't acquire all N, you can release them all and start over. Let's see an example:

In [20]:
ACCOUNTS = {
    'a1': 1000,
    'a2': 1000
}

LOCK_TIMEOUT = .001

lock_1 = threading.Lock()
lock_2 = threading.Lock()
In [21]:
def move_funds(_from, _to, lock_from, lock_to, expected):
    name = threading.current_thread().name
    iterations = 0
    while True:
        amount = random.randint(0, 100)
        locked = False
        while not locked:
            res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
            res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
            if all([res1, res2]):
                # Success! We acquired both locks
                locked = True
            else:
                # Release locks "partially" acquired
                if res1:
                    lock_from.release()
                if res2:
                    lock_to.release()
        # locked is True, we're safe
        ACCOUNTS[_from] -= amount
        ACCOUNTS[_to] += amount
        total = sum(ACCOUNTS.values())
        lock_from.release()
        lock_to.release()
        if total != expected:
            print(f"{name} found an inconsistent balance: ${total}")
            break
        iterations += 1
        if iterations > 100_000:
            print(f'{name} reached iteration limit. Stopping...')
            return
In [22]:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_2, lock_1, 2000))
In [23]:
t1.start()
t2.start()
In [24]:
[t.join() for t in (t1, t2)];
Thread-1 reached iteration limit. Stopping...
Thread-2 reached iteration limit. Stopping...
In [25]:
sum(ACCOUNTS.values())
Out[25]:
2000
In [26]:
lock_1.locked(), lock_2.locked()
Out[26]:
(False, False)

As you can see, we've just prevented the deadlock. The key code piece that prevents the deadlock is the following:

locked = False
while not locked:
    res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
    res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
    if all([res1, res2]):
        # Success! We acquired both locks
        locked = True
    else:
        # Release locks "partially" acquired
        if res1:
            lock_from.release()
        if res2:
            lock_to.release()

If we successfully acquire both locks within the timeout window, we can proceed to perform our critical section. In other case, we'll release any "partially" acquired locks.

Thread Synchronization: Summary

There are several other synchronization mechanisms that we're not explicitly talking about, like Semaphores, Conditions, Events, Barriers, etc. These follow the same principles as locks, but are used for other purposes.

The main takeaway from this lesson is: synchronization is HARD, and error/bug prone. Even the most experience developers avoid writing synchronized code, there's always something going wrong.

Still, synchronization seems to be a necessary evil. We don't want to have race conditions in our code. In our following lessons we'll explore other alternatives to create correct code without the need of working with synchronization.

You have to be this tall to write concurrent code

Notebooks AI
Notebooks AI Profile20060