import time
import threading
from threading import Thread
import random
import sys
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}
ITERATIONS = {
'a1': 0,
'a2': 0,
}
def move_funds(_from, _to, expected):
name = threading.current_thread().name
while True:
amount = random.randint(0, 100)
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
if total != expected:
print(f"{name} found an inconsistent balance: ${total}")
break
ITERATIONS[_from] += 1
Your already trained eye will probably spot the issue with the previous function. The operations ACCOUNTS[_from] -= amount
and ACCOUNTS[_to] += amount
can potentially introduce race conditions. Let's verify it anyways:
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', 2000))
t1.start()
t2.start()
We've confirmed once again the potential dangers of multithreaded code.
Fixing it with Locks¶
We've already learned about Locks, so we can use those to try synchronizing the access to the accounts. We'll create 2 locks, on for each account:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}
lock_1 = threading.Lock()
lock_2 = threading.Lock()
def move_funds(_from, _to, lock_from, lock_to, expected):
name = threading.current_thread().name
iterations = 0
while True:
amount = random.randint(0, 100)
with lock_from:
with lock_to:
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
if total != expected:
print(f"{name} found an inconsistent balance: ${total}")
break
iterations += 1
if iterations > 1_000_000:
print(f'{name} reached iteration limit. Stopping...')
return
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_1, lock_2, 2000))
t1.start()
t2.start()
[t.join() for t in (t1, t2)];
sum(ACCOUNTS.values())
lock_1.locked(), lock_2.locked()
Worked! Now the access to the accounts is protected by the locks. But there's a very dangerous potential situation hidden in our code. If we make just 1 tiny change, altering the order of the locks that are passed to our threads, we'll find ourselves deadlocked:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_2, lock_1, 2000))
t1.start()
t2.start()
We'll try to join
the threads until they finish. This will probably hang forever:
[t.join() for t in (t1, t2)];
And we'll see that NO thread has even started processing:
sum(ACCOUNTS.values())
Both locks are locked...
lock_1.locked(), lock_2.locked()
Congratulations! You've just experienced your first: DEADLOCK!
(we'll have to restart the Kernel here).
What are Deadlocks and when do they happen?¶
Wikipedia's definition:
a deadlock is a state in which each member of a group is waiting for another member, including itself, to take action, such as sending a message or more commonly releasing a lock
And this is one of my favorite quotes from an Operating Systems (Avi Silberschatz) book I read more than 10 years ago while I was still at school:
Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: “When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.”
In our code, the issue is the ordering in which the locks are acquired. This is a VERY difficult thing to spot.
How to prevent Deadlocks¶
Sorry, but I can't avoid telling you the truth: It's very hard to prevent Deadlocks. One simple technique is to always use timeouts when trying to acquire locks. If you're trying to acquire N shared locks, if you can't acquire all N, you can release them all and start over. Let's see an example:
ACCOUNTS = {
'a1': 1000,
'a2': 1000
}
LOCK_TIMEOUT = .001
lock_1 = threading.Lock()
lock_2 = threading.Lock()
def move_funds(_from, _to, lock_from, lock_to, expected):
name = threading.current_thread().name
iterations = 0
while True:
amount = random.randint(0, 100)
locked = False
while not locked:
res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
if all([res1, res2]):
# Success! We acquired both locks
locked = True
else:
# Release locks "partially" acquired
if res1:
lock_from.release()
if res2:
lock_to.release()
# locked is True, we're safe
ACCOUNTS[_from] -= amount
ACCOUNTS[_to] += amount
total = sum(ACCOUNTS.values())
lock_from.release()
lock_to.release()
if total != expected:
print(f"{name} found an inconsistent balance: ${total}")
break
iterations += 1
if iterations > 100_000:
print(f'{name} reached iteration limit. Stopping...')
return
t1 = threading.Thread(target=move_funds, name='Thread-1', args=('a1', 'a2', lock_1, lock_2, 2000))
t2 = threading.Thread(target=move_funds, name='Thread-2', args=('a2', 'a1', lock_2, lock_1, 2000))
t1.start()
t2.start()
[t.join() for t in (t1, t2)];
sum(ACCOUNTS.values())
lock_1.locked(), lock_2.locked()
As you can see, we've just prevented the deadlock. The key code piece that prevents the deadlock is the following:
locked = False
while not locked:
res1 = lock_from.acquire(timeout=LOCK_TIMEOUT)
res2 = lock_to.acquire(timeout=LOCK_TIMEOUT)
if all([res1, res2]):
# Success! We acquired both locks
locked = True
else:
# Release locks "partially" acquired
if res1:
lock_from.release()
if res2:
lock_to.release()
If we successfully acquire both locks within the timeout window, we can proceed to perform our critical section. In other case, we'll release any "partially" acquired locks.
Thread Synchronization: Summary¶
There are several other synchronization mechanisms that we're not explicitly talking about, like Semaphores
, Conditions
, Events
, Barriers
, etc. These follow the same principles as locks, but are used for other purposes.
The main takeaway from this lesson is: synchronization is HARD, and error/bug prone. Even the most experience developers avoid writing synchronized code, there's always something going wrong.
Still, synchronization seems to be a necessary evil. We don't want to have race conditions in our code. In our following lessons we'll explore other alternatives to create correct code without the need of working with synchronization.