Concurrency-11-Python-Gil

Last updated: February 17th, 20212021-02-17Project preview
In [1]:
import time
import itertools
from queue import Queue
from threading import Thread

import requests

You've probably heard about the feared "Python GIL". In this lecture we'll learn about it and why it's such a problem. But first, let's talk about computing prime numbers.

We'll write a very naive function to check if a number is prime or not:

In [2]:
def is_prime(n):
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for divisor in range(3, n, 2):
        if n % divisor == 0:
            return False
    return True
In [3]:
is_prime(19)
Out[3]:
True
In [4]:
is_prime(8)
Out[4]:
False

We have several numbers in a text file that we want to check for their "primality":

In [5]:
!cat prime_mixture.txt
15492781
15492787
15492803
15492811
15492810
15492833
15492859
15502547
15520301
15527509
In [6]:
with open('prime_mixture.txt') as fp:
    numbers = [int(n.strip()) for n in fp.read().split() if n]
In [7]:
numbers[:5]
Out[7]:
[15492781, 15492787, 15492803, 15492811, 15492810]

Searching primes

Understanding the problem, designing the solution

As great software engineers we are, we'll start first by understanding our problem. How much time does it take to find out if a number is prime or not?

In [11]:
start = time.time()
In [12]:
is_prime(numbers[0])
Out[12]:
True
In [13]:
time.time() - start
Out[13]:
0.8977217674255371

Approximately 0.5 seconds (we don't need to be very accurate, don't worry). If we have 10 prime numbers:

In [14]:
len(numbers)
Out[14]:
10

we could expect a single threaded solution to take ~5 seconds. Let's start with that approach first:

Single threaded approach

A single threaded solution is the most basic one we can think of. No threads, locks, queues or concurrency. Plain old Python code to get the job done:

In [15]:
def search_primes_single_thread(numbers):
    return [n for n in numbers if is_prime(n)]
In [16]:
start = time.time()
In [17]:
results = search_primes_single_thread(numbers)
In [18]:
f"Time: {time.time() - start} seconds. Found {len(results)} primes out of {len(numbers)} total numbers."
Out[18]:
'Time: 7.3194098472595215 seconds. Found 9 primes out of 10 total numbers.'

As we can see, it took less than 5 seconds, but it's within the same order of magnitude: between 1 and 10 seconds.

Speeding things up with multiple threads

We quickly realize that we could improve a lot the solution by using multiple threads. If I have 16 cores in this computer, each one of them can calculate a prime at the same time, and we'll be done a lot quicker. How quicker? Well, assuming I have 16 cores, and each core will definitively take 1 number to process, our solution should take no more than a second. The slowest prime to compute will be the total time.

Let's try it out!

In [19]:
def check_prime_worker(number, results):
    if is_prime(number):
        results.append(number)

(We should potentially use a thread safe collection, I know in CPython list append operations are thread safe)

In [20]:
results = []
In [21]:
threads = [Thread(target=check_prime_worker, args=(number, results)) for number in numbers]
In [23]:
start = time.time()
In [24]:
[t.start() for t in threads];
In [25]:
[t.join() for t in threads];
In [26]:
f"Time: {time.time() - start} seconds. Found {len(results)} primes out of {len(numbers)} total numbers."
Out[26]:
'Time: 7.55951189994812 seconds. Found 9 primes out of 10 total numbers.'

WHAT! 😦 4 seconds! That's even slower than the sequential single-threaded solution. What is going on here? 🤔

Congratulations, let me introduce you to:

What is the GIL?

The GIL is a safety mechanism added to cpython (and Cpython only) that prevents multiple threads from running at the same time. This is completely counter intuitive, as the objective of having multiple threads is to make them run in parallel: that is, 2 or more at the same time.

Understanding the GIL is outside the scope of this presentation. There's one important point that you should understand about the GIL: your threads will "suffer" the GIL ONLY if you're running CPU Bound tasks. I/O tasks will release the GIL and let other threads run.

If you want to know more about the GIL, this talk from Larry Hastings is just amazing: https://www.youtube.com/watch?v=KVKufdTphKs

As a summary:

  • The GIL is a necessary evil. It made Python's C API simple to understand and extend.
  • Guido is open to remove the GIL, ONLY if it doesn't hurt performance (Larry demonstrated in another talk that it's very hard to achieve that)
  • Only CPython has a GIL. Other interpreters (Jython, PyPy, Ironpython) don't.
  • The GIL was "broken" in Python 2, but has been fixed in Python 3.2. We've reached "peak" performance so Larry doesn't think there will be significant improvements there.

An I/O bound demonstration

Let's revisit the example from our first lesson and run an I/O bound task multithreaded to prove the GIL is not an issue in this case. We'll use our crypto_db command, in this case with a 2 second artificial sleep parameter:

$ python crypto_db --sleep 2

Sequential first

We'll check our crypto prices again. This time, each request is artificially delayed 3 seconds. We'll check only 3 prices, for the same date, so it's going to take approximately 6 seconds:

In [27]:
BASE_URL = "http://localhost:8080"
In [28]:
EXCHANGES = ['bitfinex', 'bitstamp', 'kraken']
In [29]:
start = time.time()
In [30]:
prices = [
    requests.get(f"{BASE_URL}/price/{exchange}/btc/2020-04-01").json()['close']
    for exchange in EXCHANGES
]
In [31]:
time.time() - start
Out[31]:
6.1959497928619385

As expected, ~6 seconds.

Now, the multithreaded version

We'll try now the same example with multiple threads. If our claims about the GIL being released by I/O Bound tasks is true, then the whole task should take around 2 seconds, let's check it out:

In [32]:
def check_price(exchange, results):
    return requests.get(f"{BASE_URL}/price/{exchange}/btc/2020-04-01").json()['close']
In [33]:
results = []
In [34]:
threads = [Thread(target=check_price, args=(exchange, results)) for exchange in EXCHANGES]
In [35]:
start = time.time()
In [36]:
[t.start() for t in threads];
In [37]:
[t.join() for t in threads];
In [38]:
time.time() - start
Out[38]:
2.127873659133911

Success! We've now corroborated that the GIL is actually released when waiting for I/O, which makes our programs "feel" like running in parallel.

Summary

In this lesson we learned about one of the most "hated" features of Python: the GIL. Every post you read that is titled "why not Python?" or "Python vs [insert language]" will mention the GIL as a major drawback.

In our next lesson we'll learn how we can improve our code if it's CPU bound.

Notebooks AI
Notebooks AI Profile20060