```
import time
import itertools
from queue import Queue
from threading import Thread
import requests
```

You've probably heard about the feared "Python GIL". In this lecture we'll learn about it and why it's such a problem. But first, let's talk about computing prime numbers.

We'll write a **very** naive function to check if a number is prime or not:

```
def is_prime(n):
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for divisor in range(3, n, 2):
        if n % divisor == 0:
            return False
    return True
```

```
is_prime(19)
```

```
is_prime(8)
```

We have several numbers in a text file that we want to check for their "primality":

```
!cat prime_mixture.txt
```

```
with open('prime_mixture.txt') as fp:
    numbers = [int(n.strip()) for n in fp.read().split() if n]
```

```
numbers[:5]
```

```
start = time.time()
```

```
is_prime(numbers[0])
```

```
time.time() - start
```

Approximately 0.5 seconds (we don't need to be very accurate, don't worry). If we have 10 numbers to check:

```
len(numbers)
```

we could expect a single threaded solution to take ~5 seconds. Let's start with that approach first:

#### Single threaded approach

A single threaded solution is the most basic one we can think of. No threads, locks, queues or concurrency. Plain old Python code to get the job done:

```
def search_primes_single_thread(numbers):
    return [n for n in numbers if is_prime(n)]
```

```
start = time.time()
```

```
results = search_primes_single_thread(numbers)
```

```
f"Time: {time.time() - start} seconds. Found {len(results)} primes out of {len(numbers)} total numbers."
```

As we can see, it took less than 5 seconds, but it's within the same order of magnitude: between 1 and 10 seconds.

#### Speeding things up with multiple threads

We quickly realize that we could improve the solution a lot by using multiple threads. If this computer has 16 cores, each one can check a number at the same time, and we'll be done much faster. How much faster? Assuming 16 cores and one number assigned per core, our solution should take no more than a second: the slowest number to check determines the total time.

Let's try it out!

```
def check_prime_worker(number, results):
    if is_prime(number):
        results.append(number)
```

(Ideally we'd use a thread-safe collection here, but in CPython `list.append` is atomic and therefore thread safe.)
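If we'd rather not rely on that CPython implementation detail, the `Queue` imported at the top of this notebook is thread safe by design. A minimal sketch of the same worker pattern, using a small hypothetical list of numbers instead of `prime_mixture.txt`:

```python
from queue import Queue
from threading import Thread

def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, n, 2))

def check_prime_worker(number, results_queue):
    # Queue.put is thread safe by design; no reliance on list.append atomicity
    if is_prime(number):
        results_queue.put(number)

sample = [19, 8, 23, 42]  # hypothetical numbers, not the ones from the file
results_queue = Queue()
threads = [Thread(target=check_prime_worker, args=(n, results_queue)) for n in sample]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the queue into a plain list once all workers are done
primes = []
while not results_queue.empty():
    primes.append(results_queue.get())
```

The trade-off is a tiny bit more ceremony when collecting results, in exchange for code that stays correct on any Python implementation.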

```
results = []
```

```
threads = [Thread(target=check_prime_worker, args=(number, results)) for number in numbers]
```

```
start = time.time()
```

```
[t.start() for t in threads];
```

```
[t.join() for t in threads];
```

```
f"Time: {time.time() - start} seconds. Found {len(results)} primes out of {len(numbers)} total numbers."
```

**WHAT! 😦** 4 seconds! That's even **slower** than the sequential single-threaded solution. What is going on here? 🤔

Congratulations, let me introduce you to:

## What is the GIL?

The GIL is a safety mechanism added to **CPython** (and CPython only) that prevents multiple threads from executing Python bytecode at the same time. This is completely counterintuitive, as the whole point of having multiple threads is to run them **in parallel**: that is, 2 or more at the same time.

Understanding the GIL in depth is outside the scope of this presentation, but there's one important point you should take away: your threads will "suffer" from the GIL **only** if they're running CPU-bound tasks. I/O-bound tasks *release* the GIL and let other threads run.

If you want to know more about the GIL, this talk from Larry Hastings is just amazing: https://www.youtube.com/watch?v=KVKufdTphKs

As a summary:

- The GIL is a necessary evil. It made Python's C API simple to understand and extend.
- Guido is open to removing the GIL, but only if doing so doesn't hurt single-threaded performance (Larry demonstrated in another talk that this is very hard to achieve).
- The GIL is a CPython detail. Other interpreters, such as Jython and IronPython, don't have one.
- The GIL's behavior was "broken" in Python 2 and was fixed in Python 3.2. Larry believes we've reached "peak" performance there, so he doesn't expect significant further improvements.
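One concrete detail behind that "fixed in 3.2" point: the new GIL hands control between threads on a configurable time interval rather than a bytecode count. You can inspect (and change) it with the `sys` module:

```python
import sys

# How long a thread may hold the GIL before it's asked to release it,
# in seconds (0.005, i.e. 5 ms, is the default in CPython 3)
interval = sys.getswitchinterval()
print(interval)
```

A larger interval means fewer context switches (good for CPU-bound throughput), while a smaller one makes threads feel more responsive.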

## An I/O bound demonstration

Let's revisit the example from our first lesson and run an I/O-bound task with multiple threads to prove the GIL is not an issue in this case. We'll use our `crypto_db` command, in this case with a 2-second artificial sleep parameter:

```
$ python crypto_db --sleep 2
```

#### Sequential first

We'll check our crypto prices again. This time, each request is artificially delayed 2 seconds. We'll check only 3 prices, for the same date, so it's going to take approximately 6 seconds:

```
BASE_URL = "http://localhost:8080"
```

```
EXCHANGES = ['bitfinex', 'bitstamp', 'kraken']
```

```
start = time.time()
```

```
prices = [
    requests.get(f"{BASE_URL}/price/{exchange}/btc/2020-04-01").json()['close']
    for exchange in EXCHANGES
]
```

```
time.time() - start
```

As expected, ~6 seconds.

#### Now, the multithreaded version

We'll now try the same example with multiple threads. If our claim that I/O-bound tasks release the GIL is true, the whole task should take around 2 seconds. Let's check it out:

```
def check_price(exchange, results):
    results.append(requests.get(f"{BASE_URL}/price/{exchange}/btc/2020-04-01").json()['close'])
```

```
results = []
```

```
threads = [Thread(target=check_price, args=(exchange, results)) for exchange in EXCHANGES]
```

```
start = time.time()
```

```
[t.start() for t in threads];
```

```
[t.join() for t in threads];
```

```
time.time() - start
```

Success! We've corroborated that the GIL is released while a thread waits for I/O, which lets our programs run effectively in parallel.
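As a side note, the standard library's `concurrent.futures.ThreadPoolExecutor` wraps the start/join boilerplate above for us. A minimal sketch, using `time.sleep` as a stand-in for the delayed HTTP request (sleeping releases the GIL just like socket I/O does, and `100.0` is a hypothetical close price):

```python
import time
from concurrent.futures import ThreadPoolExecutor

EXCHANGES = ['bitfinex', 'bitstamp', 'kraken']

def check_price(exchange):
    # Stand-in for the real request; the sleep releases the GIL like I/O would
    time.sleep(0.2)
    return (exchange, 100.0)  # hypothetical close price

start = time.time()
with ThreadPoolExecutor(max_workers=len(EXCHANGES)) as pool:
    # map runs check_price concurrently and returns results in order
    prices = dict(pool.map(check_price, EXCHANGES))
elapsed = time.time() - start
# elapsed is ~0.2s, not ~0.6s: the three sleeps overlapped
```

Another advantage over raw `Thread` objects: the workers can simply `return` their values instead of appending to a shared list.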

## Summary

In this lesson we learned about one of the most "hated" features of Python: the GIL. Every post you read that is titled "why not Python?" or "Python vs [insert language]" will mention the GIL as a major drawback.

In our next lesson we'll learn how we can improve our code if it's CPU bound.