In our previous lesson we saw how to achieve very basic concurrency (and potentially parallelism) using multiple processes. But the idea of this course is to teach you how YOU can explicitly write concurrent (and parallel) programs.
Our first stop will be Multithreading. We want to understand what threads are, and how we can make use of them.
Achieving concurrency by using processes is both inconvenient (you have no control over it as a programmer) and expensive in terms of resources. Remember how the OS keeps information about everything a process uses (memory allocated, file descriptors, etc.): those resources would be duplicated when running multiple processes.
That's why Threads were invented. Threads are a mechanism to achieve concurrency in a program, within the same process. A program can spawn multiple threads, which will be executed concurrently (and potentially in parallel) by the OS/CPU.
Threads give us, programmers, the chance to make OUR own programs concurrent.
We'll talk about threads from a general point of view first, as an Operating System concept, and then we'll see how to use them in Python.
A thread is an Operating System feature that allows a process to spawn multiple "tasks" that can be executed independently. Instead of having a single-threaded, sequential program, we can use multiple threads of execution. This is what threads look like in a process:
As you can see, there are multiple "execution threads" running at the same time. The OS scheduler will give them CPU time independently, and potentially in parallel (if the right conditions are met).
Now, from the above image, do you notice anything special? In the multithreaded illustration, you'll see that each thread has its own stack and registers, but they all access the same data and files. This introduces us to a VERY IMPORTANT concept: threads share memory.
All threads can access the general memory of the process. That means that they can also change it, which will create some problems we'll talk about later.
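To make the "own stack, shared memory" idea concrete, here is a minimal sketch using Python's standard `threading` module. The names `shared_results` and `worker` are made up for this illustration; they are not part of any library.

```python
import os
import threading

# Process-wide data: every thread sees this very same list object.
shared_results = []

def worker(label):
    # `local_value` lives on this thread's own stack; other threads can't see it.
    local_value = label.upper()
    # Appending mutates the shared list, so the change is visible to all threads.
    shared_results.append((local_value, os.getpid()))

threads = [threading.Thread(target=worker, args=(name,)) for name in ("a", "b", "c")]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

# All three entries carry the same process ID: the threads ran inside one process.
print(sorted(shared_results))
```

Each thread worked with its own local variable, but all of them wrote into the same list, and all of them report the same process ID.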
Reasoning about threads
How do we think about threads in our program? After all, now the responsibility of concurrency is in OUR hands. Let's see an example.
Suppose we have to fetch data from 3 different sites and then combine it for some final report. Let's also say that these sites are not very fast: each request takes 2 seconds. A simple program would do:
    # program start
    d1 = get_website_1()  # 2 secs
    d2 = get_website_2()  # 2 secs
    d3 = get_website_3()  # 2 secs

    combine(d1, d2, d3)  # very fast
The total execution time of this program would be ~6 seconds, because the three 2-second requests run one after another.
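If you want to check that arithmetic yourself, here is a runnable sketch of the sequential program. It makes some assumptions: the websites are faked with `time.sleep`, the 2-second delay is scaled down to `DELAY = 0.2` so it runs quickly, and the bodies of `get_website_N` and `combine` are invented for the demo.

```python
import time

DELAY = 0.2  # assumed stand-in for the 2-second requests, scaled down

def get_website_1():
    time.sleep(DELAY)  # simulate waiting on a slow site
    return "data 1"

def get_website_2():
    time.sleep(DELAY)
    return "data 2"

def get_website_3():
    time.sleep(DELAY)
    return "data 3"

def combine(*parts):
    # Hypothetical "report": just join the pieces together.
    return " + ".join(parts)

start = time.perf_counter()
d1 = get_website_1()
d2 = get_website_2()
d3 = get_website_3()
report = combine(d1, d2, d3)
elapsed = time.perf_counter() - start

# The elapsed time is roughly 3 * DELAY: the waits simply add up.
print(f"{report!r} took {elapsed:.2f}s")
```

Nearly all of that time is spent sleeping, that is, waiting on something slow while the CPU does nothing useful.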
This is an excellent use case for threads. Remember that I/O operations are slow, and the CPU is just waiting to get the response from the web.
A thread-based program would fire all 3 website requests concurrently, and wait until they all complete:
    # program start
    d1, d2, d3 = run_concurrently(
        get_website_1, get_website_2, get_website_3)  # 2 secs

    combine(d1, d2, d3)  # very fast
In this case, the functions get_website_1, get_website_2 and get_website_3 are all running concurrently. Once all of them are done, the program moves to the following line, to combine them. The total running time of this program would be ~2 seconds.
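The `run_concurrently` function above is pseudocode, not a built-in. One possible way to sketch it in Python is with `ThreadPoolExecutor` from the standard `concurrent.futures` module. As assumptions for this demo, the websites are faked with `time.sleep`, the 2-second delay is scaled down to `DELAY = 0.2`, and `fake_fetch` is a made-up helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # assumed stand-in for the 2-second requests, scaled down

def fake_fetch(name):
    time.sleep(DELAY)  # simulate waiting on a slow site
    return f"data from {name}"

def get_website_1():
    return fake_fetch("site 1")

def get_website_2():
    return fake_fetch("site 2")

def get_website_3():
    return fake_fetch("site 3")

def run_concurrently(*functions):
    """Run each function in its own thread and return results in argument order."""
    with ThreadPoolExecutor(max_workers=len(functions)) as pool:
        # submit() starts each call right away on a worker thread.
        futures = [pool.submit(fn) for fn in functions]
        # result() blocks until that particular call has finished.
        return [future.result() for future in futures]

start = time.perf_counter()
d1, d2, d3 = run_concurrently(get_website_1, get_website_2, get_website_3)
elapsed = time.perf_counter() - start

# The three waits overlap, so elapsed is close to DELAY rather than 3 * DELAY.
print(f"fetched {[d1, d2, d3]} in {elapsed:.2f}s")
```

The three waits overlap instead of adding up, which is why the total is close to a single request's delay.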