Concurrency-3-Thread-Basics

Last updated: February 16th, 2021

In our previous lesson we saw how to achieve very basic concurrency (and potentially parallelism) using multiple processes. But the idea of this course is to teach you how YOU can explicitly write concurrent (and parallel) programs.

Our first stop will be multithreading. We want to understand what threads are, and how we can make use of them.

Multithreading

Achieving concurrency by using processes is both inconvenient (you have no control as a programmer) and expensive in terms of resources. Remember that the OS keeps information about each process's resources (memory allocated, file descriptors, etc.); those resources would be duplicated when running multiple processes.

That's why threads were invented. Threads are a mechanism to achieve concurrency in a program, within the same process. A program can spawn multiple threads that will be executed concurrently (and potentially in parallel) by the OS/CPU.

Threads give us, programmers, the chance to make OUR own programs concurrent.

Threads

We'll talk about threads from a general point of view first, as an Operating System concept, and then we'll see how to use them in Python.

A thread is an Operating System feature that allows a process to spawn multiple "tasks" that can be executed independently. Instead of having a single-threaded, sequential program, we can use multiple threads of execution. This is what threads look like in a process:

[Figure: threads in a process]

As you can see, there are multiple "execution threads" running at the same time. The OS scheduler will give them CPU time independently, and potentially in parallel (if the right conditions are met).

Now, looking at the image above, do you notice something special? In the multithreaded illustration, each thread has its own stack and registers, but they all access the same data and files. This introduces us to a VERY IMPORTANT concept: threads share memory.

All threads can access the general memory of the process. That means that they can also change it, which will create some problems we'll talk about later.
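To make the idea of shared memory concrete, here is a minimal sketch using Python's standard threading module. The names results and worker are made up for this illustration; the point is only that every thread writes to the same list object owned by the process.

```python
import threading

# Shared state: a single list living in the process's memory.
results = []

def worker(name):
    # Every thread appends to the very same list object.
    results.append(name)

# Create three threads that all run the same function.
threads = [
    threading.Thread(target=worker, args=(f"thread-{i}",))
    for i in range(3)
]

for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

# The order of arrival is not guaranteed, so we sort before printing.
print(sorted(results))
```

All three names end up in the one list, even though each append ran in a different thread. The order in which they arrive depends on the scheduler, which hints at the kind of problems shared memory can cause.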

Reasoning about threads

How do we think about threads in our program? After all, now the responsibility of concurrency is in OUR hands. Let's see an example.

Suppose we have to fetch data from 3 different sites and then combine it for some final report. Let's also say that these sites are not very fast: each request takes 2 seconds. A simple program would do:

# program start

d1 = get_website_1() # 2 secs
d2 = get_website_2() # 2 secs
d3 = get_website_3() # 2 secs

combine(d1, d2, d3) # very fast

The total execution time of this program would be ~6 seconds:

[Figure: sequential program]
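If you want to see the sequential cost for yourself, the sketch below simulates the slow sites with time.sleep. It is only an approximation under stated assumptions: get_website and combine are stand-ins invented for this example, and DELAY is scaled down from the 2 seconds in the text so the script finishes quickly.

```python
import time

DELAY = 0.3  # assumption: scaled-down stand-in for the 2-second requests

def get_website(n):
    # Simulates waiting on a slow network response (hypothetical helper).
    time.sleep(DELAY)
    return f"data-{n}"

def combine(*parts):
    # Hypothetical "report" step: very fast compared to the requests.
    return ", ".join(parts)

start = time.perf_counter()

d1 = get_website(1)  # waits DELAY
d2 = get_website(2)  # waits DELAY again
d3 = get_website(3)  # and again

report = combine(d1, d2, d3)
elapsed = time.perf_counter() - start

# The waits add up: elapsed is roughly 3 * DELAY.
print(f"{report} (took {elapsed:.2f}s)")
```

Each call has to finish before the next one starts, so the total time is about the sum of the three waits.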

This is an excellent use case for threads. Remember that I/O operations are slow, and the CPU is just waiting to get the response from the web.

A thread-based program would fire all 3 website requests concurrently, and wait until they all complete:

# program start
d1, d2, d3 = run_concurrently(
    get_website_1,
    get_website_2,
    get_website_3)  # 2 secs

combine(d1, d2, d3)  # very fast

In this case, the functions get_website_1, get_website_2 and get_website_3 are all running concurrently. Once all of them are done, the program moves on to the following line to combine the results. The total running time of this program would be ~2 seconds.

[Figure: concurrent program]
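The run_concurrently function above is pseudocode, not a real Python built-in. As a rough sketch of what such a helper could look like with the standard threading module, consider the version below. The helper make_fetcher and the scaled-down DELAY are assumptions made for this illustration, and the sketch ignores error handling (an exception raised inside a thread would not reach the caller).

```python
import threading
import time

DELAY = 0.3  # assumption: scaled-down stand-in for the 2-second requests

def run_concurrently(*functions):
    # One result slot per function, filled in by its own thread.
    results = [None] * len(functions)

    def call(index, function):
        results[index] = function()

    threads = [
        threading.Thread(target=call, args=(i, f))
        for i, f in enumerate(functions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every function has returned
    return results

def make_fetcher(n):
    # Hypothetical factory for a fake "slow website" function.
    def fetch():
        time.sleep(DELAY)  # simulates waiting on the network
        return f"data-{n}"
    return fetch

start = time.perf_counter()
d1, d2, d3 = run_concurrently(make_fetcher(1), make_fetcher(2), make_fetcher(3))
elapsed = time.perf_counter() - start

# The three waits overlap, so elapsed is close to DELAY, not 3 * DELAY.
print(d1, d2, d3, f"(took {elapsed:.2f}s)")
```

Because the threads spend their time sleeping (just like a real program would spend it waiting for the network), the waits overlap and the total time stays close to a single request.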

Threads in Python

Threads in Python are not as simple as the run_concurrently function shown above (although there are external libraries, like parallel, that support a similar API).

In the next lesson, we'll explore how to create and use threads in Python.
