Last updated: December 30th, 2020

To really grasp the concepts related to concurrency and parallelism that we'll explore in this course, we must first start with the basics of computer architecture and operating systems.

Note: This is a very simplified introduction to both computer architecture and OS concepts. If you're interested in digging deeper (that'd actually be great!), please check the references at the end of the notebook.

Computer architecture

We can cover all the necessary basics of computer architecture by looking at the von Neumann architecture, an early design proposed for the first computers. Today's computers have evolved from it, but we can still map its main components in a way that will let us better understand concurrency.

This is a (very simplified) diagram of the von Neumann architecture:

We have 3 main players:

  • CPU: The "processing" unit, doing the computations
  • Memory: a fast storage unit where the CPU places and fetches data
  • I/O: all Input (keyboard, mouse, etc) and Output (monitor, sound, printer, etc)

Any program you write will interface with one of these components. Let's see an example:

# Data stored in memory
x = 1
# Calculation done in CPU
x += 3

# I/O (write to a file)
with open('res.txt', 'w') as fp:
    fp.write(f"The value is {x}")

# I/O (print to screen)
print(f"The value is {x}")

The reason this is important is that the latencies (how long operations take) of these different components are extremely different. A simple print statement can be thousands of times slower than a CPU calculation. This latency will affect the ability of our program to be "parallel". Here's a table showing the latency of different operations (source):

System Event                                     Actual Latency    Scaled Latency
One CPU cycle                                    0.4 ns            1 s
Level 1 cache access                             0.9 ns            2 s
Level 2 cache access                             2.8 ns            7 s
Level 3 cache access                             28 ns             1 min
Main memory access (DDR DIMM)                    ~100 ns           4 min
Intel Optane memory access                       <10 μs            7 hrs
NVMe SSD I/O                                     ~25 μs            17 hrs
SSD I/O                                          50–150 μs         1.5–4 days
Rotational disk I/O                              1–10 ms           1–9 months
Internet call: San Francisco to New York City    65 ms             5 years
Internet call: San Francisco to Hong Kong        141 ms            11 years

As you can see, there's a huge difference between the three main components: a CPU cycle is roughly 250 times faster than accessing main memory, which in turn is ~1,000 times faster than accessing an SSD. It's not important to remember the precise latency of each component; just keep in mind the orders of magnitude, and how much faster the CPU and RAM are compared to I/O.
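We can get a feel for this difference from Python itself. This is just a rough sketch (absolute numbers will vary wildly by machine, and Python adds its own overhead): it times a CPU-bound loop against a file write that is forced all the way to disk.

```python
import os
import time

# CPU-bound work: a million additions, all in CPU and memory.
start = time.perf_counter()
total = 0
for i in range(1_000_000):
    total += i
cpu_time = time.perf_counter() - start

# I/O-bound work: write a few MB to a file and force it to disk.
start = time.perf_counter()
with open("latency_demo.txt", "w") as fp:
    fp.write("hello" * 1_000_000)
    fp.flush()
    os.fsync(fp.fileno())  # don't let the OS cache hide the disk latency
io_time = time.perf_counter() - start

print(f"CPU loop:   {cpu_time:.6f} s")
print(f"Disk write: {io_time:.6f} s")
os.remove("latency_demo.txt")
```

On most machines the disk write will be noticeably slower per byte of "work done" than the in-memory loop, which is the whole point of the table above.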

The Operating System

I usually say that the OS is the Guardian of the System. In any modern Operating System, our programs don't have direct access to the components of the computer. All the operations we write in our code are actually "requests" to the Operating System to perform for us.

The OS, guardian of the system

The OS does this by executing our code in the context of a Process, which it can uniquely identify. This way, it knows which process is trying to do what at any given time.

The process

A Process is an abstraction that encapsulates a running program. The OS uses this process to store information about the program: for example, its current state (running, terminated, etc.), the current instruction being executed, how much memory it has allocated, its open file descriptors, etc.
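You can peek at one piece of this bookkeeping from Python itself: every process has a unique ID assigned by the OS, and a parent process that spawned it. A minimal example:

```python
import os

# The OS identifies every running program by a unique process ID (PID).
# Python exposes the current process's ID, and its parent's, directly:
print("My process ID:    ", os.getpid())
print("Parent process ID:", os.getppid())
```

Run this script twice and you'll see two different PIDs: same program, two distinct processes.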

Running programs

We're going to illustrate this concept of a "process" by running the same "program" (the same code) several times. The code is below; it's a very simple program that just sleeps for some time and then exits.
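A sketch of such a program might look like this (sleeper.py is a placeholder filename, used here just so the shell commands below have something to refer to):

```python
# sleeper.py -- placeholder filename; a sketch of the program described above:
# it sleeps for the number of seconds given on the command line, then exits.
import sys
import time

def run(seconds):
    """Sleep for the given number of seconds, then return."""
    print(f"Sleeping for {seconds} second(s)...")
    time.sleep(seconds)
    print("Done, exiting.")

if __name__ == "__main__":
    # Read the seconds from the command line, defaulting to 1.
    seconds = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1].isdigit() else 1
    run(seconds)
```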

We'll run the same program/code several times with different parameters, and inspect our running processes to see how multiple processes have been created. Run the program (saved here as sleeper.py, a placeholder filename) using:

$ python sleeper.py [SECONDS] &

For example:

$ python sleeper.py 60 &
$ python sleeper.py 65 &
$ python sleeper.py 70 &

Then use the ps command to inspect your running processes; you should see the three processes started before. You can also try top -c. Here's an example of the running processes:


As you can see, the same program is being executed, concurrently, by three different processes. Each process has a different ID, and they're all isolated.

This is an illustration of what's going on within the Operating System:


OS Concurrency

I learned how to code on an old Pentium 4 computer running Windows 98. Pentium 4 CPUs were "single core"; that means they had just one processing unit. How many processes do you think I could run at the same time?

The answer is straightforward: just 1 process at a time. Yet, it felt as if I was running several things at the same time. I could drag my mouse pointer around while surfing the Web 1.0, listening to music, and doing some background processing. How was that possible?

CPU Scheduling

What the OS does to achieve this "multitasking" feeling with just 1 CPU is run ALL the processes for short periods of time, swapping them in and out:


This gives all the processes a chance to run "concurrently", each for short periods of time. There are many mechanisms employed by Operating Systems to decide which process should be running at a given time. The simplest, and what early OSs used to do, was a plain "round robin": each process gets the same amount of time, in order.
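The round-robin idea can be sketched in a few lines of Python. This is a toy simulation of the scheduling order, not how a real OS scheduler is implemented:

```python
from collections import deque

def round_robin(processes, quantum):
    """Toy round-robin scheduler.

    Each process is a (name, remaining_time) pair. Every process runs
    for at most `quantum` time units, then goes to the back of the
    queue until its work is done. Returns the resulting timeline.
    """
    queue = deque(processes)
    timeline = []  # (name, time_slice) in the order they ran
    while queue:
        name, remaining = queue.popleft()
        time_slice = min(quantum, remaining)
        timeline.append((name, time_slice))
        if remaining > time_slice:
            # Not finished: re-enqueue with the leftover work.
            queue.append((name, remaining - time_slice))
    return timeline

# Three "processes" with different total CPU needs, quantum of 2 units:
print(round_robin([("P1", 3), ("P2", 5), ("P3", 2)], quantum=2))
```

Notice how every process gets regular slices of CPU time, so all of them make progress even though only one runs at any instant.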

What OS designers quickly realized was that the "status" of each process was very important. Remember the latency table we talked about before? Now it matters. If a given process is currently doing a network request, that request is going to take a long time, so the Scheduler shouldn't give the process a chance to run for quite a while. This is an oversimplification, but you can check the Wikipedia article on scheduling to learn about other mechanisms employed by OSs.

What's important is that, even with just 1 CPU, we've achieved what we call CONCURRENCY: several processes have been started at the same time, and we have this "feeling" that they're running in parallel. But we know that, with a single CPU, true parallelism is actually impossible.

Now, what happens if we actually have multiple CPUs, or multiple cores? In that case, the OS is capable of actually scheduling several processes in parallel:


The image above is an illustration of a CPU scheduler on a 2-core system. If you look closely, you'll be able to spot 2 things: some periods of time when there are 2 processes running in parallel, and also some idle time.

You can also see how P2 gets only short bursts of CPU time. That is probably an I/O-bound process. Again, refer to our latency table from before.

Process concurrency

In the following lesson, we'll learn how to write concurrent programs using Python threads. But you have already learned how to do some very primitive concurrency: you can just spawn several processes of the same program with different arguments, which could speed things up. For example, say you have to process a large 3,000,000-line file. You could write your code so that it takes a starting and an ending line, and just spawn 3 (or more) processes:

$ python process_chunk.py large_file.txt 1 1_000_000 &
$ python process_chunk.py large_file.txt 1_000_001 2_000_000 &
$ python process_chunk.py large_file.txt 2_000_001 3_000_000 &

(process_chunk.py is a placeholder name for your script.)
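A minimal sketch of such a script might look like this. The filename process_chunk.py and the per-line "work" (here, just counting characters) are illustrative placeholders; the real point is restricting each process to its own line range:

```python
# process_chunk.py -- placeholder name; processes only the lines
# [start, end] (1-indexed, inclusive) of a large file.
import sys

def process_lines(path, start, end):
    """Process lines `start` through `end` of the file at `path`.

    "Processing" here just counts characters, as a stand-in for
    whatever real per-line work your program does.
    """
    total_chars = 0
    with open(path) as fp:
        for lineno, line in enumerate(fp, start=1):
            if lineno < start:
                continue          # not our chunk yet
            if lineno > end:
                break             # past our chunk, stop early
            total_chars += len(line)
    return total_chars

if __name__ == "__main__" and len(sys.argv) == 4:
    path, start, end = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    print(process_lines(path, start, end))
```

Each spawned process opens the file independently and only touches its own slice, so the OS can schedule the three of them concurrently (and, on a multi-core machine, in parallel).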