Co-founder @ RMOTR

Intro to Python Demo

Last updated: March 24th, 2020

Chapter 1: Elements of a Program

Do you remember how you first learned to speak in your mother tongue? Probably not. No one's memory goes back that far. Your earliest memory as a child should probably be around the age of three or four years old when you could already say simple things and interact with your environment. Although you did not know any grammar rules yet, other people just understood what you said. At least most of the time.

It is intuitively best to take the very mindset of a small child when learning a new language. And a programming language is no different from that. This first chapter introduces simplistic examples and we accept them as they are without knowing any of the "grammar" rules yet. Then, we analyze them in parts and slowly build up our understanding.

Consequently, if parts of this chapter do not make sense right away, let's not worry too much. Besides introducing the basic elements, it also serves as an outlook for what is to come. So, many terms and concepts used here are deconstructed in great detail in the following chapters.

Example: Averaging all even Numbers in a List

As our introductory example, we want to calculate the average of all evens in a list of whole numbers: [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4].

While we are used to finding an analytical solution in math (i.e., derive some equation with "pen and paper"), we solve this task programmatically instead.

We start by creating a list called numbers that holds all the individual numbers between brackets [ and ].

In [1]:
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

To verify that something happened in our computer's memory, we reference numbers.

In [2]:
numbers
Out[2]:
[7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

So far, so good. Let's see how the desired computation could be expressed as a sequence of instructions in the next code cell.

Intuitively, the line for number in numbers describes a "loop" over all the numbers in the numbers list, one at a time.

The if number % 2 == 0 may look confusing at first sight. Both % and == must have an unintuitive meaning here. Luckily, the comment in the same line after the # symbol has the answer: The program does something only for an even number.

In particular, it increases count by 1 and adds the current number onto the running total. Both count and total are initialized to 0 and the single = symbol reads as "... is set equal to ...". It cannot indicate a mathematical equation as, for example, count is generally not equal to count + 1.

Lastly, the average is calculated as the ratio of the final values of total and count. Overall, we divide the sum of all even numbers by their count: This is nothing but the definition of an average.

The lines of code "within" the for and if statements are indented and aligned with multiples of four spaces: This shows immediately how the lines relate to each other.

In [3]:
count = 0  # initialize variables to keep track of the
total = 0  # running total and the count of even numbers

for number in numbers:
    if number % 2 == 0:  # only work with even numbers
        count = count + 1
        total = total + number

average = total / count

We do not see any output yet but obtain the value of average by referencing it again.

In [4]:
average
Out[4]:
7.0

Output in a Jupyter Notebook

Only two of the previous four code cells generate an output while the other two remain "silent" (i.e., nothing appears below the cell after running it).

By default, Jupyter notebooks only show the value of the expression in the last line of a code cell. And, this output may also be suppressed by ending the last line with a semicolon ;.

In [5]:
"Hello, World!"
"I am feeling great :-)"
Out[5]:
'I am feeling great :-)'
In [6]:
"I am invisible!";

To see any output other than that, we use the built-in print() function. Here, the parentheses () indicate that we call (i.e., "execute") code written somewhere else.

In [7]:
print("Hello, World!")
print("I am feeling great :-)")
Hello, World!
I am feeling great :-)

Outside Jupyter notebooks, the semicolon ; is used as a separator between statements that must otherwise be on a line on their own. However, it is not considered good practice to use it as it makes code less readable.

In [8]:
print("Hello, World!"); print("I am feeling great :-)")
Hello, World!
I am feeling great :-)

(Arithmetic) Operators

Python comes with many built-in operators: They are tokens (i.e., "symbols") that have a special meaning to the Python interpreter.

The arithmetic operators either "operate" with the number immediately following them, so-called unary operators (e.g., negation), or "process" the two numbers "around" them, so-called binary operators (e.g., addition).

By definition, operators on their own have no permanent side effects in the computer's memory. Although the code cells in this section do indeed create new numbers in memory (e.g., 77 + 13 creates 90), they are immediately "forgotten" as they are not stored in a variable like numbers or average above. We develop this thought further at the end of this chapter when we compare expressions with statements.

Let's see some examples of operators. We start with the binary + and the - operators for addition and subtraction. Binary operators mimic what mathematicians call infix notation and have the expected meaning.

In [9]:
77 + 13
Out[9]:
90
In [10]:
101 - 93
Out[10]:
8

The - operator may be used as a unary operator as well. Then, it unsurprisingly flips the sign of a number.

In [11]:
-1
Out[11]:
-1

When we compare the output of the * and / operators for multiplication and division, we note the subtle difference between the 42 and the 42.0: They are the same number represented as a different data type.

In [12]:
2 * 21
Out[12]:
42
In [13]:
84 / 2
Out[13]:
42.0

The so-called floor division operator // always "rounds" to an integer and is thus also called integer division operator. It is an example of an arithmetic operator we commonly do not know from high school mathematics.

In [14]:
84 // 2
Out[14]:
42
In [15]:
85 // 2
Out[15]:
42

Even though it appears that the // operator truncates (i.e., "cuts off") the decimals so as to effectively "round" towards zero (i.e., the 42.5 became 42 in the previous code cell), this is not the case: The result is always "rounded" towards minus infinity, which makes a difference for negative numbers!

In [16]:
-85 // 2
Out[16]:
-43
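
To make the difference between rounding towards minus infinity and truncating concrete, the following sketch compares the // operator with the built-in int() function and the floor() function from the math module in the standard library. The variable names are chosen for illustration only and are not used elsewhere in this chapter.

```python
import math

floored = -85 // 2                  # floor division rounds towards minus infinity
truncated = int(-85 / 2)            # int() cuts off the decimals (i.e., rounds towards zero)
also_floored = math.floor(-85 / 2)  # math.floor() agrees with the // operator

print(floored, truncated, also_floored)  # -43 -42 -43
```

For positive numbers, all three approaches lead to the same result, which is why the difference is easy to overlook.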

To obtain the remainder of a division, we use the modulo operator %.

In [17]:
85 % 2
Out[17]:
1

The remainder is 0 if and only if a number is divisible by another.

A popular convention in both computer science and mathematics is to abbreviate "if and only if" as "iff." The iff means that a remainder of 0 implies that a number is divisible by another but also that a number's being divisible by another implies a remainder of 0. The implication goes in both directions!

So, 49 is divisible by 7.

In [18]:
49 % 7
Out[18]:
0

Modulo division is also useful if we want to extract the last couple of digits in a large integer.

In [19]:
789 % 10
Out[19]:
9
In [20]:
789 % 100
Out[20]:
89

The built-in divmod() function combines the integer and modulo divisions into one step. However, grammatically this is not an operator but a function. Also, divmod() returns a "pair" of integers and not a single one.

In [21]:
divmod(42, 10)
Out[21]:
(4, 2)
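
As a small sketch of how the three pieces relate, we may compare the "pair" returned by divmod() with the results of the // and % operators. The names dividend, divisor, quotient, and remainder are made up for this example, and assigning the pair to two names at once is a syntax we have not formally introduced yet.

```python
dividend, divisor = 42, 10

quotient, remainder = divmod(dividend, divisor)  # split the "pair" into two names

# divmod() returns the same values as the // and % operators ...
print(quotient == dividend // divisor)  # True
print(remainder == dividend % divisor)  # True

# ... and the two parts always add back up to the dividend.
print(quotient * divisor + remainder == dividend)  # True
```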

Raising a number to a power is performed with the exponentiation operator **. It is different from the ^ operator other programming languages may use and that also exists in Python with a different meaning.

In [22]:
2 ** 3
Out[22]:
8
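
To illustrate why mixing up the two operators is dangerous, here is a minimal sketch: The ^ operator does not raise an error but silently computes something else, namely a bitwise "exclusive or" of the two integers' binary representations. The names power and xor are chosen for this example only.

```python
power = 2 ** 3  # exponentiation: 2 * 2 * 2
xor = 2 ^ 3     # bitwise XOR of 0b10 and 0b11, which is 0b01

print(power, xor)  # 8 1
```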

The standard order of precedence from mathematics applies (i.e., PEMDAS rule) when several operators are combined.

In [23]:
3 ** 2 * 2 
Out[23]:
18

Parentheses help avoid confusion and take the role of a delimiter here.

In [24]:
(3 ** 2) * 2
Out[24]:
18
In [25]:
3 ** (2 * 2)
Out[25]:
81
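
Two cases where parentheses are particularly helpful are sketched next: The ** operator binds more tightly than a unary - on its left, and a chain of ** operators is evaluated from right to left. The variable names are hypothetical and only serve to label the four results.

```python
negated_square = -3 ** 2       # read as -(3 ** 2)
squared_negative = (-3) ** 2   # parentheses make the base negative
right_to_left = 2 ** 3 ** 2    # read as 2 ** (3 ** 2)
left_to_right = (2 ** 3) ** 2  # parentheses force the other order

print(negated_square, squared_negative)  # -9 9
print(right_to_left, left_to_right)      # 512 64
```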

Some programmers also use "style" conventions. For example, we might play with the whitespace, which is an umbrella term that refers to any non-printable sign like spaces, tabs, or the like. However, this is not a good practice and parentheses convey a much clearer picture.

In [26]:
3**2 * 2  # bad style; it is better to use parentheses here
Out[26]:
18

There exist many non-mathematical operators that are introduced throughout this book, together with the concepts they implement. They often come in a form different from the unary and binary ones mentioned above.

Objects vs. Types vs. Values

Python is a so-called object-oriented language, which is a paradigm of organizing a program's memory.

An object may be viewed as a "bag" of $0$s and $1$s in a given memory location. The $0$s and $1$s in a bag make up the object's value. There exist different types of bags, and each type comes with its own rules how the $0$s and $1$s are interpreted and may be worked with.

So, an object always has three main characteristics: an identity, a type, and a value. Let's look at the following examples and work them out.

In [27]:
a = 42
b = 42.0
c = "Python rocks"

Identity / "Memory Location"

The built-in id() function shows an object's "address" in memory.

In [28]:
id(a)
Out[28]:
139627575613392
In [29]:
id(b)
Out[29]:
140361812063056
In [30]:
id(c)
Out[30]:
139627575358192

These addresses are not meaningful for anything other than checking if two variables reference the same object.

Obviously, a and b have the same value as revealed by the equality operator ==: We say a and b "evaluate equal." The resulting True - and the False further below - is yet another data type, a so-called boolean. We look into them in Chapter 3.

In [31]:
a == b
Out[31]:
True

On the contrary, a and b are different objects as the identity operator is shows: They are stored at different addresses in the memory.

In [32]:
a is b
Out[32]:
False

If we want to check the opposite case, we use the negated version of the is operator, namely is not.

In [33]:
a is not b
Out[33]:
True
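
The relationship between the is operator and the id() function can be sketched as follows: Two names reference the same object exactly when id() reports the same "address" for both. The name same_object is an assumption made for this example; a and b are re-created with the values from above.

```python
a = 42
b = 42.0
same_object = a  # a second name for the very same object

print(a is same_object, id(a) == id(same_object))  # True True
print(a is b, id(a) == id(b))                      # False False
print(a == b)                                      # True
```

So, == compares values whereas is compares identities.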

(Data) Type / "Behavior"

The type() built-in shows an object's type. For example, a is an integer (i.e., int) while b is a so-called floating-point number (i.e., float).

In [34]:
type(a)
Out[34]:
int
In [35]:
type(b)
Out[35]:
float

Different types imply different behaviors for the objects. The b object, for example, may be "asked" if it is a whole number with the is_integer() "functionality" that comes with every float object.

Formally, we call such type-specific functionalities methods (i.e., as opposed to functions) and we look at them in detail in Chapter 10. For now, it suffices to know that we access them with the dot operator . on the object. Of course, b is a whole number, which the boolean object True tells us.

In [36]:
b.is_integer()
Out[36]:
True

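For an int object, this is_integer() check does not make sense as we already know it is an int: With the Python version used here, we see the AttributeError below as a does not even know what is_integer() means. (Since Python 3.12, int objects come with an is_integer() method as well, so the cell below runs without an error on newer versions.)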

In [37]:
a.is_integer()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-37-7db0a38aefcc> in <module>
----> 1 a.is_integer()

AttributeError: 'int' object has no attribute 'is_integer'

The c object is a so-called string type (i.e., str), which is Python's way of representing "text." Strings also come with peculiar behaviors, for example, to make a text lower or upper case.

In [38]:
type(c)
Out[38]:
str
In [39]:
c.lower()
Out[39]:
'python rocks'
In [40]:
c.upper()
Out[40]:
'PYTHON ROCKS'

Value / (Semantic) "Meaning"

Almost trivially, every object also has a value to which it evaluates when referenced. We think of the value as the conceptual idea of what the $0$s and $1$s in the bag mean to humans. In other words, an object's value regards its semantic meaning.

For built-in data types, Python prints out an object's value as a so-called literal: This means that we may copy and paste the value back into a code cell and create a new object with the same value.

In [42]:
a
Out[42]:
42
In [43]:
b
Out[43]:
42.0

In this book, we follow the convention of creating strings with double quotes " instead of the single quotes ' to which Python defaults in its literal notation for str objects. Both types of quotes may be used interchangeably. So, the "Python rocks" from above and 'Python rocks' below create two objects that evaluate equal (i.e., "Python rocks" == 'Python rocks').

In [44]:
c
Out[44]:
'Python rocks'
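
As a brief sketch of the literal notation, the built-in repr() function returns the text Python shows for an object as a str object. For the built-in types used here, this text is a literal that we could paste back into a code cell. The snippet re-creates b and c with the values from above.

```python
b = 42.0
c = "Python rocks"

print(repr(b))  # 42.0
print(repr(c))  # 'Python rocks'

print("Python rocks" == 'Python rocks')  # True
```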

Formal vs. Natural Languages

Just like the language of mathematics is good at expressing relationships among numbers and symbols, any programming language is just a formal language that is good at expressing computations.

Formal languages come with their own "grammatical rules" called syntax.

Syntax Errors

If we do not follow the rules, the code cannot be parsed correctly, i.e., the program does not even start to run but raises a syntax error indicated as SyntaxError in the output. Computers are very dumb in the sense that the slightest syntax error leads to the machine not understanding our code.

If we were to write an accounting program that adds up currencies, we would, for example, have to model dollar prices as float objects as the dollar symbol cannot be understood by Python.

In [45]:
3.99 $ + 10.40 $
  File "<ipython-input-45-cafa82e54b9c>", line 1
    3.99 $ + 10.40 $
         ^
SyntaxError: invalid syntax

Python requires certain symbols at certain places (e.g., a : is missing here).

In [46]:
for number in numbers
    print(number)
  File "<ipython-input-46-499e4d0d0cbb>", line 1
    for number in numbers
                         ^
SyntaxError: invalid syntax

Furthermore, it relies on whitespace (i.e., indentation), unlike many other programming languages. The IndentationError below is just a particular type of a SyntaxError.

In [47]:
for number in numbers:
print(number)
  File "<ipython-input-47-19398c5f89de>", line 2
    print(number)
        ^
IndentationError: expected an indented block

Runtime Errors

Syntax errors are easy to find as the code does not even run in the first place.

However, there are also so-called runtime errors that occur whenever otherwise (i.e., syntactically) correct code does not run because of invalid input. Runtime errors are also often referred to as exceptions.

This example does not work because just like in the "real" world, Python does not know how to divide by 0. The syntactically correct code leads to a ZeroDivisionError.

In [48]:
1 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-48-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero
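
As an outlook, a runtime error does not have to crash a program. The sketch below uses the try statement, which is not covered in this chapter, to "catch" the ZeroDivisionError. The name result and the fallback value None are assumptions made for this example.

```python
try:
    result = 1 / 0
except ZeroDivisionError as error:
    result = None  # hypothetical fallback value chosen for this sketch
    print("Cannot divide:", error)  # Cannot divide: division by zero
```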

Semantic Errors

So-called semantic errors, on the contrary, are hard to spot as they do not crash the program. The only way to find such errors is to run a program with test input for which we can predict the output. However, testing software is a whole discipline on its own and often very hard to do in practice.

The cell below copies our first example from above with a "tiny" error. How fast could you have spotted it without the comment?

In [49]:
count = 0
total = 0

for number in numbers:
    if number % 2 == 0:
        count = count + 1
        total = total + count  # count is wrong here, it should be number

average = total / count
In [50]:
average
Out[50]:
3.5
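
One way to run a program with test input for which we can predict the output is sketched below. We wrap the correct logic in a function, a concept that is only introduced in a later chapter, and check it against a small list whose answer we know by heart. The function name average_of_evens and the test input are hypothetical.

```python
def average_of_evens(numbers):
    """Return the average of all even numbers in a list (hypothetical helper)."""
    count = 0
    total = 0
    for number in numbers:
        if number % 2 == 0:
            count = count + 1
            total = total + number  # the buggy version adds count here
    return total / count


# For [2, 4, 5], the evens are 2 and 4, so the average must be 3.0.
assert average_of_evens([2, 4, 5]) == 3.0
```

With the buggy line from above, the function would return 1.5 for this input and the assert statement would raise an AssertionError, pointing us to the problem.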

Systematically finding errors is called debugging. For the history of the term, see this article.

Best Practices

Thus, adhering to just syntax rules is never enough. Over time, best practices and style guides were created to make it less likely for a developer to mess up a program and also to make it faster to "onboard" new contributors to an established code base, often called legacy code. These rules are not enforced by Python itself: Badly styled code still runs. At the very least, Python programs should be styled according to PEP 8 and documented "inline" (i.e., in the code itself) according to PEP 257.

An easier to read version of PEP 8 is here. The video below features a well known Pythonista talking about the importance of code style.

In [51]:
from IPython.display import YouTubeVideo
YouTubeVideo("Hwckt4J96dI", width="60%")
Out[51]:

For example, while the above code to calculate the average of the even numbers in [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4] is correct, a Pythonista would rewrite it in a more "Pythonic" way and use the built-in sum() and len() functions (cf., Chapter 2) as well as a so-called list comprehension (cf., Chapter 7). Pythonic code runs faster in many cases and is less error-prone.

In [52]:
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
In [53]:
evens = [n for n in numbers if n % 2 == 0]  # use of a list comprehension
In [54]:
evens
Out[54]:
[8, 12, 2, 6, 10, 4]
In [55]:
average = sum(evens) / len(evens)  # use built-in functions
In [56]:
average
Out[56]:
7.0
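
As a possible alternative, which is an assumption on our side and not the approach taken in this book, the standard library's statistics module provides a mean() function that takes care of the summing and counting for us.

```python
from statistics import mean

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

average = mean(n for n in numbers if n % 2 == 0)  # average of the even numbers

print(average == 7.0)  # True
```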

To get a rough overview of the mindset of a typical Python programmer, look at these rules, also known as the Zen of Python, that are deemed so important that they are included in every Python installation.

In [57]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Jupyter Notebook Aspects

The Order of Code Cells is arbitrary

We can run the code cells in a Jupyter notebook in any arbitrary order.

That means, for example, that a variable defined towards the bottom could accidentally be referenced at the top of the notebook. This happens quickly when we iteratively build a program and go back and forth between cells.

As a good practice, it is recommended to click on "Kernel" > "Restart Kernel and Run All Cells" in the navigation bar once a notebook is finished. That restarts the Python process forgetting all state (i.e., all variables) and ensures that the notebook runs top to bottom without any errors the next time it is opened.

Notebooks are linear

While this book is built with Jupyter notebooks, it is crucial to understand that "real" programs are almost never "linear" (i.e., top to bottom) sequences of instructions but instead may take many different flows of execution.

At the same time, for a beginner's course, it is often easier to code linearly.

In real data science projects, one would probably employ a mixed approach and put reusable code into so-called Python modules (i.e., .py files; cf., Chapter 2) and then use Jupyter notebooks to build up a linear report or storyline for an analysis.

Variables vs. Names vs. Identifiers vs. References

Variables are created with the assignment statement =, which is not an operator because of its side effect of making a name reference an object in memory.

We read the terms variable, name, and identifier used interchangeably in many Python-related texts. In this book, we adopt the following convention: First, we treat name and identifier as perfect synonyms but only use the term name in the text for clarity. Second, whereas name only refers to a string of letters, numbers, and some other symbols, a variable means the combination of a name and a reference to an object in memory.

In [58]:
variable = 20.0

When referenced on its own, a variable evaluates to the value of the object it references. Colloquially, we could say that variable evaluates to 20.0, but this would not be an accurate description of what is going on in memory. We see some more colloquialisms in this section but should always relate this to what Python actually does in memory.

In [59]:
variable
Out[59]:
20.0

A variable may be re-assigned as often as we wish. Thereby, we could also assign an object of a different type. Because this is allowed, Python is said to be a dynamically typed language. On the contrary, a statically typed language like C also allows re-assignment but only with objects of the same type. This subtle distinction is one reason why Python is slower at execution than C: As it runs a program, it needs to figure out an object's type each time it is referenced.

In [60]:
variable = 20
In [61]:
variable
Out[61]:
20
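
A minimal sketch of dynamic typing: The same name may reference objects of different types over time, which the built-in type() function reveals. The name value is made up for this example so as not to interfere with variable from above.

```python
value = 20.0
print(type(value))  # <class 'float'>

value = 20
print(type(value))  # <class 'int'>

value = "twenty"
print(type(value))  # <class 'str'>
```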

If we want to re-assign a variable while referencing its "old" (i.e., current) object, we may also update it using a so-called augmented assignment statement (i.e., not operator), as introduced with PEP 203: The currently mapped object is implicitly inserted as the first operand on the right-hand side.

In [62]:
variable *= 4  # same as variable = variable * 4
In [63]:
variable
Out[63]:
80
In [64]:
variable //= 2  # same as variable = variable // 2; "//" to retain the integer type
In [65]:
variable
Out[65]:
40
In [66]:
variable += 2  # same as variable = variable + 2
In [67]:
variable
Out[67]:
42
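
To see why the comment above insists on //= to retain the integer type, the following sketch contrasts it with /=. The name value is hypothetical and independent of variable.

```python
value = 80

value //= 2   # floor division keeps the int type
print(value)  # 40

value /= 2    # "normal" division always results in a float
print(value)  # 20.0
```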

Variables are dereferenced (i.e., "deleted") with the del statement. This does not delete the object a variable references but merely removes the variable's name from the "global list of all names."

In [68]:
variable
Out[68]:
42
In [69]:
del variable

If we refer to an unknown name, a runtime error occurs, namely a NameError. The Name in NameError gives a hint why we choose the term name over identifier above: Python uses it more often in its error messages.

In [70]:
variable
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-70-1748287bc46a> in <module>
----> 1 variable

NameError: name 'variable' is not defined
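
That del only removes a name and not the object can be sketched with two names that reference the same object. The names first and second are assumptions made for this example, and we use a list object as introduced at the beginning of the chapter.

```python
first = [1, 2, 3]
second = first  # a second name referencing the same list object

del first       # removes only the name "first"

print(second)   # [1, 2, 3]; the object itself still exists
```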

Some variables magically exist when a Python process is started or are added by Jupyter. We may safely ignore the former until Chapter 10 and the latter for good.

In [71]:
__name__
Out[71]:
'__main__'

To see all defined names, the built-in function dir() is helpful.

In [72]:
dir()
Out[72]:
['In',
 'Out',
 'YouTubeVideo',
 '_',
 '_10',
 '_11',
 '_12',
 '_13',
 '_14',
 '_15',
 '_16',
 '_17',
 '_18',
 '_19',
 '_2',
 '_20',
 '_21',
 '_22',
 '_23',
 '_24',
 '_25',
 '_26',
 '_28',
 '_29',
 '_30',
 '_31',
 '_32',
 '_33',
 '_34',
 '_35',
 '_36',
 '_38',
 '_39',
 '_4',
 '_40',
 '_41',
 '_42',
 '_43',
 '_44',
 '_5',
 '_50',
 '_51',
 '_54',
 '_56',
 '_59',
 '_61',
 '_63',
 '_65',
 '_67',
 '_68',
 '_71',
 '_9',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i26',
 '_i27',
 '_i28',
 '_i29',
 '_i3',
 '_i30',
 '_i31',
 '_i32',
 '_i33',
 '_i34',
 '_i35',
 '_i36',
 '_i37',
 '_i38',
 '_i39',
 '_i4',
 '_i40',
 '_i41',
 '_i42',
 '_i43',
 '_i44',
 '_i45',
 '_i46',
 '_i47',
 '_i48',
 '_i49',
 '_i5',
 '_i50',
 '_i51',
 '_i52',
 '_i53',
 '_i54',
 '_i55',
 '_i56',
 '_i57',
 '_i58',
 '_i59',
 '_i6',
 '_i60',
 '_i61',
 '_i62',
 '_i63',
 '_i64',
 '_i65',
 '_i66',
 '_i67',
 '_i68',
 '_i69',
 '_i7',
 '_i70',
 '_i71',
 '_i72',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'a',
 'average',
 'b',
 'c',
 'count',
 'evens',
 'exit',
 'get_ipython',
 'number',
 'numbers',
 'quit',
 'this',
 'total']
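
Since most of the names above are added by Jupyter, it may help to filter them. The sketch below reuses the list comprehension syntax from earlier in this chapter and keeps only the names that do not start with an underscore. The names answer and public_names are hypothetical, and the startswith() method belongs to every str object.

```python
answer = 42  # hypothetical variable defined just for this sketch

public_names = [name for name in dir() if not name.startswith("_")]

print("answer" in public_names)    # True
print("__name__" in public_names)  # False
```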

Who am I? And how many?

It is crucial to understand that several variables may reference the same object in memory. Not having this in mind may lead to many hard to track down bugs.

Let's make b reference whatever object a is referencing.

In [73]:
a = 42
In [74]:
b = a
In [75]:
b
Out[75]:
42

For "simple" types like int or float this never causes trouble.

Let's "change the value" of a. To be precise, let's create a new 87 object and make a reference it.

In [76]:
a = 87
In [77]:
a
Out[77]:
87

b "is still the same" as before. To be precise, b still references the same object as before.

In [78]:
b
Out[78]:
42

However, if a variable references an object of a more "complex" type (e.g., list), predicting the outcome of a code snippet may be unintuitive for a beginner.

In [79]:
x = [1, 2, 3]
In [80]:
type(x)
Out[80]:
list
In [81]:
y = x
In [82]:
y
Out[82]:
[1, 2, 3]

Let's change the first element of x.

Chapter 7 discusses lists in more depth. For now, let's view a list object as some sort of container that holds an arbitrary number of references to other objects and treat the brackets [] attached to it as yet another operator, namely the indexing operator. So, x[0] instructs Python to first follow the reference from the global list of all names to the x object. Then, it follows the first reference it finds there to the 1 object we put in the list. The indexing operator must be an operator as we merely read the first element and do not change anything in memory permanently.

Python begins counting at 0. This is not the case for many other languages, for example, MATLAB, R, or Stata. To understand why this makes sense, see this short note by one of the all-time greats in computer science, the late Edsger Dijkstra.

In [83]:
x[0]
Out[83]:
1

To change the first entry in the list, we use the assignment statement = again. Here, this does not create a new variable, nor overwrite an existing one, but only changes the object referenced as the first element in x. As we only change parts of the x object, we say that we mutate its state. To use the bag analogy from above, we keep the same bag but "flip" some of the $0$s into $1$s and some of the $1$s into $0$s.

In [84]:
x[0] = 99
In [85]:
x
Out[85]:
[99, 2, 3]

The changes made to the object x is referencing can also be seen through the y variable!

In [86]:
y
Out[86]:
[99, 2, 3]

The difference in behavior illustrated in this sub-section has to do with the fact that int and float objects are immutable types while list objects are mutable.

In the first case, an object cannot be changed "in place" once it is created in memory. When we assigned 87 to the already existing a, we did not change the $0$s and $1$s in the object a referenced before the assignment but created a new int object and made a reference it while the b variable is not affected.

In the second case, x[0] = 99 creates a new int object 99 and merely changes the first reference in the x list.

In general, the assignment statement creates a new name and makes it reference whatever object is on the right-hand side iff the left-hand side is a pure name (i.e., it contains no operators like the indexing operator in the example). Otherwise, it mutates an already existing object. And, we must always expect that the latter may have more than one variable referencing it.
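
If we do not want a second variable to "see" such mutations, one option, sketched here under the assumption that a so-called shallow copy is sufficient, is the copy() method that comes with every list object. The names original, alias, and snapshot are made up for this example.

```python
original = [1, 2, 3]
alias = original            # second name for the same list object
snapshot = original.copy()  # new list object with the same elements

original[0] = 99

print(alias)     # [99, 2, 3]
print(snapshot)  # [1, 2, 3]
print(alias is original, snapshot is original)  # True False
```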

Visualizing what is going on in memory with a tool like PythonTutor may be helpful for a beginner.

Naming Conventions

Phil Karlton famously noted during his time at Netscape:

"There are two hard problems in computer science: naming things and cache invalidation ... and off-by-one errors."

Variable names may contain upper and lower case letters, numbers, and underscores (i.e., _) and be as long as we want them to be. However, they must not begin with a number. Also, they must not be any of Python's built-in keywords like for or if.
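
These rules can be checked programmatically. The sketch below uses the isidentifier() method of str objects and the iskeyword() function from the keyword module in the standard library; the candidate names are hypothetical examples.

```python
import keyword

candidates = ["work_address", "type_", "address@work", "2fast", "for"]

valid_names = [
    name
    for name in candidates
    if name.isidentifier() and not keyword.iskeyword(name)
]

print(valid_names)  # ['work_address', 'type_']
```

Here, "address@work" contains a symbol that is not allowed, "2fast" begins with a number, and "for" is a keyword.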

Variable names should be chosen such that they do not need any more documentation and are self-explanatory. A widespread convention is to use so-called snake_case: Keep everything lowercase and use underscores to separate words.

See this link for a comparison of different naming conventions.

Good examples

In [87]:
pi = 3.14
In [88]:
answer_to_everything = 42
In [89]:
my_name = "Alexander"
In [90]:
work_address = "WHU, Burgplatz 2, Vallendar"

Bad examples

In [91]:
PI = 3.14  # unless used as a "global" constant
In [92]:
answerToEverything = 42  # this is a style used in languages like Java
In [93]:
name = "Alexander"  # name of what?
In [94]:
address@work = "WHU, Burgplatz 2, Vallendar"
  File "<ipython-input-94-ec51dae29567>", line 1
    address@work = "WHU, Burgplatz 2, Vallendar"
                                                ^
SyntaxError: can't assign to operator

If a variable name collides with a built-in name, we add a trailing underscore.

In [95]:
type_ = "student"

Variables with leading and trailing double underscores, referred to as dunder in Python jargon, are used for built-in functionalities and to implement object-oriented features as we see in Chapter 10. We must not use this style for variables!

In [96]:
__name__
Out[96]:
'__main__'

Expressions

An expression is any syntactically correct combination of variables and literals with operators.

In simple words, anything that may be used on the right-hand side of an assignment statement without creating a SyntaxError is an expression.

What we said about individual operators above, namely that they have no permanent side effects in memory, really belongs here: The absence of any permanent side effects is the characteristic property of expressions, and all the code cells in the "(Arithmetic) Operators" section above contain only expressions!

The simplest possible expressions contain only one variable or literal.

In [97]:
a
Out[97]:
87
In [98]:
42
Out[98]:
42

For sure, we need to include operators to achieve something useful.

In [99]:
a - 42
Out[99]:
45

The definition of an expression is recursive. So, the sub-expression a - 42 is combined with the literal 9 by the operator // to form the full expression (a - 42) // 9.

In [100]:
(a - 42) // 9
Out[100]:
5

Here, the variable x is combined with the literal 2 by the indexing operator []. The resulting expression evaluates to the third element in the x list.

In [101]:
x[2]
Out[101]:
3

When not used as a delimiter, parentheses also constitute an operator, namely the call operator (). We saw this syntax above when we called built-in functions and methods.

In [102]:
sum(x)
Out[102]:
104

Operator Overloading

Python overloads certain operators. For example, you may not only "add" numbers but also strings: This is called string concatenation.

In [103]:
greeting = "Hi "
audience = "class"
In [104]:
greeting + audience
Out[104]:
'Hi class'

Duplicate strings using multiplication.

In [105]:
10 * greeting
Out[105]:
'Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi '
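
Overloading is not limited to strings. As a short sketch, the same two operators also work with list objects, while mixing types that do not "fit" together leads to a runtime error, namely a TypeError. The names combined, repeated, and mixed are assumptions for this example, and the try statement is not covered in this chapter.

```python
combined = [1, 2] + [3]  # + concatenates lists as well
repeated = 2 * [0, 1]    # * repeats lists

print(combined)  # [1, 2, 3]
print(repeated)  # [0, 1, 0, 1]

try:
    mixed = "Hi " + 5    # a str and an int cannot be "added"
except TypeError:
    mixed = None         # hypothetical fallback value for this sketch
    print("TypeError")   # TypeError
```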

Statements

A statement is anything that changes the state of a program or has another permanent side effect. Statements, unlike expressions, do not evaluate to a value; instead, they create or change values.

Most notably, of course, are the = and del statements.

In [106]:
a = 42
In [107]:
del a

The built-in print() function is sometimes regarded as a "statement" as well. It used to be an actual statement in Python 2 and has all the necessary properties. It is a bit of a corner case but we can think of it as changing the state of the screen.

In [108]:
print("I change the state of the computer's display")
I change the state of the computer's display
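
Why print() is only a corner case can be sketched as follows: Syntactically, a call of print() is an expression like any other function call, and it evaluates to the special None object while the text shown on the screen is merely a side effect. The name result is chosen for illustration.

```python
result = print("I am a function call")  # shows the text as a side effect

print(result is None)  # True; the call itself evaluates to None
```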

Comments

We use the # symbol to write comments in plain English right into the code.

As a good practice, comments should not describe what happens. This should be evident by reading the code. Otherwise, it is most likely badly written code. Rather, comments should describe why something happens.

Comments may be added either at the end of a line of code, by convention separated with two spaces, or on a line on their own.

In [109]:
distance = 891  # in meters
elapsed_time = 93  # in seconds

# Calculate the speed in km/h.
speed = 3.6 * distance / elapsed_time

But let's think wisely if we need to use a comment. The second cell is a lot more Pythonic.

In [110]:
seconds = 365 * 24 * 60 * 60  # = seconds in the year
In [111]:
seconds_per_year = 365 * 24 * 60 * 60

TL;DR

We end each chapter with a summary of the main points (i.e., TL;DR = "too long; didn't read"). The essence in this first chapter is that just as a sentence in a real language like English may be decomposed into its parts (e.g., subject, predicate, and objects), the same may be done with programming languages.

  • program
    • sequence of instructions that specify how to perform a computation (= a "recipe")
    • a "black box" that processes inputs and transforms them into meaningful outputs in a deterministic way
    • conceptually similar to a mathematical function $f$ that maps some input $x$ to an output $y = f(x)$
  • input (examples)
    • data from a CSV file
    • text entered on a command line
    • data obtained from a database
    • etc.
  • output (examples)
    • result of a computation (e.g., statistical summary of a sample dataset)
    • a "side effect" (e.g., a transformation of raw input data into cleaned data)
    • a physical "behavior" (e.g., a robot moving or a document printed)
    • etc.
  • objects
    • distinct and well-contained areas/parts of the memory that hold the actual data
    • the concept by which Python manages the memory for us
    • can be classified into objects of the same type (i.e., same abstract "structure" but different concrete data)
    • built-in objects (incl. literals) vs. user-defined objects (cf., Chapter 10)
    • e.g., 1, 1.0, and "one" are three different objects of distinct types that are also literals (i.e., by the way we type them into the command line Python knows what the value and type are)
  • variables
    • storage of intermediate state
    • are names referencing objects in memory
    • e.g., x = 1 creates the variable x that references the object 1
  • operators
    • special built-in symbols that perform operations with objects in memory
    • usually, operate with one or two objects
    • e.g., addition +, subtraction -, multiplication *, and division / all take two objects, whereas the negation - only takes one
  • expressions
    • combinations of variables (incl. literals) and operators
    • do not change the involved objects/state of the program
    • evaluate to a value (i.e., the "result" of the expression, usually a new object)
    • e.g., x + 2 evaluates to the (new) object 3 and 1 - 1.0 to 0.0
  • statements
    • instructions that "do" something and have side effects in memory
    • (re-)assign names to (different) objects, change an existing object in place, or, more conceptually, change the state of the program
    • usually, work with expressions
    • e.g., the assignment statement = makes a name reference an object
  • comments
    • prose supporting a human's understanding of the program
    • ignored by Python
  • flow control (cf., Chapter 3 and Chapter 4)
    • expression of business logic or an algorithm
    • conditional execution of parts of a program (e.g., if statements)
    • repetitive execution of parts of a program (e.g., for-loops)

Further Resources

This PyCon 2015 talk by Ned Batchelder, a well-known Pythonista and the organizer of the Python User Group in Boston, summarizes all situations where some sort of assignment is done in Python. The content is intermediate, and, thus, it might be worthwhile to come back to this talk at a later point in time. However, the contents should be known by everyone claiming to be proficient in Python.

In [112]:
YouTubeVideo("_AEJHKGk9ns", width="60%")
Out[112]: