
List Comprehensions Explained

A detailed explanation of how List Comprehensions work, including examples and a step-by-step video to help you understand them.

Last updated: February 24th, 2020

Do you want to learn about List Comprehensions? What are they used for? How do you write them?

We've recorded a detailed video with all these explanations, along with some tips to help you remember the syntax.


In [1]:
from IPython.display import YouTubeVideo

The objective of a List Comprehension is to turn a collection A into a collection B. To do that, we define a transformation function that takes an element of A and transforms it into an element of B. For example, say we have a list of names:

In [2]:
names = ['Grace', 'Ada', 'Sophie', 'Margaret']

and we want to "transform" it into a list containing the length of each name. Our final result will look like this:

In [3]:
[5, 3, 6, 8]
[5, 3, 6, 8]

The Transform Function

To do that, we first have to define a function that receives a name and returns its length in characters:

In [4]:
def length_of_name(name):
    return len(name)

(Figure: transform function)

From Collection A to Collection B

As you've seen, we've defined the function length_of_name to work on a single element of A. The list comprehension is in charge of applying that function to every element of A and creating a new collection.


As you can see, the transformation is applied in order.
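To make the idea concrete, here's a minimal sketch of the same transformation written as a plain for loop. It's a mental model of what the list comprehension does for us (apply the function to each element of A, in order, and collect the results in a new list B), not a description of how Python implements comprehensions internally:

```python
def length_of_name(name):
    return len(name)

names = ['Grace', 'Ada', 'Sophie', 'Margaret']

# Explicit loop: apply the transform to each element of A, in order,
# and collect every result into a brand-new list B.
lengths = []
for name in names:
    lengths.append(length_of_name(name))

print(lengths)  # [5, 3, 6, 8]
```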

List Comprehension Syntax

Let's now express this transformation as a list comprehension. I'll show it first and we'll dissect it afterwards:

In [5]:
[length_of_name(name) for name in names]
[5, 3, 6, 8]

The same comprehension, with a different variable name:

In [6]:
[length_of_name(n) for n in names]
[5, 3, 6, 8]

We can see that a list comprehension has 3 clearly defined parts:

(Figure: List comprehensions explained)

  1. The expression: how you would "transform" each element in the collection
  2. A name we're choosing to reference each element in the collection
  3. The collection we want to transform

If you read the previous examples again, you'll see that I switched the name of the variable (2) between name and n. You can pick any name you want; we try to make it representative of the elements we're iterating over, so that the code is "self-documented".
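A quick sketch to confirm that the name chosen for part (2) doesn't change the result. It also shows a Python 3 detail not covered above: the comprehension variable is local to the comprehension, so it doesn't overwrite an outer variable with the same name:

```python
names = ['Grace', 'Ada', 'Sophie', 'Margaret']

# The name in part (2) is arbitrary: these two comprehensions are equivalent.
with_name = [len(name) for name in names]
with_n = [len(n) for n in names]
print(with_name == with_n)  # True

# In Python 3 the comprehension variable lives in its own scope,
# so the outer variable n is left untouched.
n = 'unchanged'
[len(n) for n in names]
print(n)  # unchanged
```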

Any expression is valid

You've probably noticed that the function length_of_name is just a wrapper around the built-in len function, so we can use the expression len(n) directly:

In [7]:
[len(n) for n in names]
[5, 3, 6, 8]

This is something that usually confuses new programmers. It's related to the concept of "expressions" (maybe the subject for another video? 😅). Here's another example:

In [8]:
import math
In [9]:
constants = [math.pi, math.e, math.tau]
constants
[3.141592653589793, 2.718281828459045, 6.283185307179586]
In [10]:
[f"{constant:.2f}" for constant in constants]
['3.14', '2.72', '6.28']

What we're doing is formatting each number with only 2 decimal places. The key is the expression f"{constant:.2f}". Let's isolate it in a single example:

In [11]:
ϕ = (1 + 5 ** 0.5) / 2  # Phi constant, or golden ratio
In [12]:

Sorry about the tangent, but I know I used an unusual variable name. In Python 3 we can use a wide range of Unicode characters in variable names 😃. Maybe this is clearer:

In [13]:
golden_ratio = (1 + 5 ** 0.5) / 2  # Phi constant, or golden ratio
In [14]:
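For reference, here's the format spec from the list comprehension applied to this single value. This is a sketch of our own reusing golden_ratio; .2f means fixed-point notation with 2 digits after the decimal point:

```python
golden_ratio = (1 + 5 ** 0.5) / 2  # Phi constant, or golden ratio

# Same format spec as in the list comprehension, applied to one value.
formatted = f"{golden_ratio:.2f}"
print(formatted)  # 1.62
```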

But I digress... The important part is that we're just using an expression in our list comprehension. We could also have used the regular str.format method:

In [15]:
["{:.2f}".format(c) for c in constants]
['3.14', '2.72', '6.28']
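Since any expression is valid, a different expression can produce a similar-looking result. As a sketch (round() is our own swapped-in choice, not used in the original), note that it returns floats, while the formatting versions above return strings:

```python
import math

constants = [math.pi, math.e, math.tau]

# round() keeps the values as numbers (floats)...
rounded = [round(c, 2) for c in constants]
print(rounded)  # [3.14, 2.72, 6.28]

# ...while f-strings (and str.format) produce strings.
as_strings = [f"{c:.2f}" for c in constants]
print(as_strings)  # ['3.14', '2.72', '6.28']
```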

Immutability is key

One very important trait of list comprehensions is that they return a new collection; they DO NOT modify the original one. Using our previous example:

In [16]:
[f"{constant:.2f}" for constant in constants]
['3.14', '2.72', '6.28']

But the list constants remains unchanged:

In [17]:
constants
[3.141592653589793, 2.718281828459045, 6.283185307179586]
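A small sketch to check this claim directly: take a snapshot of the list before running the comprehension, then verify that the result is a different object and that the original still matches the snapshot:

```python
import math

constants = [math.pi, math.e, math.tau]
snapshot = list(constants)  # copy taken before the comprehension runs

formatted = [f"{c:.2f}" for c in constants]

# The comprehension built a brand-new list object...
print(formatted is constants)  # False
# ...and the original list still holds the same values.
print(constants == snapshot)  # True
```

Keep in mind that this guarantee is about the list itself: if the expression calls a method that mutates an element in place (possible with mutable elements, such as nested lists), that element will still change.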

The collection can be any "iterable" object

List comprehensions are not restricted to "just lists". They're called list comprehensions because they return lists, but they accept any iterable as input. Check the following examples:

Using a dictionary:

In [18]:
constants = {
    "π": math.pi,
    "e": math.e,
    "τ": math.tau
}
Iterating over values:

In [19]:
[f"{c:.2f}" for c in constants.values()]
['3.14', '2.72', '6.28']

Iterating over keys:

In [20]:
[c.encode("unicode_escape") for c in constants]
[b'\\u03c0', b'e', b'\\u03c4']
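Keys and values can also be iterated together with .items(), unpacking each pair into two names in part (2). A sketch reusing the constants dictionary from above (the label format is our own choice):

```python
import math

constants = {
    "π": math.pi,
    "e": math.e,
    "τ": math.tau,
}

# .items() yields (key, value) pairs; we unpack each pair into two names.
labels = [f"{name} = {value:.2f}" for name, value in constants.items()]
print(labels)  # ['π = 3.14', 'e = 2.72', 'τ = 6.28']
```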
Using a numpy array:

In [21]:
import numpy as np
In [22]:
arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [23]:
[n ** 2 for n in arr]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
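A few more built-in iterables work the same way. These examples are our own additions, as a quick sketch:

```python
# A string is an iterable of characters.
letters = [ch.upper() for ch in "ada"]
print(letters)  # ['A', 'D', 'A']

# range() produces integers lazily; no list is built until the comprehension runs.
squares = [n ** 2 for n in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

# Tuples work too (as would sets, generators or file objects).
doubled = [x * 2 for x in (1, 2, 3)]
print(doubled)  # [2, 4, 6]
```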

The if part of list comprehensions

List comprehensions accept a final term that lets you "select" which elements get processed. Syntactically, it's placed at the end of the list comprehension. Check the following example, in which we run the expression n ** 2 ONLY for the elements that are divisible by 2:

In [24]:
[n ** 2 for n in arr if n % 2 == 0]
[0, 4, 16, 36, 64]

As you can see, the if clause can hold any valid boolean expression. Another example:

In [25]:
names = ['Grace', 'Ada', 'Sophie', 'Margaret']
In [26]:
[name.upper() for name in names if len(name) > 3]
['GRACE', 'SOPHIE', 'MARGARET']
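The filtered comprehension above can be read as a for loop with a nested if. Here's a sketch of that loop equivalent, as a mental model rather than a description of how Python actually executes comprehensions:

```python
names = ['Grace', 'Ada', 'Sophie', 'Margaret']

# Loop equivalent of: [name.upper() for name in names if len(name) > 3]
result = []
for name in names:
    if len(name) > 3:                # the "if" part selects elements...
        result.append(name.upper())  # ...and only those are transformed

print(result)  # ['GRACE', 'SOPHIE', 'MARGARET']
```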

Sometimes, list comprehensions can incur what's known as "double evaluation". We can see it in the following example, where we return the length of the names, but only for the names that have more than 3 characters:

In [27]:
[len(name) for name in names if len(name) > 3]
[5, 6, 8]

In this case, len(name) is computed twice for every name that passes the filter: once in the if clause and again in the expression. If that operation is computationally expensive, this becomes a problem. There are ways to fix this using generators, but that's the subject of another lesson.
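Here's a sketch of the generator-based fix, assuming a hypothetical expensive_len function that stands in for a costly operation and records each call so we can count evaluations. The last variant uses the assignment expression (:=, the "walrus" operator, Python 3.8+), which is a different technique from the generator approach mentioned above:

```python
names = ['Grace', 'Ada', 'Sophie', 'Margaret']

calls = []  # records every call so we can count evaluations

def expensive_len(name):
    """Hypothetical stand-in for a costly operation; logs each call."""
    calls.append(name)
    return len(name)

# 1) Double evaluation: both the if clause and the expression call it.
double = [expensive_len(name) for name in names if expensive_len(name) > 3]
double_calls = len(calls)  # 4 calls in the filter + 3 in the expression

# 2) Generator fix: the inner generator expression computes each length once,
#    and the outer comprehension filters the already-computed values.
calls.clear()
single = [length for length in (expensive_len(name) for name in names) if length > 3]
single_calls = len(calls)

# 3) Alternative (Python 3.8+): := stores the value computed in the if clause
#    so the expression can reuse it.
calls.clear()
walrus = [length for name in names if (length := expensive_len(name)) > 3]
walrus_calls = len(calls)

print(double, double_calls)  # [5, 6, 8] 7
print(single, single_calls)  # [5, 6, 8] 4
print(walrus, walrus_calls)  # [5, 6, 8] 4
```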

The alternatives for Data Scientists

List comprehensions are syntactic sugar for the glorious "map" operation in functional programming. Some other libraries have implemented the same logic with other mechanisms. The most important example is "vectorized operations" in numpy and pandas. The expression [n ** 2 for n in arr] from above could have been written this way:

In [28]:
arr ** 2
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

You can see that it yields the same results.
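Going back to the "map" connection: here's a sketch comparing list comprehensions with Python's built-in map() and filter(). Matching filter() to the if clause is our own addition to the comparison:

```python
names = ['Grace', 'Ada', 'Sophie', 'Margaret']

# map() applies a function to every element (the "transform" part)...
mapped = list(map(len, names))
print(mapped == [len(n) for n in names])  # True

# ...and filter() keeps only the elements matching a condition (the "if" part).
filtered = list(map(str.upper, filter(lambda n: len(n) > 3, names)))
print(filtered == [n.upper() for n in names if len(n) > 3])  # True
```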

Vectorized operations are very important when working with numpy and pandas, and they're generally preferred over list comprehensions there. They're optimized at a low level (the loop runs in compiled code rather than in Python), so they usually perform better.
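The if clause also has a vectorized counterpart in numpy: boolean mask indexing. A minimal sketch reproducing the filtered example from earlier (the masking approach is our addition; it isn't shown in the original):

```python
import numpy as np

arr = np.arange(10)

# List comprehension with an "if" clause: a Python-level loop over each element.
squares_of_evens = [n ** 2 for n in arr if n % 2 == 0]

# Vectorized equivalent: the boolean mask (arr % 2 == 0) selects the even
# elements, then ** 2 is applied to the whole selected array at once.
vectorized = arr[arr % 2 == 0] ** 2
print(vectorized.tolist())  # [0, 4, 16, 36, 64]

print(np.array_equal(vectorized, squares_of_evens))  # True
```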

We've written extensively about vectorized operations and ufuncs. Check this post for an introduction.

Next steps

We've seen a good overview of list comprehensions: how they're formed and what they're used for. If you want to keep exploring, some follow-up topics are Dictionary Comprehensions, Generators and Iterators.

Notebooks AI