
Introduction to NumPy Boolean Arrays

Last updated: October 31st, 2019

rmotr


In our previous lessons, we saw how comparison operators can be broadcast across NumPy arrays. Now we'll see how to combine the resulting boolean arrays (also called masks) with regular selection to create boolean filters.


Hands on!

In [1]:
import sys
import numpy as np

First, let's review regular selection with a simple array:

In [2]:
a = np.arange(6)
a
Out[2]:
array([0, 1, 2, 3, 4, 5])

If we want to access the first and last elements, there are several options. For example:

1. Regular indexing

In [3]:
a[0], a[-1]
Out[3]:
(0, 5)

2. Multiple indices

In [4]:
a[[0, -1]]
Out[4]:
array([0, 5])

Besides these two familiar options, we can also use a boolean array:

3. Boolean Array

In [5]:
a[[True, False, False, False, False, True]]
Out[5]:
array([0, 5])

When we pass a boolean array to the selection operator, we're indicating which elements we want to retrieve: those at the positions holding True.
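A minimal sketch of the same selection with the mask stored in a variable (the names mask, selected and mismatch_raised are just illustrative). The mask needs one boolean per element; in recent NumPy versions a mask of the wrong length raises an IndexError instead of silently selecting something:

```python
import numpy as np

a = np.arange(6)

# One boolean per element of a: True means "keep this element"
mask = np.array([True, False, False, False, False, True])
selected = a[mask]
print(selected)  # [0 5]

# A mask whose length doesn't match the array is rejected
try:
    a[np.array([True, False])]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True

print(mismatch_raised)  # True
```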

Relation with Broadcasting

As you saw in our previous lesson, comparison operators are also broadcast, and the result is a boolean array:

In [6]:
a > 2
Out[6]:
array([False, False, False,  True,  True,  True])

In this case, the values are True for the elements that satisfy our condition (element > 2).

We can now combine this operation with the selection process to create filters. For example, "all the elements that are greater than 2":

In [7]:
a[a > 2]
Out[7]:
array([3, 4, 5])
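One detail worth knowing: unlike a basic slice such as a[3:], which returns a view of the original data, selecting with a boolean mask returns a new array (a copy). A short sketch, where filtered is an illustrative name:

```python
import numpy as np

a = np.arange(6)
filtered = a[a > 2]   # boolean indexing builds a new array

filtered[0] = -1      # changes the copy only
print(filtered)       # [-1  4  5]
print(a)              # [0 1 2 3 4 5]
print(np.shares_memory(a, filtered))  # False
```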

More examples:

In [8]:
a % 2 == 0
Out[8]:
array([ True, False,  True, False,  True, False])
In [9]:
a[a % 2 == 0]
Out[9]:
array([0, 2, 4])
In [10]:
a.mean()
Out[10]:
2.5
In [11]:
a[a > a.mean()]
Out[11]:
array([3, 4, 5])
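With np.arange the values and their positions coincide, which can hide the difference between them. Here is a sketch on a small hypothetical array (scores is made-up data) showing that a[mask] returns the matching values, while np.flatnonzero returns their positions:

```python
import numpy as np

# Hypothetical example data
scores = np.array([12, 45, 7, 30])

mask = scores > scores.mean()   # the mean is 23.5
print(mask)                     # [False  True False  True]
print(scores[mask])             # [45 30]  -> the matching values
print(np.flatnonzero(mask))     # [1 3]    -> their positions
```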


Logical Operators

You're probably already familiar with Python's logical operators (and, or, and not). Now let's look at their NumPy counterparts, which work element-wise on boolean arrays. This table might be useful from now on:

Python   NumPy
and      &
or       |
not      ~
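Each operator in the table also has a function equivalent: np.logical_and, np.logical_or and np.logical_not. The sketch below checks that both forms agree, and shows why Python's own and can't be used here (python_and_failed is an illustrative flag):

```python
import numpy as np

a = np.arange(6)

# Each operator has an equivalent NumPy function
assert np.array_equal((a >= 2) & (a < 5), np.logical_and(a >= 2, a < 5))
assert np.array_equal((a == 0) | (a == 1), np.logical_or(a == 0, a == 1))
assert np.array_equal(~(a <= 2), np.logical_not(a <= 2))

# Python's `and` needs a single True/False, so it fails on a 6-element array
try:
    (a >= 2) and (a < 5)
    python_and_failed = False
except ValueError:
    python_and_failed = True

print(python_and_failed)  # True
```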

The best way to understand logical operators is through examples, so let's try a few:

In [12]:
a
Out[12]:
array([0, 1, 2, 3, 4, 5])
Example 1: All elements greater than or equal to 2 *AND* less than 5
In [13]:
(a >= 2) & (a < 5)
Out[13]:
array([False, False,  True,  True,  True, False])
Example 2: Elements equal to 0 *OR* equal to 1:
In [14]:
(a == 0) | (a == 1)
Out[14]:
array([ True,  True, False, False, False, False])

As these examples show, the parentheses around each comparison are required: & and | have higher precedence than comparison operators such as >= and ==, so leaving the parentheses out makes the expression fail.
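To see the failure concretely: without parentheses, Python evaluates 2 & a first, so the expression becomes the chained comparison a >= (2 & a) < 5, which has to reduce a whole array to a single True/False and raises a ValueError. A sketch (unparenthesized_failed is an illustrative flag):

```python
import numpy as np

a = np.arange(6)

# Parsed as a >= (2 & a) < 5, a chained comparison that calls bool() on an array
try:
    a >= 2 & a < 5
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True

print(unparenthesized_failed)   # True
print((a >= 2) & (a < 5))       # [False False  True  True  True False]
```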

Now check these examples with the not (~) operator:

Example 3: All elements greater than 2:
In [15]:
a > 2
Out[15]:
array([False, False, False,  True,  True,  True])
In [16]:
~(a <= 2)
Out[16]:
array([False, False, False,  True,  True,  True])

The results are the same! Saying "everything greater than 2" is equivalent to saying "all the elements that are *not* less than or equal to 2".
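A caveat worth keeping in mind: ~ only behaves as a logical "not" on boolean arrays. On integer arrays it is bitwise NOT, where ~x equals -x - 1. When the input might not be boolean, np.logical_not is the safer choice. A small sketch (flipped, bitwise and logical are illustrative names):

```python
import numpy as np

a = np.arange(6)
flipped = ~(a <= 2)              # on booleans, ~ flips each value
print(flipped)                   # [False False False  True  True  True]

ints = np.array([0, 1, 2])
bitwise = ~ints                  # on integers, ~ is bitwise NOT: ~x == -x - 1
print(bitwise)                   # [-1 -2 -3]

logical = np.logical_not(ints)   # 0 -> True, any nonzero value -> False
print(logical)                   # [ True False False]
```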


Logical Operators in filtering

We can combine boolean filtering (masks) with logical expressions to achieve more advanced filtering. We'll use the same examples as before:

Example 1: All elements greater than or equal to 2 *AND* less than 5
In [17]:
(a >= 2) & (a < 5)
Out[17]:
array([False, False,  True,  True,  True, False])
In [18]:
a[(a >= 2) & (a < 5)]
Out[18]:
array([2, 3, 4])
Example 2: Elements equal to 0 *OR* equal to 1:
In [19]:
(a == 0) | (a == 1)
Out[19]:
array([ True,  True, False, False, False, False])
In [20]:
a[(a == 0) | (a == 1)]
Out[20]:
array([0, 1])
Example 3: All elements greater than 2:
In [21]:
~(a <= 2)
Out[21]:
array([False, False, False,  True,  True,  True])
In [22]:
a[~(a <= 2)]
Out[22]:
array([3, 4, 5])
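As filters grow, storing each mask in a named variable keeps them readable, and masks can be combined or negated like any other boolean array. This sketch (in_range and outside are illustrative names) also checks De Morgan's law: negating an AND is the same as OR-ing the negated conditions:

```python
import numpy as np

a = np.arange(6)

# Naming masks keeps longer filters readable
in_range = (a >= 2) & (a < 5)
outside = ~in_range

print(a[in_range])   # [2 3 4]
print(a[outside])    # [0 1 5]

# De Morgan's law: NOT (x AND y) == (NOT x) OR (NOT y)
assert np.array_equal(outside, (a < 2) | (a >= 5))
```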


Assignment with condition

Next, we'll see how to use filters and masks to modify our arrays as well.

In [23]:
a = np.arange(10)
a
Out[23]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We can modify the elements of an array that match a given condition. For example:

In [24]:
a[a >= 4] = 99

The array a has been modified:

In [25]:
a
Out[25]:
array([ 0,  1,  2,  3, 99, 99, 99, 99, 99, 99])
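The assigned value doesn't have to be a scalar. A scalar is broadcast to every selected position, while an array must supply exactly one value per True in the mask, filled in order; a mismatched length raises a ValueError. In this sketch the mask is computed once, before any assignment, so it keeps selecting positions 4 to 9 even after their values change (mask, after_fill and size_mismatch_failed are illustrative names):

```python
import numpy as np

a = np.arange(10)
mask = a >= 4              # selects the 6 positions 4..9

# A scalar is broadcast to every selected position
a[mask] = 99

# An array must supply exactly one value per True in the mask
a[mask] = np.arange(6)     # fills positions 4..9 with 0..5, in order
after_fill = a.copy()
print(after_fill)          # [0 1 2 3 0 1 2 3 4 5]

try:
    a[mask] = [1, 2]       # only 2 values for 6 positions
    size_mismatch_failed = False
except ValueError:
    size_mismatch_failed = True

print(size_mismatch_failed)  # True
```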

We can also compute the new values from the selected elements themselves. Another example:

In [26]:
a = np.arange(10)
a
Out[26]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [27]:
a[a >= 4] = (a[a >= 4] * 100)
In [28]:
a
Out[28]:
array([  0,   1,   2,   3, 400, 500, 600, 700, 800, 900])

In this example, we've replaced each element greater than or equal to 4 with that same element multiplied by 100.
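Two alternatives to the same update, sketched below: augmented assignment (a[a >= 4] *= 100) performs it in place with the mask written only once, while np.where(condition, x, y) builds a new array, taking x where the condition is True and y elsewhere, and leaves the original untouched. The name b is illustrative:

```python
import numpy as np

a = np.arange(10)

# np.where builds a NEW array: a * 100 where the condition holds, a elsewhere
b = np.where(a >= 4, a * 100, a)
print(b)   # [  0   1   2   3 400 500 600 700 800 900]
print(a)   # [0 1 2 3 4 5 6 7 8 9]  (unchanged)

# Augmented assignment modifies a in place, like a[a >= 4] = a[a >= 4] * 100
a[a >= 4] *= 100
print(a)   # [  0   1   2   3 400 500 600 700 800 900]
```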


Methods and patterns for boolean arrays

Two methods are especially useful with boolean arrays:

  • any returns True if there's at least one True value; otherwise, it returns False.
  • all returns True if ALL the elements are True; otherwise, it returns False.
In [29]:
a = np.array([99, 4, 101, 251])
a
Out[29]:
array([ 99,   4, 101, 251])
In [30]:
a >= 99
Out[30]:
array([ True, False,  True,  True])
In [31]:
(a >= 99).any()
Out[31]:
True
In [32]:
a >= 99
Out[32]:
array([ True, False,  True,  True])
In [33]:
(a >= 99).all()
Out[33]:
False
In [34]:
a < 1_000
Out[34]:
array([ True,  True,  True,  True])
In [35]:
(a < 1_000).all()
Out[35]:
True
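Both checks are also available as functions (np.any and np.all), which give the same results as the methods. One edge case to be aware of: for an empty array, any returns False (there is no True value) while all returns True (there is no False value either). A sketch, where empty is an illustrative name:

```python
import numpy as np

a = np.array([99, 4, 101, 251])

# Method and function forms are equivalent
print(np.any(a >= 99), (a >= 99).any())   # True True
print(np.all(a >= 99), (a >= 99).all())   # False False

# Edge case: an empty mask has no True values, but no False values either
empty = np.array([], dtype=bool)
print(empty.any())   # False
print(empty.all())   # True
```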

It's also very common to ask "how many elements satisfy the condition?" any tells you whether there's at least one, but not how many. For that, we'll use the np.sum function, which counts each True as 1 and each False as 0.

In [36]:
a
Out[36]:
array([ 99,   4, 101, 251])
In [37]:
a > 99
Out[37]:
array([False, False,  True,  True])
In [38]:
(a > 99).any()
Out[38]:
True
In [39]:
np.sum(a > 99)
Out[39]:
2

We now know that 2 elements are greater than 99.

In [40]:
np.sum(a[a > 99])
Out[40]:
352

And the sum of those 2 elements is 352.
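To recap the counting patterns in one place: np.sum, the sum method and np.count_nonzero all count the True values of a mask, and, since True counts as 1, the mean of a mask gives the fraction of elements that satisfy the condition. Summing a[mask] instead adds up the matching values themselves. A sketch on the same array:

```python
import numpy as np

a = np.array([99, 4, 101, 251])
mask = a > 99

# Each True counts as 1 and each False as 0
print(np.sum(mask))            # 2
print(mask.sum())              # 2
print(np.count_nonzero(mask))  # 2

# The mean of a mask is the fraction of elements that satisfy the condition
print(mask.mean())             # 0.5

# Sum of the matching values themselves: 101 + 251
print(a[mask].sum())           # 352
```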

