# Conditional Selection on Pandas Series

Last updated: June 3rd, 2019

In conditional selection (also known as boolean selection), we will select subsets of data based on the actual values of the data in the Series by using a boolean vector to filter the data.

## Hands on!

In [ ]:
import pandas as pd
import numpy as np


The first thing we'll do is recreate the Series from our previous lecture:

In [ ]:
data_dic = {
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}

g7_pop = pd.Series(data_dic,
                   name='G7 Population in millions')

In [ ]:
g7_pop


Summary of selection (from previous lesson):

In [ ]:
g7_pop['France']

In [ ]:
g7_pop.loc['France']

In [ ]:
g7_pop.iloc[0]


## Conditional selection (boolean arrays)

The same boolean array techniques we saw applied to numpy arrays can be used for Pandas Series.
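To make the parallel with numpy concrete, here is a minimal sketch. It assumes a small three-country Series named sample (a hypothetical name, not part of the lesson) built from three of the population values above. The same comparison produces a boolean mask in both libraries; the difference is that the Series keeps its labels.

```python
import numpy as np
import pandas as pd

# Three of the population values from the lesson, in millions.
values = np.array([63.951, 80.94, 127.061])

# Hypothetical helper Series, used only for this illustration.
sample = pd.Series(values, index=['France', 'Germany', 'Japan'])

# NumPy: the mask filters a plain array, so the country names are lost.
mask = values > 70
numpy_result = values[mask]

# pandas: the same comparison filters the Series and keeps the labels.
series_result = sample[sample > 70]

print(numpy_result)
print(series_result)
```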

In the previous lecture we saw that we can index our Series using a list of boolean values:

In [ ]:
g7_pop[[False, True, True, True, False, False, False]]


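The same selection, with each boolean annotated with the country it refers to:

In [ ]:
g7_pop[[
    False,  # Canada
    True,   # France
    True,   # Germany
    True,   # Italy
    False,  # Japan
    False,  # United Kingdom
    False   # United States
]]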


Now we'll go a step further and use a real condition to generate this list of boolean values:

In [ ]:
condition = g7_pop > 70

condition

In [ ]:
g7_pop[condition]

In [ ]:
g7_pop.loc[g7_pop > 70]

In [ ]:
g7_pop.mean()

In [ ]:
g7_pop[g7_pop > g7_pop.mean()]

In [ ]:
g7_pop.loc[g7_pop > g7_pop.mean()]

In [ ]:
g7_pop.loc[g7_pop > g7_pop.mean()].size
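It can help to inspect what a comparison such as g7_pop > 70 actually returns. The sketch below assumes a reduced four-country Series named sample (a hypothetical name, used only here). The condition is itself a Series of booleans that shares the index of the data, which is why it can be passed straight back into the square brackets or into .loc.

```python
import pandas as pd

# Hypothetical reduced copy of the lesson's data, in millions.
sample = pd.Series({
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
})

# The comparison is evaluated element by element.
condition = sample > 70

print(condition.dtype)                       # boolean dtype
print(condition.index.equals(sample.index))  # same labels as the data

# True counts as 1, so sum() gives the number of matching elements.
matches = int(condition.sum())

# Both spellings select the same elements.
selected = sample[condition]
selected_loc = sample.loc[condition]

print(matches, selected.size)
```

Counting with condition.sum() and reading .size from the filtered result give the same number, so either can be used to answer "how many elements match?".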


### Operators

#### or

In [ ]:
g7_pop[(g7_pop > 70) | (g7_pop < 40)]


#### and

In [ ]:
g7_pop[(g7_pop > 80) & (g7_pop < 200)]


#### not

In [ ]:
g7_pop.loc[~(g7_pop > 80)]

In [ ]:
g7_pop.loc[g7_pop > 80]

In [ ]:
g7_pop[g7_pop > g7_pop.mean()]

In [ ]:
g7_pop.std()

In [ ]:
g7_pop[(g7_pop < g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]
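The cells in this section wrap every comparison in parentheses and use |, & and ~ instead of the Python keywords or, and and not. The following sketch, which assumes a hypothetical four-country Series named sample, shows the three operators side by side and what happens when a keyword is used instead.

```python
import pandas as pd

# Hypothetical reduced copy of the lesson's data, in millions.
sample = pd.Series({
    'France': 63.951,
    'Germany': 80.94,
    'Japan': 127.061,
    'United States': 318.523,
})

# Element-wise operators. The parentheses are needed because
# |, & and ~ bind more tightly than the comparison operators.
either = sample[(sample > 200) | (sample < 70)]
both = sample[(sample > 80) & (sample < 200)]
negated = sample[~(sample > 80)]

# The keyword forms ask for a single truth value of the whole Series,
# which pandas refuses to guess.
keyword_failed = False
try:
    sample[(sample > 80) and (sample < 200)]
except ValueError as error:
    keyword_failed = True
    print('ValueError:', error)

print(either, both, negated, sep='\n\n')
```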


### Indexing with isin

The isin() method of Series returns a boolean vector that is True wherever a Series element appears in the list passed to it. This lets you select the elements whose values (or, by calling isin() on the index, whose labels) are among the ones you want:

In [ ]:
g7_pop

In [ ]:
g7_pop[g7_pop.isin([80, 80.940, 60.451, 35.467])]

In [ ]:
g7_pop[g7_pop.index.isin(['Canada', 'Italy'])]
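As a minimal sketch of both uses of isin(), assume a hypothetical three-country Series named sample. Values are matched exactly, so 80 does not match 80.94, and entries in the list that are not present in the Series (such as the made-up label 'Spain') are simply ignored rather than raising an error.

```python
import pandas as pd

# Hypothetical reduced copy of the lesson's data, in millions.
sample = pd.Series({'France': 63.951, 'Germany': 80.94, 'Italy': 60.665})

# Match on values: only exact matches count.
value_mask = sample.isin([80, 80.94])
by_value = sample[value_mask]

# Match on labels: call isin() on the index instead of the Series.
by_label = sample[sample.index.isin(['Italy', 'Spain'])]

# Combine with ~ to keep everything that is NOT in the list.
not_listed = sample[~sample.index.isin(['Italy'])]

print(by_value, by_label, not_listed, sep='\n\n')
```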


### Modifying series using conditional selection

In [ ]:
g7_pop[g7_pop < 70] = 99.99

g7_pop


We can also combine conditional selection with the +=, -= and *= operators to update the selected values in place.

Let's add 5 million to the countries with a population above 100 million:

In [ ]:
g7_pop[g7_pop > 100] += 5

g7_pop
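Assignments like these change the Series in place, so the original values are gone afterwards. A minimal sketch of two ways to keep the original intact, assuming a hypothetical three-country Series named original: work on a copy(), or use the mask() method of Series, which returns a new Series with the matching elements replaced.

```python
import pandas as pd

# Hypothetical reduced copy of the lesson's data, in millions.
original = pd.Series({'France': 63.951, 'Germany': 80.94, 'Japan': 127.061})

# Option 1: modify a copy, using the same assignments as the cells above.
modified = original.copy()
modified[modified < 70] = 99.99
modified[modified > 100] += 5

# Option 2: mask() leaves the original untouched and returns a new Series
# in which elements where the condition is True are replaced.
replaced = original.mask(original < 70, 99.99)

print(original, modified, replaced, sep='\n\n')
```

Which option fits better depends on whether later cells still need the unmodified data; mask() is convenient for a single replacement, while a copy suits a sequence of updates.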