
Vectorized Operations and Methods on Pandas Series

Last updated: October 31st, 2019

rmotr


Like NumPy arrays, Series support vectorized operations and aggregation functions. In this lecture we'll look at the most common ones.

Hands on!

In [1]:
import pandas as pd
import numpy as np
In [2]:
pd.options.display.float_format = '{:,.2f}'.format

First, let's recreate the Series from the previous lecture:

In [3]:
g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, dtype=np.float64, name='G7 Population in millions')
In [4]:
g7_pop
Out[4]:
Canada            35.47
France            63.95
Germany           80.94
Italy             60.66
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64
In [5]:
gdp = pd.Series(
    [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    index=['Canada', 'France', 'Germany', 'Italy',
            'Japan', 'United Kingdom', 'United States'],
    dtype=np.float64,
    name='G7 GDP in millions')
In [6]:
gdp
Out[6]:
Canada            1,785,387.00
France            2,833,687.00
Germany           3,874,437.00
Italy             2,167,744.00
Japan             4,602,367.00
United Kingdom    2,950,039.00
United States    17,348,075.00
Name: G7 GDP in millions, dtype: float64
In [7]:
g7_pop.head(3)
Out[7]:
Canada    35.47
France    63.95
Germany   80.94
Name: G7 Population in millions, dtype: float64
In [8]:
g7_pop.tail(3)
Out[8]:
Japan            127.06
United Kingdom    64.51
United States    318.52
Name: G7 Population in millions, dtype: float64
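`head(n)` and `tail(n)` are positional: they return the first or last `n` elements regardless of the index labels, and `n` defaults to 5. The sketch below checks that against `iloc` slicing; the variable names are only for illustration.

```python
import pandas as pd

# Recreate the population Series from above (values in millions)
g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

# head(n) behaves like slicing the first n rows by position
first_three = g7_pop.head(3)
print(first_three.equals(g7_pop.iloc[:3]))   # True

# tail(n) behaves like slicing the last n rows by position
last_three = g7_pop.tail(3)
print(last_three.equals(g7_pop.iloc[-3:]))   # True

# Without an argument both methods return 5 elements
print(len(g7_pop.head()), len(g7_pop.tail()))  # 5 5
```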

Series vectorized operations

In [9]:
g7_pop * 1_000_000
Out[9]:
Canada            35,467,000.00
France            63,951,000.00
Germany           80,940,000.00
Italy             60,665,000.00
Japan            127,061,000.00
United Kingdom    64,511,000.00
United States    318,523,000.00
Name: G7 Population in millions, dtype: float64
In [10]:
g7_pop + 1_000_000
Out[10]:
Canada           1,000,035.47
France           1,000,063.95
Germany          1,000,080.94
Italy            1,000,060.67
Japan            1,000,127.06
United Kingdom   1,000,064.51
United States    1,000,318.52
Name: G7 Population in millions, dtype: float64
In [11]:
gdp * 1_000_000
Out[11]:
Canada            1,785,387,000,000.00
France            2,833,687,000,000.00
Germany           3,874,437,000,000.00
Italy             2,167,744,000,000.00
Japan             4,602,367,000,000.00
United Kingdom    2,950,039,000,000.00
United States    17,348,075,000,000.00
Name: G7 GDP in millions, dtype: float64
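A minimal sketch of what "vectorized" means here: the scalar multiplication above gives the same result as an explicit Python loop over the values, but in a single call that also keeps the index and name. The loop version is only for comparison.

```python
import pandas as pd

gdp = pd.Series(
    [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    index=['Canada', 'France', 'Germany', 'Italy',
           'Japan', 'United Kingdom', 'United States'],
    dtype='float64',
    name='G7 GDP in millions')

# Vectorized: the scalar is broadcast to every element in one operation
vectorized = gdp * 1_000_000

# Equivalent explicit loop, rebuilding the index and name by hand
looped = pd.Series(
    [value * 1_000_000 for value in gdp],
    index=gdp.index,
    name=gdp.name)

print(vectorized.equals(looped))  # True
print(vectorized.name)            # G7 GDP in millions
```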

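Operations between two Series are also vectorized, and elements are matched by index label. Dividing GDP (in millions) by population (in millions) gives GDP per capita. The result has no name because the two operands have different names: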

In [12]:
gdp / g7_pop
Out[12]:
Canada           50,339.39
France           44,310.28
Germany          47,868.01
Italy            35,733.03
Japan            36,221.71
United Kingdom   45,729.24
United States    54,464.12
dtype: float64
In [13]:
(gdp * 1_000_000) / (g7_pop * 1_000_000)
Out[13]:
Canada           50,339.39
France           44,310.28
Germany          47,868.01
Italy            35,733.03
Japan            36,221.71
United Kingdom   45,729.24
United States    54,464.12
dtype: float64
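Because elements are matched by label rather than by position, the order of the two Series does not matter, and a label present in only one of them produces `NaN`. The two small Series below are hypothetical subsets of the data above, deliberately reordered and mismatched to show this.

```python
import pandas as pd

# Hypothetical subsets of the data above, in different orders
pop_subset = pd.Series({'Canada': 35.467, 'France': 63.951, 'Japan': 127.061})
gdp_subset = pd.Series({'Japan': 4602367.0, 'Canada': 1785387.0, 'Italy': 2167744.0})

per_capita = gdp_subset / pop_subset

# The result index is the union of both indexes; Canada is divided by Canada
# and Japan by Japan, while France and Italy appear in only one Series -> NaN
print(per_capita)

# Keep only the labels present in both Series
print(per_capita.dropna())
```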

Using Universal Functions (Ufuncs) to obtain statistical info

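Any NumPy universal function (ufunc), such as `np.log`, can be applied to a Series: it works element-wise and returns a new Series with the same index.

Series also provide aggregation methods. A useful one is `describe`, which summarizes the Series with its count, mean, standard deviation, minimum, quartiles, and maximum. Let's explore the individual methods in more detail: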

In [14]:
g7_pop.describe()
Out[14]:
count     7.00
mean    107.30
std      97.25
min      35.47
25%      62.31
50%      64.51
75%     104.00
max     318.52
Name: G7 Population in millions, dtype: float64
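Each row of the `describe()` output corresponds to one of the individual methods used below. A short sketch that rebuilds the summary from those methods and compares the two; the `by_hand` dictionary is just an illustrative name.

```python
import math
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

summary = g7_pop.describe()

# The same statistics computed one method at a time
by_hand = {
    'count': g7_pop.count(),
    'mean': g7_pop.mean(),
    'std': g7_pop.std(),
    'min': g7_pop.min(),
    '25%': g7_pop.quantile(0.25),
    '50%': g7_pop.quantile(0.50),   # the median
    '75%': g7_pop.quantile(0.75),
    'max': g7_pop.max(),
}

for label, value in by_hand.items():
    # describe() labels its rows with exactly these keys
    print(label, math.isclose(summary[label], value))  # each line ends in True
```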
In [15]:
g7_pop.max()
Out[15]:
318.523
In [16]:
g7_pop.min()
Out[16]:
35.467
In [17]:
g7_pop.mean()
Out[17]:
107.30257142857144
In [18]:
g7_pop.std()
Out[18]:
97.24996987121581
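One detail worth knowing: `Series.std()` computes the sample standard deviation (it divides by n - 1, i.e. `ddof=1`) by default, while `np.std` on a plain array computes the population standard deviation (`ddof=0`). The sketch below compares the two on the population data.

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

# pandas default: sample standard deviation (divides by n - 1)
pandas_std = g7_pop.std()

# NumPy default on the raw array: population standard deviation (divides by n)
numpy_std = np.std(g7_pop.to_numpy())

print(np.isclose(pandas_std, np.std(g7_pop.to_numpy(), ddof=1)))  # True
print(np.isclose(numpy_std, g7_pop.std(ddof=0)))                   # True
```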
In [19]:
g7_pop.quantile(.2)
Out[19]:
61.3222
In [20]:
g7_pop.quantile(.8)
Out[20]:
117.83680000000004
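By default `quantile` uses linear interpolation: with the n values sorted, the q-quantile sits at fractional position q·(n - 1), and the result is interpolated between the two neighboring values. For q = 0.2 and 7 values that position is 1.2, i.e. 60.665 + 0.2 × (63.951 - 60.665) = 61.3222. The helper `linear_quantile` below is hypothetical, written only to make the calculation explicit.

```python
import math
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

def linear_quantile(series, q):
    """Hypothetical helper: q-quantile of a Series using linear interpolation."""
    values = sorted(series)
    position = q * (len(values) - 1)   # fractional index into the sorted values
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return values[lower] + fraction * (values[upper] - values[lower])

for q in (0.2, 0.8):
    print(q, math.isclose(linear_quantile(g7_pop, q), g7_pop.quantile(q)))  # True
```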
In [21]:
np.log(g7_pop)
Out[21]:
Canada           3.57
France           4.16
Germany          4.39
Italy            4.11
Japan            4.84
United Kingdom   4.17
United States    5.76
Name: G7 Population in millions, dtype: float64
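A ufunc such as `np.log` produces one output per element and keeps the Series' index and name, so ufuncs can be chained like ordinary arithmetic. The sketch below checks this and uses `np.exp` to undo the logarithm (up to floating-point rounding); `np.sqrt` is just another example of the same pattern.

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

log_pop = np.log(g7_pop)

# Element-wise: same index labels and name as the input
print(log_pop.index.equals(g7_pop.index))  # True
print(log_pop.name)                         # G7 Population in millions

# np.exp is the inverse of np.log, so applying it recovers the original values
print(np.allclose(np.exp(log_pop), g7_pop))  # True

# Any other ufunc works the same way, e.g. the square root of each value
print(np.sqrt(g7_pop))
```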