
4.5 - Apply, Applymap and Map

Last updated: April 3rd, 2019

rmotr


Apply, Applymap and Map

  • apply() works on both Series and DataFrames. On a Series, a NumPy ufunc (such as np.sqrt) is applied to the whole Series at once, while a plain Python function is called on each element. On a DataFrame, the function is called on each column (or on each row, with axis=1).
  • applymap() works only on a pandas DataFrame and calls the function on every element individually (in pandas 2.1 and later, applymap is deprecated and the same method is called DataFrame.map).
  • map() works only on a pandas Series and transforms each element; the mapping can be given as a function, a dict, or a Series.
Function   Data structure       Applied to
apply      Series / DataFrame   Series: one element at a time; DataFrame: one whole row or column at a time
applymap   DataFrame            One element at a time
map        Series               One element at a time

Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
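To make the table concrete, here is a minimal sketch on a tiny made-up DataFrame (the frame `df` and its values are illustrative, not part of the NBA data). Because pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map`, the sketch picks whichever name the installed version provides.

```python
import pandas as pd

# A tiny, made-up DataFrame used only for illustration
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Series.map: element-wise on a Series
doubled = df['a'].map(lambda x: x * 2)

# DataFrame.apply: the function receives one whole column (a Series) at a time
col_sums = df.apply(lambda col: col.sum())

# Element-wise on a DataFrame: DataFrame.map in pandas >= 2.1, applymap before that
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
as_text = elementwise(str)

print(doubled.tolist())     # [2, 4, 6]
print(col_sums.tolist())    # [6, 60]
print(as_text.loc[0, 'b'])  # 10  (now a string)
```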

Hands on!

In [ ]:
import pandas as pd
import numpy as np
In [ ]:
pd.options.display.float_format = '{:,.2f}'.format

NBA players data

We'll use a small NBA dataset which gives information about its players.

In [ ]:
players = pd.DataFrame({
    'salary': [
        33285709, 31269231, 34682550, 25000000, 17826150,
        29512900, 28530608, 26243760, 18868625, 2500000
    ],
    'season_start': [2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017],
    'season_end': [2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
    'team': ['CLE', 'DEN', 'GSW', 'GSW', 'GSW', 'LAC', 'OKC', 'OKC', 'SAS', 'SAS'],
    'Pos': ['SF', 'PF', 'PG', np.nan, 'SG', 'PF', 'PG', 'SF', 'SF', 'SG'],
    'Age': [32.0, 31.0, 28.0, 28.0, 26.0, np.nan, 28.0, 32.0, 25.0, 39.0]
}, index=[
    'LeBron James', 'Paul Millsap', 'Stephen Curry', 'Kevin Durant', 'Klay Thompson',
    'Blake Griffin', 'Russell Westbrook', 'Carmelo Anthony', 'Kawhi Leonard', 'Manu Ginobili'
])
In [ ]:
players

Series

The most important Series methods are map and apply. DataFrames also have an apply method, which can be confusing, so for now we'll focus only on Series.


map

map is a Series method that lets you map the Series' values to new values:

In [ ]:
players['Pos'].unique()
In [ ]:
players['Pos'].map({
    'PG': 'Point Guard',
    'SG': 'Shooting Guard',
    'SF': 'Small Forward',
    'PF': 'Power Forward',
}).to_frame()
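One detail worth knowing about the dict form: values that don't appear as keys in the mapping become NaN. map also accepts a Series as the mapper, in which case that Series' index plays the role of the dict keys. A minimal sketch on a small made-up Series (the values are illustrative):

```python
import pandas as pd

positions = pd.Series(['PG', 'SG', 'C'])

# 'C' is not a key in the dict, so it maps to NaN
by_dict = positions.map({'PG': 'Point Guard', 'SG': 'Shooting Guard'})

# A Series works as a mapper too: its index acts as the lookup keys
lookup = pd.Series({'PG': 1, 'SG': 2, 'C': 5})
by_series = positions.map(lookup)

print(by_dict.isna().tolist())  # [False, False, True]
print(by_series.tolist())       # [1, 2, 5]
```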
In [ ]:
players['Pos'].map('Position: {}'.format).to_frame()

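It takes an optional na_action parameter that specifies what to do with NaN values. With na_action='ignore', NaN values are left as they are instead of being passed to the function: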

In [ ]:
players['Pos'].map('Position: {}'.format,
                   na_action='ignore').to_frame()
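The difference is easiest to see side by side. In this minimal sketch on made-up data, the default call passes NaN to the format function, producing the string 'Position: nan', while na_action='ignore' keeps the missing value as a real NaN:

```python
import numpy as np
import pandas as pd

pos = pd.Series(['SF', np.nan])

# Default (na_action=None): NaN is passed to the function like any other value
formatted = pos.map('Position: {}'.format)

# na_action='ignore': NaN is propagated without calling the function
skipped = pos.map('Position: {}'.format, na_action='ignore')

print(formatted.tolist())       # ['Position: SF', 'Position: nan']
print(skipped.isna().tolist())  # [False, True]
```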

apply

On a Series, apply calls a custom function on each element and returns a new Series.

For example, convert each player's age to days (ignoring leap years):

In [ ]:
players['Age'].apply(lambda age: age * 365).to_frame()
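The same conversion can be written with a named function instead of a lambda. A minimal sketch, with `age_to_days` as a hypothetical helper name and a 365-day year (leap years ignored); it also checks that, for simple arithmetic like this, the vectorized expression `ages * 365` gives the same result, NaN included:

```python
import numpy as np
import pandas as pd

def age_to_days(age):
    """Hypothetical helper: convert an age in years to days (365-day years)."""
    return age * 365

ages = pd.Series([32.0, np.nan, 25.0])

# apply calls age_to_days once per element, NaN included (nan * 365 is nan)
days = ages.apply(age_to_days)

# For plain arithmetic, the vectorized version is equivalent and usually faster
assert days.equals(ages * 365)
print(days.iloc[0])  # 11680.0
```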

For Series in particular, the functionality of map and apply overlaps. As a rule of thumb, prefer apply when you have a custom function, and map when you have a one-to-one mapping (like the dict above). Both calls below produce the same result:

In [ ]:
players['salary'].apply('{:,.2f}'.format).to_frame()
In [ ]:
players['salary'].map('{:,.2f}'.format).to_frame()
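When both receive a plain function, as in the two cells above, map and apply return the same Series. A minimal check on made-up values:

```python
import pandas as pd

salaries = pd.Series([33285709, 2500000])

fmt = '{:,.2f}'.format  # same formatter as in the cells above
via_apply = salaries.apply(fmt)
via_map = salaries.map(fmt)

print(via_apply.equals(via_map))  # True
print(via_map.tolist())           # ['33,285,709.00', '2,500,000.00']
```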

apply lets you pass extra arguments to the function, either positionally through args or as keyword arguments:

In [ ]:
players['salary'].apply(lambda salary, precision: '{salary:,.{prec}f}'.format(
    salary=salary, prec=precision), args=(3, )).to_frame()
In [ ]:
players['salary'].apply(lambda salary, precision: '{salary:,.{prec}f}'.format(
    salary=salary, prec=precision), precision=3).to_frame()
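The lambdas above can be easier to read as a named function. A minimal sketch, with `format_salary` as a hypothetical helper: args supplies extra positional arguments after the element, and any other keyword arguments are forwarded to the function as-is.

```python
import pandas as pd

def format_salary(salary, precision=2, prefix=''):
    """Hypothetical helper: format a number with thousands separators."""
    return '{prefix}{salary:,.{prec}f}'.format(prefix=prefix, salary=salary, prec=precision)

salaries = pd.Series([33285709, 2500000])

# Positional extras go in args (note the one-element tuple)
no_decimals = salaries.apply(format_salary, args=(0,))

# Keyword extras are passed straight through to format_salary
dollars = salaries.apply(format_salary, precision=1, prefix='$')

print(no_decimals.tolist())  # ['33,285,709', '2,500,000']
print(dollars.tolist())      # ['$33,285,709.0', '$2,500,000.0']
```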

Indexes

Indexes are special: they're not as versatile as Series or DataFrames, but you can still apply functions to them. Index doesn't have an apply method; it only has map:

In [ ]:
players.index.map(len)
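Index.map returns a new Index rather than a Series, so its result can be assigned straight back as the index. A minimal sketch on a copy of a small made-up frame (the `last_name` function is a hypothetical example):

```python
import pandas as pd

df = pd.DataFrame({'salary': [33285709, 31269231]},
                  index=['LeBron James', 'Paul Millsap'])

def last_name(full_name):
    """Hypothetical helper: keep only the last word of a name."""
    return full_name.split()[-1]

# Index.map produces another Index
new_index = df.index.map(last_name)

# Work on a copy so the original labels stay untouched
renamed = df.copy()
renamed.index = new_index
print(renamed.index.tolist())  # ['James', 'Millsap']
```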

Because Index has no apply method, if you absolutely need apply, reset the index first (the unnamed index becomes a column called 'index'):

In [ ]:
players.reset_index()['index'].apply(len)
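Another option, which avoids building a reset copy of the whole DataFrame, is Index.to_series(), which returns a Series whose values (and index) are the index labels. A minimal sketch on a made-up Index:

```python
import pandas as pd

idx = pd.Index(['LeBron James', 'Stephen Curry'])

# to_series() returns a Series with the labels as both index and values
name_lengths = idx.to_series().apply(len)
print(name_lengths.tolist())  # [12, 13]
```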

Many common operations like these are already provided by pandas' vectorized string methods, available through the .str accessor:

In [ ]:
players.index.str.len()
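The .str accessor covers many operations you might otherwise write with map. A quick check on a made-up Index that both routes give the same labels:

```python
import pandas as pd

idx = pd.Index(['LeBron James', 'Stephen Curry'])

# Vectorized string methods vs. mapping a plain Python function
assert idx.str.len().tolist() == idx.map(len).tolist()
assert idx.str.upper().tolist() == idx.map(str.upper).tolist()
print(idx.str.upper().tolist())  # ['LEBRON JAMES', 'STEPHEN CURRY']
```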

DataFrames

The most important DataFrame methods are apply and applymap. applymap is similar to Series' map and apply: it applies a function element-wise (value by value).


applymap

In [ ]:
players[['Age', 'salary']].applymap(lambda x: '{:,.2f}'.format(x))

Again, you're applying your function to each element individually.
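As with Series.map, element-wise DataFrame methods accept na_action='ignore' in recent pandas versions, which leaves NaN values alone instead of formatting them as the string 'nan'. A minimal sketch on made-up data; the version check uses DataFrame.map where it exists (pandas 2.1+) and falls back to applymap otherwise:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [32.0, np.nan], 'salary': [33285709, 2500000]})

# DataFrame.map exists in pandas >= 2.1; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

formatted = elementwise('{:,.2f}'.format)                      # NaN becomes 'nan'
kept_nan = elementwise('{:,.2f}'.format, na_action='ignore')   # NaN stays NaN

print(repr(formatted.loc[1, 'Age']))    # 'nan'
print(pd.isna(kept_nan.loc[1, 'Age']))  # True
```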


apply

Probably the most interesting DataFrame method is apply, as it works on a per-row or per-column basis. The default behavior (axis=0) is per column: your function receives each column as a Series:

In [ ]:
def range_per_column(a_column):
    return a_column.max() - a_column.min()
In [ ]:
players[['Age', 'salary']].apply(range_per_column).to_frame()

As you can see, each column has been reduced to a single value: the column labels Age and salary are now the index of the resulting Series.
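For a reduction like this one, the result can be cross-checked against built-in column methods. A minimal sketch on made-up data; max() and min() skip NaN by default, so the NaN in column b doesn't affect its range:

```python
import numpy as np
import pandas as pd

def range_per_column(a_column):
    return a_column.max() - a_column.min()

df = pd.DataFrame({'a': [1, 5, 3], 'b': [10.0, np.nan, 4.0]})

ranges = df.apply(range_per_column)  # one value per column
print(ranges.index.tolist())  # ['a', 'b']

# Same numbers using the built-in reductions
assert (ranges == df.max() - df.min()).all()
```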

Finally, applying a function per row (axis=1) is really useful too, because your custom function receives an entire row as a Series, so it can combine values from several columns:

In [ ]:
players
In [ ]:
def salary_per_year_of_age(a_row):
    return a_row['salary'] / a_row['Age']
In [ ]:
players.apply(salary_per_year_of_age, axis=1).to_frame()
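With axis=1, each row arrives as a Series indexed by the column labels, and the result is indexed by the row labels. A minimal sketch on made-up data, also comparing against the equivalent vectorized column division:

```python
import numpy as np
import pandas as pd

def salary_per_year_of_age(a_row):
    return a_row['salary'] / a_row['Age']

df = pd.DataFrame({'salary': [3200000, 2500000], 'Age': [32.0, np.nan]},
                  index=['Player A', 'Player B'])

per_row = df.apply(salary_per_year_of_age, axis=1)
print(per_row.index.tolist())  # ['Player A', 'Player B']

# Same numbers without apply: divide whole columns at once
vectorized = df['salary'] / df['Age']
assert per_row.equals(vectorized)
```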

DataFrame.apply also forwards extra arguments (args and keyword arguments) to your function:

In [ ]:
def salary_per_age_period(a_row, period=1):
    return a_row['salary'] / (a_row['Age'] * period)
In [ ]:
# per year of age
players.apply(salary_per_age_period, axis=1, period=1).to_frame()
In [ ]:
# per month of age
players.apply(salary_per_age_period, axis=1, period=12).to_frame()
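Extra positional arguments can also be passed through args, as with Series.apply. A minimal sketch on made-up data, equivalent to the period=12 cell above:

```python
import pandas as pd

def salary_per_age_period(a_row, period=1):
    return a_row['salary'] / (a_row['Age'] * period)

df = pd.DataFrame({'salary': [3840000], 'Age': [32.0]})

# args=(12,) fills the period parameter positionally
monthly = df.apply(salary_per_age_period, axis=1, args=(12,))
assert monthly.equals(df.apply(salary_per_age_period, axis=1, period=12))
print(monthly.iloc[0])  # 10000.0
```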
