Removing Elements From Pandas DataFrames

Last updated: October 31st, 2019

rmotr


Hands on!

In [1]:
import numpy as np
import pandas as pd

First, we'll recreate the DataFrame from the previous lecture:

In [2]:
df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523, 50.233],
    'GDP': [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075, 1485387],
    'Surface Area': [9984670, 640679, 357114, 301336, 377930, 242495, 9525067, 8923670],
    'HDI': [0.913, 0.888, 0.916, 0.873, 0.891, 0.907, 0.915, 0.894],
    'Currency': ['Canadian Dolar', 'Euro', 'Euro', 'Euro',
                 'Yen', 'Pound sterling', 'American Dolar', 'Real'],
    'Continent': ['America', 'Europe', 'Europe', 'Europe',
                  'Asia', 'Europe', 'America', 'America']
})

df.columns = ['Population', 'GDP', 'Surface Area', 'HDI', 'Currency', 'Continent']

df.index = ['Canada', 'France', 'Germany', 'Italy',
            'Japan', 'United Kingdom', 'United States',
            'Brazil']

df
Out[2]:
                Population       GDP  Surface Area    HDI        Currency  Continent
Canada              35.467   1785387       9984670  0.913  Canadian Dolar    America
France              63.951   2833687        640679  0.888            Euro     Europe
Germany             80.940   3874437        357114  0.916            Euro     Europe
Italy               60.665   2167744        301336  0.873            Euro     Europe
Japan              127.061   4602367        377930  0.891             Yen       Asia
United Kingdom      64.511   2950039        242495  0.907  Pound sterling     Europe
United States      318.523  17348075       9525067  0.915  American Dolar    America
Brazil              50.233   1485387       8923670  0.894            Real    America

Removing elements from a DataFrame

Dropping is the opposite of selecting: instead of naming the values you'd like to keep, you name the ones you'd like to remove.
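
To make the contrast with selection concrete, here is a minimal sketch. It uses a small stand-in frame (the name `small` and its three rows are assumptions chosen for illustration, not part of the lecture's DataFrame) and checks that keeping two rows by selection gives the same result as dropping the third.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration.
small = pd.DataFrame(
    {'Population': [35.467, 127.061, 50.233]},
    index=['Canada', 'Japan', 'Brazil'],
)

# Selection: name the rows to keep.
kept_by_selection = small.loc[['Canada', 'Japan']]

# Dropping: name the rows to discard instead.
kept_by_drop = small.drop('Brazil')

# Both approaches leave the same rows behind.
print(kept_by_selection.equals(kept_by_drop))
```

Dropping tends to be the shorter option when you want to keep almost everything.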

Removing rows from a DataFrame

In [3]:
# inplace=True modifies df directly instead of returning a new DataFrame
df.drop('Brazil', inplace=True)

df
Out[3]:
                Population       GDP  Surface Area    HDI        Currency  Continent
Canada              35.467   1785387       9984670  0.913  Canadian Dolar    America
France              63.951   2833687        640679  0.888            Euro     Europe
Germany             80.940   3874437        357114  0.916            Euro     Europe
Italy               60.665   2167744        301336  0.873            Euro     Europe
Japan              127.061   4602367        377930  0.891             Yen       Asia
United Kingdom      64.511   2950039        242495  0.907  Pound sterling     Europe
United States      318.523  17348075       9525067  0.915  American Dolar    America
In [4]:
# will return a new dataframe
df.drop(['Canada', 'Japan'])
Out[4]:
                Population       GDP  Surface Area    HDI        Currency  Continent
France              63.951   2833687        640679  0.888            Euro     Europe
Germany             80.940   3874437        357114  0.916            Euro     Europe
Italy               60.665   2167744        301336  0.873            Euro     Europe
United Kingdom      64.511   2950039        242495  0.907  Pound sterling     Europe
United States      318.523  17348075       9525067  0.915  American Dolar    America
In [5]:
df
Out[5]:
                Population       GDP  Surface Area    HDI        Currency  Continent
Canada              35.467   1785387       9984670  0.913  Canadian Dolar    America
France              63.951   2833687        640679  0.888            Euro     Europe
Germany             80.940   3874437        357114  0.916            Euro     Europe
Italy               60.665   2167744        301336  0.873            Euro     Europe
Japan              127.061   4602367        377930  0.891             Yen       Asia
United Kingdom      64.511   2950039        242495  0.907  Pound sterling     Europe
United States      318.523  17348075       9525067  0.915  American Dolar    America
In [6]:
# will return a new dataframe
df.drop(['Italy', 'Canada'], axis=0)
Out[6]:
                Population       GDP  Surface Area    HDI        Currency  Continent
France              63.951   2833687        640679  0.888            Euro     Europe
Germany             80.940   3874437        357114  0.916            Euro     Europe
Japan              127.061   4602367        377930  0.891             Yen       Asia
United Kingdom      64.511   2950039        242495  0.907  Pound sterling     Europe
United States      318.523  17348075       9525067  0.915  American Dolar    America
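
The row examples above pass labels positionally. As a hedged sketch on a hypothetical three-row frame named `rows`, the same result can be written with the `index=` keyword. The sketch also shows what happens when a label is missing: `drop` raises `KeyError` unless `errors='ignore'` is passed.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
rows = pd.DataFrame(
    {'HDI': [0.913, 0.888, 0.916]},
    index=['Canada', 'France', 'Germany'],
)

# index= is the keyword counterpart of passing labels with axis=0.
without_canada = rows.drop(index=['Canada'])

# A label that is not in the index raises KeyError by default...
try:
    rows.drop('Spain')
    raised = False
except KeyError:
    raised = True

# ...unless errors='ignore' is passed; then unknown labels are skipped.
tolerant = rows.drop(['Spain', 'France'], errors='ignore')

print(list(without_canada.index), raised, list(tolerant.index))
```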

Removing columns from a DataFrame

In [7]:
# inplace=True modifies df directly; no new DataFrame is returned
df.drop(columns=['Continent'], inplace=True)

df
Out[7]:
                Population       GDP  Surface Area    HDI        Currency
Canada              35.467   1785387       9984670  0.913  Canadian Dolar
France              63.951   2833687        640679  0.888            Euro
Germany             80.940   3874437        357114  0.916            Euro
Italy               60.665   2167744        301336  0.873            Euro
Japan              127.061   4602367        377930  0.891             Yen
United Kingdom      64.511   2950039        242495  0.907  Pound sterling
United States      318.523  17348075       9525067  0.915  American Dolar
In [8]:
# equivalent alternative: del df['Currency']
df.drop('Currency', axis=1, inplace=True)

df
Out[8]:
                Population       GDP  Surface Area    HDI
Canada              35.467   1785387       9984670  0.913
France              63.951   2833687        640679  0.888
Germany             80.940   3874437        357114  0.916
Italy               60.665   2167744        301336  0.873
Japan              127.061   4602367        377930  0.891
United Kingdom      64.511   2950039        242495  0.907
United States      318.523  17348075       9525067  0.915
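
The cell above mentions `del` as an alternative to `drop` for a single column. A minimal sketch on a hypothetical two-row frame named `cols` shows `del` next to `pop`, which also removes the column in place but hands it back as a Series.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
cols = pd.DataFrame(
    {
        'GDP': [1785387, 2833687],
        'HDI': [0.913, 0.888],
        'Currency': ['Canadian Dolar', 'Euro'],
    },
    index=['Canada', 'France'],
)

# del removes the column in place and returns nothing.
del cols['Currency']

# pop also removes the column in place, and returns it as a Series.
hdi = cols.pop('HDI')

print(list(cols.columns))
print(hdi.tolist())
```

Unlike `drop`, neither `del` nor `pop` can return a new DataFrame, so prefer `drop` when the original should stay intact.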
In [9]:
# will return a new dataframe
df.drop(['Population', 'HDI'], axis=1)
Out[9]:
                     GDP  Surface Area
Canada           1785387       9984670
France           2833687        640679
Germany          3874437        357114
Italy            2167744        301336
Japan            4602367        377930
United Kingdom   2950039        242495
United States   17348075       9525067
In [10]:
# will return a new dataframe
df.drop(['Population', 'HDI'], axis='columns')
Out[10]:
                     GDP  Surface Area
Canada           1785387       9984670
France           2833687        640679
Germany          3874437        357114
Italy            2167744        301336
Japan            4602367        377930
United Kingdom   2950039        242495
United States   17348075       9525067
In [11]:
# will return a new dataframe; axis='rows' means the same as axis=0
df.drop(['Canada', 'Germany'], axis='rows')
Out[11]:
                Population       GDP  Surface Area    HDI
France              63.951   2833687        640679  0.888
Italy               60.665   2167744        301336  0.873
Japan              127.061   4602367        377930  0.891
United Kingdom      64.511   2950039        242495  0.907
United States      318.523  17348075       9525067  0.915
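
The cells above mix several spellings of the same request. As a hedged summary on a hypothetical two-row frame named `frame`, the sketch below checks that the numeric axis, the named axis, and the `columns=` / `index=` keywords give identical results. It uses `'index'` for the row axis, which is the name the pandas documentation lists alongside `0`.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
frame = pd.DataFrame(
    {'Population': [63.951, 60.665], 'GDP': [2833687, 2167744]},
    index=['France', 'Italy'],
)

# Three equivalent ways to drop a column.
cols_by_number = frame.drop('GDP', axis=1)
cols_by_name = frame.drop('GDP', axis='columns')
cols_by_keyword = frame.drop(columns='GDP')

# Three equivalent ways to drop a row.
rows_by_number = frame.drop('Italy', axis=0)
rows_by_name = frame.drop('Italy', axis='index')
rows_by_keyword = frame.drop(index='Italy')

print(cols_by_number.equals(cols_by_name) and cols_by_name.equals(cols_by_keyword))
print(rows_by_number.equals(rows_by_name) and rows_by_name.equals(rows_by_keyword))
```

The keyword forms are arguably the easiest to read, since the call itself says which axis is affected.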

By default, the drop method returns a new DataFrame and leaves the original untouched. If you'd like to modify the DataFrame "in place" instead, pass the inplace=True parameter (there's an example below).

In [12]:
df.drop(['Canada', 'Germany'], axis='rows', inplace=True)

df
Out[12]:
                Population       GDP  Surface Area    HDI
France              63.951   2833687        640679  0.888
Italy               60.665   2167744        301336  0.873
Japan              127.061   4602367        377930  0.891
United Kingdom      64.511   2950039        242495  0.907
United States      318.523  17348075       9525067  0.915
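
To make the difference between the two modes concrete, here is a minimal sketch on a hypothetical frame named `original`. It shows that the default call leaves the source untouched, while `inplace=True` mutates the frame and returns `None`, so its result should not be assigned back.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration.
original = pd.DataFrame(
    {'HDI': [0.888, 0.873, 0.891]},
    index=['France', 'Italy', 'Japan'],
)

# Default behaviour: a new DataFrame comes back, original keeps all rows.
trimmed = original.drop('Japan')

# inplace=True: the frame itself is modified and the call returns None.
working = original.copy()
result = working.drop('Japan', inplace=True)

print(len(original), len(trimmed), len(working), result)
```

A common mistake is writing `df = df.drop(..., inplace=True)`, which replaces `df` with `None`. Reassigning the default result, as in `df = df.drop(...)`, avoids this.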

These dropping methods and custom data cleaning strategies are covered in more depth in the Data Cleaning course.
