 Missing Values in Pandas

Last updated: June 11th, 2019

Exercises


In [ ]:
import numpy as np
import pandas as pd
import missingno as msno

We are going to use a dataset of 5,000 movies scraped from IMDB. It contains information on the actors, directors, budget, and gross, as well as the IMDB rating and release year.

Exercise 1

Read the movies dataset from data/movies.csv into the df variable.
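When pd.read_csv parses a file, empty fields and common markers such as NA become NaN by default, and NaN is what the later exercises detect. A minimal sketch on a small, made-up CSV held in memory (the column names and values are illustrative, not taken from the movies file):

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
# "Movie B" has an empty budget; "Movie C" has "NA" and an empty gross.
csv_text = """title,budget,gross
Movie A,100,250
Movie B,,80
Movie C,NA,
"""

# With the default na_values, both "" and "NA" are parsed as NaN.
toy = pd.read_csv(io.StringIO(csv_text))
print(toy)
print(toy.dtypes)  # numeric columns holding NaN are stored as float64
```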

In [ ]:
# your code goes here

In [ ]:
df = pd.read_csv('data/movies.csv')

Exercise 2

Check if each column has at least one missing value.
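As a sketch of what this check does, assuming a small made-up DataFrame in place of df: isna() builds an element-wise boolean mask, and any() reduces each column to True if at least one of its values is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: only "budget" contains a missing value.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [100.0, np.nan, 50.0],
    "year": [2001, 2005, 2010],
})

mask = toy.isna()            # same shape as toy, True where a value is missing
has_missing = mask.any()     # one boolean per column
print(has_missing)
```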

In [ ]:
# your code goes here

In [ ]:
df.isna().any()

Exercise 3

Check how many missing values each column has.

In [ ]:
# your code goes here


First get a boolean value for each element indicating whether it is missing, then sum those values per column (True counts as 1, False as 0).
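The two steps can be seen on a hypothetical toy frame: summing the boolean mask counts the True values in each column. Note that None in an object column is reported as missing too, just like NaN in a numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing strings (None) and missing numbers (NaN).
toy = pd.DataFrame({
    "director": ["X", None, "Z", None],
    "budget": [np.nan, 20.0, np.nan, np.nan],
})

mask = toy.isnull()   # step 1: element-wise True/False
counts = mask.sum()   # step 2: True is summed as 1, giving a count per column
print(counts)
```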

In [ ]:
df.isnull().sum()

Exercise 4

Calculate the percentage of missing values per column.
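One way to reason about it: the mean of a boolean column is the fraction of True values, so isna().mean() * 100 gives the same result as dividing the missing count by the number of rows. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 4 rows.
toy = pd.DataFrame({
    "budget": [100.0, np.nan, np.nan, 40.0],  # 2 of 4 missing
    "gross": [1.0, 2.0, 3.0, np.nan],         # 1 of 4 missing
})

# Mean of a boolean mask = fraction of missing values.
pct_from_mean = toy.isna().mean() * 100

# Equivalent: count / number of rows. len(toy) (or toy.shape[0]) is the
# row count; toy.shape itself is a (rows, columns) tuple.
pct_from_sum = toy.isna().sum() / len(toy) * 100

print(pct_from_mean)
```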

In [ ]:
# your code goes here

In [ ]:
# Equivalent: df.isnull().sum() / len(df) * 100  (df.shape is a tuple; use len(df) or df.shape[0] for the row count)

df.isna().mean() * 100

Exercise 5

Validate your previous exercise by calculating the percentage of non-missing values per column. For every column, the two percentages should add up to 100.
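The check works because every cell is either missing or not. A minimal sketch with a toy frame (illustrative data only); np.allclose absorbs floating-point rounding in the thirds:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 3 rows.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],       # 1 of 3 missing
    "b": [np.nan, np.nan, "x"],    # 2 of 3 missing
})

pct_missing = toy.isna().mean() * 100
pct_present = toy.notna().mean() * 100

# Missing + non-missing should cover every cell of every column.
total = pct_missing + pct_present
print(total)
```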

In [ ]:
# your code goes here

In [ ]:
# Equivalent: df.notnull().sum() / len(df) * 100  (df.shape is a tuple; use len(df) or df.shape[0] for the row count)

df.notna().mean() * 100

Exercise 6

Finally, show the distribution of missing values in plots using the missingno library:

• Show a matrix plot of missing-value density across the whole dataset.
• Show a bar plot of how many values are present in each column (the gap to the full height is the missing count).
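Both plots are built from the same presence/absence information that pandas exposes directly. The sketch below computes, on a made-up frame and without calling missingno, the data behind each view. The mapping to missingno's drawing (present cells shaded and missing cells blank in the matrix; bar height reflecting non-missing counts) is our reading of the library, not something stated in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 4 rows.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [100.0, np.nan, 50.0, np.nan],
    "gross": [np.nan, 10.0, 20.0, 30.0],
})

# Matrix view: one row per record, True where a value is present.
presence = toy.notna()

# Bar view: number of non-missing values per column...
non_missing_counts = presence.sum()
# ...and the same counts as a fraction of all rows.
non_missing_fraction = non_missing_counts / len(toy)

print(presence)
print(non_missing_fraction)
```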
In [ ]:
# your code goes here

In [ ]:
msno.matrix(df)

In [ ]:
# your code goes here

In [ ]:
msno.bar(df) 