
Missing Values in Pandas

Last updated: June 11th, 2019

rmotr


Exercises


In [ ]:
import numpy as np
import pandas as pd
import missingno as msno


We are going to use a dataset of 5,000 movies scraped from IMDB. It contains information on the actors, directors, budget, and gross, as well as the IMDB rating and release year.


Exercise 1

Read the movies dataset from data/movies.csv into the df variable.

In [ ]:
# your code goes here
In [ ]:
df = pd.read_csv('data/movies.csv')

df.head(15)
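
To see how missing values enter a DataFrame in the first place, here is a minimal sketch that reads a small in-memory CSV. The three rows and their column names are hypothetical and are not taken from the real movies file. By default, `pd.read_csv` turns empty fields into `NaN`.

```python
import io

import pandas as pd

# Hypothetical three-row sample; NOT the real movies dataset.
csv_text = """title,director,budget
Movie A,Jane Doe,1000000
Movie B,,2500000
Movie C,John Roe,
"""

# Empty fields are parsed as NaN by default.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.dtypes)  # budget becomes float64 because NaN is a float
```

Note that a numeric column containing `NaN` is stored as `float64` even when every present value is a whole number.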


Exercise 2

Check if each column has at least one missing value.

In [ ]:
# your code goes here
In [ ]:
df.isna().any()
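
A minimal sketch of the same check on a hypothetical toy frame (the column names and values are made up for illustration). `isna()` builds a boolean mask and `any()` reduces it to one flag per column. The resulting Series can then be used to list only the affected columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the check.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [1.0, np.nan, 3.0],
    "gross": [np.nan, np.nan, 9.0],
})

# One boolean per column: True if the column has at least one NaN.
has_missing = toy.isna().any()
print(has_missing)

# Keep only the names of the columns flagged True.
columns_with_missing = has_missing[has_missing].index.tolist()
print(columns_with_missing)  # ['budget', 'gross']
```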


Exercise 3

Check how many missing values each column has.

In [ ]:
# your code goes here

Hint: first build a boolean mask that marks whether each element is missing, then sum those values per column.

In [ ]:
df.isnull().sum()
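
The two steps can be sketched separately on a hypothetical toy frame (names and values are illustrative only). Summing a boolean mask works because `True` counts as 1 and `False` as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the two steps.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "budget": [1.0, np.nan, 3.0],
    "gross": [np.nan, np.nan, 9.0],
})

# Step 1: boolean mask, True where a value is missing.
mask = toy.isnull()

# Step 2: column-wise sum of the mask gives the missing count per column.
counts = mask.sum()
print(counts.sort_values(ascending=False))

# Summing again gives the total number of missing cells in the frame.
total_missing = int(counts.sum())
print(total_missing)  # 3
```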


Exercise 4

Calculate the percentage of missing values per column.

In [ ]:
# your code goes here
In [ ]:
# Equivalent: df.isnull().sum() / df.shape[0] * 100

df.isna().mean() * 100
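
A minimal sketch, on a hypothetical four-row toy frame, showing that the two formulations agree. The mean of a boolean mask is the fraction of `True` values, which is the same as dividing the missing count by the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with four rows so the percentages are easy to read.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [1.0, np.nan, 3.0, 4.0],
    "gross": [np.nan, np.nan, 9.0, np.nan],
})

# Mean of the boolean mask, scaled to a percentage.
pct_mean = toy.isna().mean() * 100

# Missing count divided by the number of rows, scaled to a percentage.
pct_manual = toy.isna().sum() / len(toy) * 100

print(pct_mean)
print(np.allclose(pct_mean, pct_manual))  # True
```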


Exercise 5

Validate your previous answer by calculating the percentage of non-missing values per column.

In [ ]:
# your code goes here
In [ ]:
# Equivalent: df.notnull().sum() / df.shape[0] * 100

df.notna().mean() * 100
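
The validation can be made explicit: for every column, the missing and non-missing percentages should add up to 100. A minimal sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the check.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [1.0, np.nan, 3.0, 4.0],
    "gross": [np.nan, np.nan, 9.0, np.nan],
})

missing_pct = toy.isna().mean() * 100
present_pct = toy.notna().mean() * 100

# Every cell is either missing or present, so each column sums to 100.
total_pct = missing_pct + present_pct
print(total_pct)
print(np.allclose(total_pct, 100.0))  # True
```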


Exercise 6

Finally, show the distribution of missing values in a plot using the missingno library.

  • Show a matrix plot with the missing values density over the whole data.
  • Show a bar plot of completeness by column. msno.bar sizes each bar by the non-missing values, so shorter bars indicate more missing data.
In [ ]:
# your code goes here
In [ ]:
msno.matrix(df)
In [ ]:
# your code goes here
In [ ]:
msno.bar(df)
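
The per-column figures can also be cross-checked without a plot. A minimal sketch on a hypothetical toy frame: `DataFrame.count()` returns the number of non-missing entries per column, and subtracting it from the row count gives the missing count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the cross-check.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget": [1.0, np.nan, 3.0, 4.0],
    "gross": [np.nan, np.nan, 9.0, np.nan],
})

# Non-missing entries per column.
present = toy.count()

# Missing entries per column, derived from the row count.
missing = len(toy) - present

print(pd.DataFrame({"present": present, "missing": missing}))
```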

