Pandas String Handling (.str Attribute)

Last updated: July 2nd, 2019

rmotr


Exercises

 Pandas String handling (.str attribute)

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


We are going to use the movies dataset, which contains data scraped from the IMDB website.


Exercise 1

Read the movies dataset from data/movies.csv and save it as a DataFrame in a df variable.

In [ ]:
# your code goes here
In [ ]:
df = pd.read_csv('data/movies.csv')

df.head()


Exercise 2

Get the first genre of each movie and save it in a first_genre column.

In [ ]:
# your code goes here
In [ ]:
df['first_genre'] = df['genres'].str.split('|', expand=True).loc[:,0]

df['first_genre'].head(10)
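
As an alternative sketch, the first genre can also be taken with the .str[0] accessor instead of expand=True, which avoids building a full DataFrame of split columns. The genres Series below is made-up toy data standing in for the real column, not values from data/movies.csv.

```python
import pandas as pd

# Toy stand-in for the `genres` column (values are made up for illustration).
genres = pd.Series(['Action|Adventure|Sci-Fi', 'Comedy', 'Drama|Romance'])

# Split on the literal pipe, then take element 0 of each resulting list.
first_genre = genres.str.split('|').str[0]

# n=1 limits the split to a single cut, so long strings are not fully split.
first_genre_n1 = genres.str.split('|', n=1).str[0]

print(first_genre.tolist())
```

Both variants should give the same first genre per row; n=1 only reduces the work done on each string.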


Exercise 3

Get a list of every unique genre.

In [ ]:
# your code goes here

Use stack().unique() after splitting the values.

In [ ]:
# split the full genres column (not first_genre) so every genre is included
unique_genres = df['genres'].str.split('|', expand=True).stack().unique()

unique_genres
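
A possible alternative to expand=True followed by stack() is explode(), which gives each element of the split lists its own row (available in pandas 0.25 and later). This is a minimal sketch on made-up toy data, not the actual movies dataset.

```python
import pandas as pd

# Toy stand-in for the `genres` column (values are made up for illustration).
genres = pd.Series(['Action|Adventure', 'Comedy', 'Adventure|Comedy|Drama'])

# Each row becomes a list; explode() gives every list element its own row.
unique_genres = genres.str.split('|').explode().unique()

print(sorted(unique_genres))
```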


Exercise 4

How many films have dog within their plot_keywords?

In [ ]:
# your code goes here
In [ ]:
df['plot_keywords'].str.contains('dog').sum()
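
Note that str.contains matches substrings, so a keyword such as underdog would also count. The sketch below contrasts the substring count with an exact-keyword count. It assumes keywords are pipe-separated like the genres column, and the keywords Series is made-up toy data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `plot_keywords`; assumes pipe-separated keywords.
keywords = pd.Series(['dog|park|friendship', 'underdog|boxing', np.nan, 'cat|mouse'])

# Substring match; na=False makes missing rows count as False.
substring_count = keywords.str.contains('dog', na=False).sum()

# Exact match: split into keywords and test membership in each list.
# Missing rows are not lists, so the isinstance check yields False for them.
exact_count = keywords.str.split('|').apply(
    lambda words: isinstance(words, list) and 'dog' in words
).sum()

print(substring_count, exact_count)
```

Which count is the intended answer depends on how the exercise is read; the solution above uses the substring interpretation.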


Exercise 5

Create a new is_comedy boolean column indicating whether the movie has the Comedy genre.

In [ ]:
# your code goes here
In [ ]:
df['is_comedy'] = df['genres'].str.contains('Comedy')

df['is_comedy'].head()
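
Another way to get such a flag, sketched here on made-up toy data, is str.get_dummies, which builds one indicator column per genre in a single call. It matches whole genre names rather than substrings.

```python
import pandas as pd

# Toy stand-in for the `genres` column (values are made up for illustration).
genres = pd.Series(['Action|Comedy', 'Drama', 'Comedy|Romance'])

# One indicator column per genre, split on the pipe separator.
dummies = genres.str.get_dummies(sep='|')

# The Comedy indicator, cast to bool, is an exact-name version of is_comedy.
is_comedy = dummies['Comedy'].astype(bool)

print(is_comedy.tolist())
```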


Exercise 6

Replace the following country names:

  • UK → United Kingdom
  • USA → United States
In [ ]:
# your code goes here
In [ ]:
# using pandas replace
'''
df.replace({
    'country': {
        'UK': 'United Kingdom',
        'USA': 'United States'
    }
}, inplace=True)
'''

# using str.replace
df['country'] = df['country'].str.replace('UK', 'United Kingdom')
df['country'] = df['country'].str.replace('USA', 'United States')

df['country'].unique()
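
The two approaches above are not fully equivalent: str.replace rewrites substrings, while replace with a dict only swaps values that match a key in full. The sketch below illustrates the difference on made-up toy data. 'UKR' is a hypothetical value included only to expose the behavior, not something known to be in the dataset.

```python
import pandas as pd

# 'UKR' is a made-up value, included only to expose the difference.
country = pd.Series(['UK', 'USA', 'UKR', 'France'])

# str.replace works on substrings, so 'UKR' is altered too.
by_substring = country.str.replace('UK', 'United Kingdom', regex=False)

# Series.replace with a dict swaps only values that match a key in full.
by_value = country.replace({'UK': 'United Kingdom', 'USA': 'United States'})

print(by_substring.tolist())
print(by_value.tolist())
```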


Exercise 7

Uppercase every movie title.

In [ ]:
# your code goes here
In [ ]:
df['movie_title'] = df['movie_title'].str.upper()

df['movie_title'].head()
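
The .str methods can be chained, which is handy if titles carry stray leading or trailing spaces. Whether the real movie_title column does is an assumption here, and the titles below are made up.

```python
import pandas as pd

# Made-up titles with stray leading and trailing spaces.
titles = pd.Series(['avatar ', '  Spectre', 'Brave'])

# Strip surrounding whitespace first, then uppercase the result.
clean_titles = titles.str.strip().str.upper()

print(clean_titles.tolist())
```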
