 Data Science - Class 1

Last updated: June 6th, 2019  Exercises¶

Bike store sales¶ Hands on!¶

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [ ]:
sales = pd.read_csv(
'data/sales_data.csv',
parse_dates=['Date'])

In [ ]:
sales.head() What's the mean of Customers_Age?¶

In [ ]:
# your code goes here

In [ ]:
sales['Customer_Age'].mean()


Go ahead and show a density (KDE) and a box plot with the Customer_Age data:

In [ ]:
# your code goes here

In [ ]:
sales['Customer_Age'].plot(kind='kde', figsize=(14,6))

In [ ]:
sales['Customer_Age'].plot(kind='box', vert=False, figsize=(14,6)) What's the mean of Order_Quantity?¶

In [ ]:
# your code goes here

In [ ]:
sales['Order_Quantity'].mean()


Go ahead and show a histogram and a box plot with the Order_Quantity data:

In [ ]:
# your code goes here

In [ ]:
sales['Order_Quantity'].plot(kind='hist', bins=30, figsize=(14,6))

In [ ]:
sales['Order_Quantity'].plot(kind='box', vert=False, figsize=(14,6)) How many sales per year do we have?¶

In [ ]:
# your code goes here

In [ ]:
sales['Year'].value_counts()


Go ahead and show a pie plot with the previous data:

In [ ]:
# your code goes here

In [ ]:
sales['Year'].value_counts().plot(kind='pie', figsize=(6,6)) How many sales per month do we have?¶

In [ ]:
# your code goes here

In [ ]:
sales['Month'].value_counts()


Go ahead and show a bar plot with the previous data:

In [ ]:
# your code goes here

In [ ]:
sales['Month'].value_counts().plot(kind='bar', figsize=(14,6)) Which country has the most sales quantity of sales?¶

In [ ]:
# your code goes here

In [ ]:
sales['Country'].value_counts().head(1)

In [ ]:
sales['Country'].value_counts()


Go ahead and show a bar plot of the sales per country:

In [ ]:
# your code goes here

In [ ]:
sales['Country'].value_counts().plot(kind='bar', figsize=(14,6)) Create a list of every product sold¶

In [ ]:
# your code goes here

In [ ]:
#sales.loc[:, 'Product'].unique()

sales['Product'].unique()


Create a bar plot showing the 10 most sold products (best sellers):

In [ ]:
# your code goes here

In [ ]:
sales['Product'].value_counts().head(10).plot(kind='bar', figsize=(14,6)) Can you see any relationship between Unit_Cost and Unit_Price?¶

Show a scatter plot between both columns.

In [ ]:
# your code goes here

In [ ]:
sales.plot(kind='scatter', x='Unit_Cost', y='Unit_Price', figsize=(6,6)) Can you see any relationship between Order_Quantity and Profit?¶

Show a scatter plot between both columns.

In [ ]:
# your code goes here

In [ ]:
sales.plot(kind='scatter', x='Order_Quantity', y='Profit', figsize=(6,6)) Can you see any relationship between Profit per Country?¶

Show a grouped box plot per country with the profit values.

In [ ]:
# your code goes here

In [ ]:
sales[['Profit', 'Country']].boxplot(by='Country', figsize=(10,6)) Can you see any relationship between the Customer_Age per Country?¶

Show a grouped box plot per country with the customer age values.

In [ ]:
# your code goes here

In [ ]:
sales[['Customer_Age', 'Country']].boxplot(by='Country', figsize=(10,6)) Add and calculate a new Calculated_Date column¶

Use Day, Month, Year to create a Date column (YYYY-MM-DD).

In [ ]:
# your code goes here

In [ ]:
sales['Calculated_Date'] = sales[['Year', 'Month', 'Day']].apply(lambda x: '{}-{}-{}'.format(x, x, x), axis=1) Parse your Calculated_Date column into a datetime object¶

In [ ]:
# your code goes here

In [ ]:
sales['Calculated_Date'] = pd.to_datetime(sales['Calculated_Date']) How did sales evolve through the years?¶

Show a line plot using Calculated_Date column as the x-axis and the count of sales as the y-axis.

In [ ]:
# your code goes here

In [ ]:
sales['Calculated_Date'].value_counts().plot(kind='line', figsize=(14,6)) Increase 50 U\$S revenue to every sale¶

In [ ]:
# your code goes here

In [ ]:
#sales['Revenue'] = sales['Revenue'] + 50

sales['Revenue'] += 50 How many orders were made in Canada or France?¶

In [ ]:
# your code goes here

In [ ]:
sales.loc[(sales['Country'] == 'Canada') | (sales['Country'] == 'France')].shape How many Bike Racks orders were made from Canada?¶

In [ ]:
# your code goes here

In [ ]:
sales.loc[(sales['Country'] == 'Canada') & (sales['Sub_Category'] == 'Bike Racks')].shape How many orders were made in each region (state) of France?¶

In [ ]:
# your code goes here

In [ ]:
france_states = sales.loc[sales['Country'] == 'France', 'State'].value_counts()

france_states


Go ahead and show a bar plot with the results:

In [ ]:
# your code goes here

In [ ]:
france_states.plot(kind='bar', figsize=(14,6)) How many sales were made per category?¶

In [ ]:
# your code goes here

In [ ]:
sales['Product_Category'].value_counts()


Go ahead and show a pie plot with the results:

In [ ]:
# your code goes here

In [ ]:
sales['Product_Category'].value_counts().plot(kind='pie', figsize=(6,6)) How many orders were made per accessory sub-categories?¶

In [ ]:
# your code goes here

In [ ]:
accessories = sales.loc[sales['Product_Category'] == 'Accessories', 'Sub_Category'].value_counts()

accessories


Go ahead and show a bar plot with the results:

In [ ]:
# your code goes here

In [ ]:
accessories.plot(kind='bar', figsize=(14,6)) How many orders were made per bike sub-categories?¶

In [ ]:
# your code goes here

In [ ]:
bikes = sales.loc[sales['Product_Category'] == 'Bikes', 'Sub_Category'].value_counts()

bikes


Go ahead and show a pie plot with the results:

In [ ]:
# your code goes here

In [ ]:
bikes.plot(kind='pie', figsize=(6,6)) Which gender has the most amount of sales?¶

In [ ]:
# your code goes here

In [ ]:
sales['Customer_Gender'].value_counts()

In [ ]:
sales['Customer_Gender'].value_counts().plot(kind='bar') How many sales with more than 500 in Revenue were made by men?¶

In [ ]:
# your code goes here

In [ ]:
sales.loc[(sales['Customer_Gender'] == 'M') & (sales['Revenue'] == 500)].shape Get the top-5 sales with the highest revenue¶

In [ ]:
# your code goes here

In [ ]:
sales.sort_values(['Revenue'], ascending=False).head(5) Get the sale with the highest revenue¶

In [ ]:
# your code goes here

In [ ]:
#sales.sort_values(['Revenue'], ascending=False).head(1)

cond = sales['Revenue'] == sales['Revenue'].max()

sales.loc[cond] What is the mean Order_Quantity of orders with more than 10K in revenue?¶

In [ ]:
# your code goes here

In [ ]:
cond = sales['Revenue'] > 10_000

sales.loc[cond, 'Order_Quantity'].mean() What is the mean Order_Quantity of orders with less than 10K in revenue?¶

In [ ]:
# your code goes here

In [ ]:
cond = sales['Revenue'] < 10_000

sales.loc[cond, 'Order_Quantity'].mean() How many orders were made in May of 2016?¶

In [ ]:
# your code goes here

In [ ]:
cond = (sales['Year'] == 2016) & (sales['Month'] == 'May')

sales.loc[cond].shape How many orders were made between May and July of 2016?¶

In [ ]:
# your code goes here

In [ ]:
cond = (sales['Year'] == 2016) & (sales['Month'].isin(['May', 'June', 'July']))

sales.loc[cond].shape


Show a grouped box plot per month with the profit values.

In [ ]:
# your code goes here

In [ ]:
profit_2016 = sales.loc[sales['Year'] == 2016, ['Profit', 'Month']]

profit_2016.boxplot(by='Month', figsize=(14,6)) Add 7.2% TAX on every sale Unit_Price within United States¶

In [ ]:
# your code goes here

In [ ]:
#sales.loc[sales['Country'] == 'United States', 'Unit_Price'] = sales.loc[sales['Country'] == 'United States', 'Unit_Price'] * 1.072

sales.loc[sales['Country'] == 'United States', 'Unit_Price'] *= 1.072 