Profile picture

Data Science - Class 1

Last updated: June 6th, 20192019-06-06Project preview

rmotr


Exercises

 Bike store sales

purple-divider

Hands on!

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
In [ ]:
sales = pd.read_csv(
    'data/sales_data.csv',
    parse_dates=['Date'])
In [ ]:
sales.head()

green-divider

What's the mean of Customers_Age?

In [ ]:
# your code goes here
In [ ]:
sales['Customer_Age'].mean()

Go ahead and show a density (KDE) and a box plot with the Customer_Age data:

In [ ]:
# your code goes here
In [ ]:
sales['Customer_Age'].plot(kind='kde', figsize=(14,6))
In [ ]:
sales['Customer_Age'].plot(kind='box', vert=False, figsize=(14,6))

green-divider

What's the mean of Order_Quantity?

In [ ]:
# your code goes here
In [ ]:
sales['Order_Quantity'].mean()

Go ahead and show a histogram and a box plot with the Order_Quantity data:

In [ ]:
# your code goes here
In [ ]:
sales['Order_Quantity'].plot(kind='hist', bins=30, figsize=(14,6))
In [ ]:
sales['Order_Quantity'].plot(kind='box', vert=False, figsize=(14,6))

green-divider

 How many sales per year do we have?

In [ ]:
# your code goes here
In [ ]:
sales['Year'].value_counts()

Go ahead and show a pie plot with the previous data:

In [ ]:
# your code goes here
In [ ]:
sales['Year'].value_counts().plot(kind='pie', figsize=(6,6))

green-divider

 How many sales per month do we have?

In [ ]:
# your code goes here
In [ ]:
sales['Month'].value_counts()

Go ahead and show a bar plot with the previous data:

In [ ]:
# your code goes here
In [ ]:
sales['Month'].value_counts().plot(kind='bar', figsize=(14,6))

green-divider

Which country has the most sales quantity of sales?

In [ ]:
# your code goes here
In [ ]:
sales['Country'].value_counts().head(1)
In [ ]:
sales['Country'].value_counts()

Go ahead and show a bar plot of the sales per country:

In [ ]:
# your code goes here
In [ ]:
sales['Country'].value_counts().plot(kind='bar', figsize=(14,6))

green-divider

 Create a list of every product sold

In [ ]:
# your code goes here
In [ ]:
#sales.loc[:, 'Product'].unique()

sales['Product'].unique()

Create a bar plot showing the 10 most sold products (best sellers):

In [ ]:
# your code goes here
In [ ]:
sales['Product'].value_counts().head(10).plot(kind='bar', figsize=(14,6))

green-divider

Can you see any relationship between Unit_Cost and Unit_Price?

Show a scatter plot between both columns.

In [ ]:
# your code goes here
In [ ]:
sales.plot(kind='scatter', x='Unit_Cost', y='Unit_Price', figsize=(6,6))

green-divider

Can you see any relationship between Order_Quantity and Profit?

Show a scatter plot between both columns.

In [ ]:
# your code goes here
In [ ]:
sales.plot(kind='scatter', x='Order_Quantity', y='Profit', figsize=(6,6))

green-divider

Can you see any relationship between Profit per Country?

Show a grouped box plot per country with the profit values.

In [ ]:
# your code goes here
In [ ]:
sales[['Profit', 'Country']].boxplot(by='Country', figsize=(10,6))

green-divider

Can you see any relationship between the Customer_Age per Country?

Show a grouped box plot per country with the customer age values.

In [ ]:
# your code goes here
In [ ]:
sales[['Customer_Age', 'Country']].boxplot(by='Country', figsize=(10,6))

green-divider

Add and calculate a new Calculated_Date column

Use Day, Month, Year to create a Date column (YYYY-MM-DD).

In [ ]:
# your code goes here
In [ ]:
sales['Calculated_Date'] = sales[['Year', 'Month', 'Day']].apply(lambda x: '{}-{}-{}'.format(x[0], x[1], x[2]), axis=1)

sales['Calculated_Date'].head()

green-divider

Parse your Calculated_Date column into a datetime object

In [ ]:
# your code goes here
In [ ]:
sales['Calculated_Date'] = pd.to_datetime(sales['Calculated_Date'])

sales['Calculated_Date'].head()

green-divider

How did sales evolve through the years?

Show a line plot using Calculated_Date column as the x-axis and the count of sales as the y-axis.

In [ ]:
# your code goes here
In [ ]:
sales['Calculated_Date'].value_counts().plot(kind='line', figsize=(14,6))

green-divider

 Increase 50 U$S revenue to every sale

In [ ]:
# your code goes here
In [ ]:
#sales['Revenue'] = sales['Revenue'] + 50

sales['Revenue'] += 50

green-divider

How many orders were made in Canada or France?

In [ ]:
# your code goes here
In [ ]:
sales.loc[(sales['Country'] == 'Canada') | (sales['Country'] == 'France')].shape[0]

green-divider

How many Bike Racks orders were made from Canada?

In [ ]:
# your code goes here
In [ ]:
sales.loc[(sales['Country'] == 'Canada') & (sales['Sub_Category'] == 'Bike Racks')].shape[0]

green-divider

How many orders were made in each region (state) of France?

In [ ]:
# your code goes here
In [ ]:
france_states = sales.loc[sales['Country'] == 'France', 'State'].value_counts()

france_states

Go ahead and show a bar plot with the results:

In [ ]:
# your code goes here
In [ ]:
france_states.plot(kind='bar', figsize=(14,6))

green-divider

How many sales were made per category?

In [ ]:
# your code goes here
In [ ]:
sales['Product_Category'].value_counts()

Go ahead and show a pie plot with the results:

In [ ]:
# your code goes here
In [ ]:
sales['Product_Category'].value_counts().plot(kind='pie', figsize=(6,6))

green-divider

How many orders were made per accessory sub-categories?

In [ ]:
# your code goes here
In [ ]:
accessories = sales.loc[sales['Product_Category'] == 'Accessories', 'Sub_Category'].value_counts()

accessories

Go ahead and show a bar plot with the results:

In [ ]:
# your code goes here
In [ ]:
accessories.plot(kind='bar', figsize=(14,6))

green-divider

How many orders were made per bike sub-categories?

In [ ]:
# your code goes here
In [ ]:
bikes = sales.loc[sales['Product_Category'] == 'Bikes', 'Sub_Category'].value_counts()

bikes

Go ahead and show a pie plot with the results:

In [ ]:
# your code goes here
In [ ]:
bikes.plot(kind='pie', figsize=(6,6))

green-divider

Which gender has the most amount of sales?

In [ ]:
# your code goes here
In [ ]:
sales['Customer_Gender'].value_counts()
In [ ]:
sales['Customer_Gender'].value_counts().plot(kind='bar')

green-divider

How many sales with more than 500 in Revenue were made by men?

In [ ]:
# your code goes here
In [ ]:
sales.loc[(sales['Customer_Gender'] == 'M') & (sales['Revenue'] == 500)].shape[0]

green-divider

Get the top-5 sales with the highest revenue

In [ ]:
# your code goes here
In [ ]:
sales.sort_values(['Revenue'], ascending=False).head(5)

green-divider

Get the sale with the highest revenue

In [ ]:
# your code goes here
In [ ]:
#sales.sort_values(['Revenue'], ascending=False).head(1)

cond = sales['Revenue'] == sales['Revenue'].max()

sales.loc[cond]

green-divider

What is the mean Order_Quantity of orders with more than 10K in revenue?

In [ ]:
# your code goes here
In [ ]:
cond = sales['Revenue'] > 10_000

sales.loc[cond, 'Order_Quantity'].mean()

green-divider

What is the mean Order_Quantity of orders with less than 10K in revenue?

In [ ]:
# your code goes here
In [ ]:
cond = sales['Revenue'] < 10_000

sales.loc[cond, 'Order_Quantity'].mean()

green-divider

How many orders were made in May of 2016?

In [ ]:
# your code goes here
In [ ]:
cond = (sales['Year'] == 2016) & (sales['Month'] == 'May')

sales.loc[cond].shape[0]

green-divider

How many orders were made between May and July of 2016?

In [ ]:
# your code goes here
In [ ]:
cond = (sales['Year'] == 2016) & (sales['Month'].isin(['May', 'June', 'July']))

sales.loc[cond].shape[0]

Show a grouped box plot per month with the profit values.

In [ ]:
# your code goes here
In [ ]:
profit_2016 = sales.loc[sales['Year'] == 2016, ['Profit', 'Month']]

profit_2016.boxplot(by='Month', figsize=(14,6))

green-divider

 Add 7.2% TAX on every sale Unit_Price within United States

In [ ]:
# your code goes here
In [ ]:
#sales.loc[sales['Country'] == 'United States', 'Unit_Price'] = sales.loc[sales['Country'] == 'United States', 'Unit_Price'] * 1.072

sales.loc[sales['Country'] == 'United States', 'Unit_Price'] *= 1.072

purple-divider

Notebooks AI
Notebooks AI Profile20060