COVID-19

Last updated: April 9th, 2020

rmotr


COVID-19 Analysis

Now we will put into practice what we learned in the previous lessons.

Our final goal is to visualize the COVID-19 pandemic and its effects.

Coronavirus (COVID-19) is an infectious disease caused by a newly discovered coronavirus.

We will use a COVID-19 dataset with 8 columns. The ones below hold the location, date, and case counts:

  • Lat: Latitude of the location
  • Long: Longitude of the location
  • Date: Date of cumulative report
  • Confirmed: Cumulative number of confirmed cases up to this date
  • Deaths: Cumulative number of deaths up to this date
  • Recovered: Cumulative number of recovered cases up to this date
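
Because these counts are cumulative, each value already includes all earlier days. A minimal sketch, using a small made-up table (the `toy` frame and its numbers are illustrative, not taken from the dataset), shows how daily new cases could be recovered from cumulative counts with `diff()`:

```python
import pandas as pd

# Hypothetical cumulative counts for one location (illustrative values only)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04"]),
    "Confirmed": [10, 15, 15, 27],
})

# diff() subtracts the previous row; the first row has no predecessor,
# so it is filled with the first cumulative value itself
toy["New cases"] = toy["Confirmed"].diff().fillna(toy["Confirmed"]).astype(int)

print(toy)
```

The daily values add back up to the last cumulative value, which is a quick way to check the conversion.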


Hands on!

Import the libraries: NumPy, pandas, Matplotlib and seaborn.

In [ ]:
# your code goes here
In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd 
%matplotlib inline 


 Load the covid_19_clean_complete.csv dataset, and store it into df.

In [ ]:
# your code goes here
In [ ]:
df = pd.read_csv("covid_19_clean_complete.csv", parse_dates = ['Date'])

Show the column names of the resulting df.

In [ ]:
# your code goes here
In [ ]:
df.columns


Data exploration

Let's first see some descriptive statistics of the data:

In [ ]:
# your code goes here
In [ ]:
df.describe()

What do you think? Do all the statistics make sense?

In [ ]:
# It does not make sense to calculate descriptive statistics for Lat and Long.
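
One way to keep the summary meaningful is to restrict `describe()` to the count columns. A minimal sketch on made-up rows (the `toy` frame, its values, and the `count_cols` name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical rows with the same kind of columns as the dataset
toy = pd.DataFrame({
    "Lat": [33.0, 41.2, -25.3],
    "Long": [65.0, 20.2, 133.8],
    "Confirmed": [100, 250, 40],
    "Deaths": [2, 10, 1],
    "Recovered": [30, 60, 5],
})

# Summarize only the columns where mean, min and max are meaningful
count_cols = ["Confirmed", "Deaths", "Recovered"]
summary = toy[count_cols].describe()

print(summary)
```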

Now count the number of NaN values in each column of the dataset.

In [ ]:
# your code goes here
In [ ]:
df.isnull().sum()
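
To see how this count works, here is a small sketch on a made-up frame (the `toy` data and the `Region` column are hypothetical). `isnull()` marks every missing cell as True, and `sum()` then adds up the True values column by column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
toy = pd.DataFrame({
    "Region": ["A", np.nan, "C", np.nan],
    "Confirmed": [1, 2, np.nan, 4],
    "Deaths": [0, 0, 0, 1],
})

# Missing values per column
nan_per_column = toy.isnull().sum()

# Missing values in the whole frame
total_nan = int(nan_per_column.sum())

print(nan_per_column)
print("Total NaN:", total_nan)
```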

Calculate the number of active cases in a new column: 'Active'

In [ ]:
# your code goes here
In [ ]:
# Active Case = confirmed - deaths - recovered
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']


Data visualization and relationships

First we need to change the date format. The pandas .dt accessor lets us drop the time of day and store each date as a 'YYYY-MM-DD' string.

In [ ]:
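# The pandas .dt accessor does the conversion, so no extra import is needed
df['Date'] = df['Date'].dt.normalize()            # drop the time of day
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')   # store dates as 'YYYY-MM-DD' strings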
In [ ]:
a = df.Date.value_counts().sort_index()
print('The first date is:',a.index[0])
print('The last date is:',a.index[-1])
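
To see what the two date steps do, here is a small sketch on made-up timestamps (the `dates` series is illustrative). `normalize()` resets the time of day to midnight, and `strftime('%Y-%m-%d')` turns each date into an ISO-style string. Such strings sort in chronological order, which is why taking the first and last entries of the sorted index gives the first and last dates.

```python
import pandas as pd

# Hypothetical timestamps, deliberately out of order
dates = pd.Series(pd.to_datetime([
    "2020-02-10 14:30",
    "2020-01-22 08:00",
    "2020-03-01 23:59",
]))

normalized = dates.dt.normalize()              # time of day set to 00:00:00
as_text = normalized.dt.strftime("%Y-%m-%d")   # 'YYYY-MM-DD' strings

print(as_text.tolist())
print(sorted(as_text))   # alphabetical order matches chronological order
```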

Visualize the total number of confirmed cases versus time

We need to generate a new dataframe to calculate the number of total cases, and call this 'total_cases'. Note: use groupby.

In [ ]:
# your code goes here
total_cases = None
In [ ]:
# Sum the confirmed cases of all locations for each date
total_cases = df.groupby('Date')['Confirmed'].sum().reset_index()
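
A minimal sketch of the same groupby pattern on a made-up table (the `toy` frame, its values, and the `toy_total` name are assumptions for illustration) may help to see what the aggregation returns:

```python
import pandas as pd

# Hypothetical data: two countries reported on two dates
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Country/Region": ["X", "Y", "X", "Y"],
    "Confirmed": [10, 5, 12, 9],
})

# One row per date, holding the sum over all countries
toy_total = toy.groupby("Date")["Confirmed"].sum().reset_index()

print(toy_total)
```

`reset_index()` turns the grouping key back into a regular column, which makes the result easier to pass to plotting functions.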

Now plot the time series of the total_cases.

In [ ]:
# your code goes here
plt.figure(figsize= (14,5))

## Need help!
#ax = None
#ax.set(xlabel='Date', ylabel='Total cases')

#plt.xticks(rotation = 90 ,fontsize = 10)
#plt.yticks(fontsize = 15)
#plt.xlabel("Dates",fontsize = 30)
#plt.ylabel('Total cases',fontsize = 30)
#plt.title("Worldwide Confirmed Cases Over Time" , fontsize = 30)
In [ ]:
plt.figure(figsize= (14,5))

ax = sns.pointplot( x = total_cases['Date']  ,y = total_cases['Confirmed'] , color = 'r')
ax.set(xlabel='Dates', ylabel='Total cases')

plt.xticks(rotation = 90 ,fontsize = 10)
plt.yticks(fontsize = 12)
plt.xlabel("Dates",fontsize = 14)
plt.ylabel('Total cases',fontsize = 14)
plt.title("Worldwide Confirmed Cases Over Time" , fontsize = 20)

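Another option is a line plot with relplot, shown here for deaths. By default it draws the mean across all locations for each date, with a confidence band around it.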

In [ ]:
with sns.axes_style('white'):
    g = sns.relplot(x="Date", y="Deaths" ,kind="line", data=df)
    g.fig.autofmt_xdate()
    g.set_xticklabels(step=10)
    plt.title ("Covid-19 Deaths, Year:2020")


Visualize the top 10 countries with the most cases

We need a new dataframe 'top_casualities'.

First, filter the rows for the latest date. Because the counts are cumulative, these rows hold the maximum number of cases for each country.

In [ ]:
# your code goes here
top = None
In [ ]:
top = df.loc[df['Date'] == df['Date'].max()]

Now we will use groupby to select the ten countries with the highest number of cases

In [ ]:
top_casualities = top.groupby(by = 'Country/Region')['Confirmed'].sum().sort_values(ascending = False).head(10).reset_index()
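
The same chain can be tried on a made-up table (the `toy` frame, the `Province` column, and the values are hypothetical). The sketch below sums the rows of each country first, since a country may be split across several rows, and then keeps the largest totals. `nlargest()` is shown as an equivalent shortcut for the sort-and-head step.

```python
import pandas as pd

# Hypothetical latest-date rows; country X is split into two provinces
toy = pd.DataFrame({
    "Country/Region": ["X", "X", "Y", "Z"],
    "Province": ["p1", "p2", None, None],
    "Confirmed": [30, 25, 40, 10],
})

# Total per country, sorted from highest to lowest, top 2 kept
top2 = (
    toy.groupby("Country/Region")["Confirmed"]
    .sum()
    .sort_values(ascending=False)
    .head(2)
    .reset_index()
)

# Same result with nlargest() instead of sort_values().head()
top2_alt = toy.groupby("Country/Region")["Confirmed"].sum().nlargest(2).reset_index()

print(top2)
```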

Plot the total cases of the top 10 countries using barplot

In [ ]:
# your code goes here
sns.set(style="darkgrid")

ax = sns.barplot(x = None, y = None)

#for i, (value, name) in enumerate(zip(top_casualities.Confirmed,top_casualities['Country/Region'])):
#    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
    
#ax.set(xlabel='Total cases', ylabel='Country/Region')
In [ ]:
sns.set(style="darkgrid")
plt.figure(figsize= (15,10))

ax = sns.barplot(x = top_casualities['Confirmed'], y = top_casualities['Country/Region'])

for i, (value, name) in enumerate(zip(top_casualities['Confirmed'],top_casualities['Country/Region'])):
    ax.text(value, i-.05, f'{value:,.0f}',  size=10, ha='left',  va='center')
ax.set(xlabel='Total cases', ylabel='Country/Region')

plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.xlabel("Total cases",fontsize = 30)
plt.ylabel('Country',fontsize = 30)
plt.title("Top 10 countries having most confirmed cases" , fontsize = 20)

USA analysis

In [ ]:
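# Keep only the US rows and sum the counts of all its locations for each date
us = df[df['Country/Region'] == 'US']
us = us.groupby(by = 'Date')[['Recovered', 'Deaths', 'Confirmed', 'Active']].sum().reset_index()
# Skip the first 33 dates and renumber the rows from 0
us = us.iloc[33:].reset_index(drop = True)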

Visualize the last ten rows

In [ ]:
# your code goes here
In [ ]:
us.tail(10)

Plot the active cases in the US over time

In [ ]:
# your code goes here

plt.figure(figsize=(15,5))

sns.pointplot(None)
In [ ]:
plt.figure(figsize=(15,5))
sns.set_color_codes("pastel")
# Pass x and y by keyword; recent seaborn versions do not accept them positionally
sns.pointplot(x = us.index, y = us.Active, color = 'b')
plt.title("US's Active Cases Over Time" , fontsize = 25)
plt.xlabel('No. of Days', fontsize = 15)
plt.ylabel('Active cases', fontsize = 15)

## Another solution, with dates on the x-axis
#plt.figure(figsize=(15,5))

#sns.pointplot(x = us.Date, y = us.Active, color = 'r')
#plt.title("US's Active Cases Over Time" , fontsize = 25)
#plt.xlabel('Date', fontsize = 15)
#plt.ylabel('Active cases', fontsize = 15)
#plt.xticks(rotation = 90 ,fontsize = 10)

Optional: Stacked Bar Chart

A stacked bar graph (or stacked bar chart) uses bars to compare categories of data, with the added ability to break each bar down and compare the parts of a whole.

In [ ]:
sns.set(style="whitegrid")

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(15, 5))

# Draw the tallest bars first (active + recovered + deaths); the part left
# visible after the next two layers is the active cases
sns.set_color_codes("pastel")
sns.barplot(x = us.index, y = us.Active + us.Recovered + us.Deaths,
            label="Active", color="b")

# Draw recovered + deaths on top; the visible part is the recovered cases
sns.set_color_codes("muted")
sns.barplot(x = us.index, y = us.Recovered + us.Deaths,
            label="Recovered", color="g")

# Draw the deaths last so they stay visible at the bottom
sns.set_color_codes("dark")
sns.barplot(x = us.index, y = us.Deaths,
            label="Deaths", color="r")
plt.xlabel('No. of Days', fontsize = 14)
plt.ylabel('No. of cases', fontsize = 15)
# Add a legend and remove the top border of the plot
ax.legend(ncol=2, loc="upper left", frameon=True)
sns.despine(top=True)
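
The cell above gets the stacked look by drawing overlapping bars from tallest to shortest. An alternative sketch, assuming made-up counts (the `toy` frame and its values are illustrative), stacks the segments explicitly with the `bottom` argument of Matplotlib's `bar`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts for three consecutive days
toy = pd.DataFrame({
    "Deaths": [1, 2, 4],
    "Recovered": [3, 5, 9],
    "Active": [20, 30, 45],
})

fig, ax = plt.subplots(figsize=(8, 4))
days = toy.index

# Each layer starts where the previous one ends
ax.bar(days, toy["Deaths"], color="r", label="Deaths")
ax.bar(days, toy["Recovered"], bottom=toy["Deaths"], color="g", label="Recovered")
ax.bar(days, toy["Active"], bottom=toy["Deaths"] + toy["Recovered"], color="b", label="Active")

ax.set_xlabel("No. of Days")
ax.set_ylabel("No. of cases")
ax.legend(loc="upper left")

# Total height of each stack, for reference
stack_height = toy["Deaths"] + toy["Recovered"] + toy["Active"]
```

With `bottom`, each segment's height is its own value, so no sums need to be plotted and the drawing order no longer matters.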
