Recalling what we saw in 1.4. Probability Scale, work through the following exercises:
In [1]:
import pandas as pd
import numpy as np
This data set contains booking information for a city hotel and a resort hotel.
In [2]:
dataset = pd.read_csv('hotel_bookings.csv')
dataset.head()
Out[2]:
We only look at the hotel type and the month in which the tourists arrive.
In [3]:
# Select the two columns by name rather than by position
data = dataset[['hotel', 'arrival_date_month']]
data.head()
Out[3]:
Is it more likely that people arrive in July or in March?
In [ ]:
In [4]:
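# Count bookings per arrival month (a Series indexed by month name)
data1 = data['arrival_date_month'].value_counts()
# Relative frequency = bookings in that month / total bookings
July = data1['July'] / data.shape[0]
March = data1['March'] / data.shape[0]
July, March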
Out[4]:
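The relative frequencies computed above can also be obtained in a single step. The sketch below uses a small made-up Series (a hypothetical stand-in for the `arrival_date_month` column, not the real data) and the `normalize=True` option of `value_counts`, which returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical toy data standing in for the arrival_date_month column
months = pd.Series(['July', 'July', 'March', 'July', 'August', 'March'])

# normalize=True returns relative frequencies instead of raw counts
probs = months.value_counts(normalize=True)

p_july = probs['July']    # 3 of the 6 toy bookings
p_march = probs['March']  # 2 of the 6 toy bookings
print(p_july, p_march)
```

Because the proportions sum to 1, they can be read directly on the probability scale: the month with the larger value is the more likely arrival month.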
Now try a slightly more difficult exercise:
Decide in which season tourists are most likely to arrive at a City Hotel.
In [ ]:
In [6]:
# Keep only City Hotel bookings and count arrivals per month
data2 = data[data['hotel'] == "City Hotel"]
data3 = data2['arrival_date_month'].value_counts()
total = data2.shape[0]
# Probability of each season = bookings in its three months / total City Hotel bookings
summer = data3[['June', 'July', 'August']].sum() / total
winter = data3[['December', 'January', 'February']].sum() / total
spring = data3[['March', 'April', 'May']].sum() / total
autumn = data3[['September', 'October', 'November']].sum() / total
summer, winter, spring, autumn
Out[6]:
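An alternative way to approach the season exercise is to map each month to a season first and then count. The sketch below is only an illustration: the `toy` DataFrame and the `season_of` dictionary are hypothetical, and the season boundaries assume the same month grouping used above (June to August as summer, and so on).

```python
import pandas as pd

# Hypothetical toy bookings; the real data comes from hotel_bookings.csv
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'City Hotel', 'City Hotel'],
    'arrival_date_month': ['July', 'August', 'July', 'January', 'April'],
})

# Assumed month-to-season grouping (same as in the exercise above)
season_of = {
    'December': 'winter', 'January': 'winter', 'February': 'winter',
    'March': 'spring', 'April': 'spring', 'May': 'spring',
    'June': 'summer', 'July': 'summer', 'August': 'summer',
    'September': 'autumn', 'October': 'autumn', 'November': 'autumn',
}

# Filter to City Hotel, translate months to seasons, then take relative frequencies
city = toy[toy['hotel'] == 'City Hotel']
season_probs = city['arrival_date_month'].map(season_of).value_counts(normalize=True)

# idxmax returns the label with the highest probability
most_likely = season_probs.idxmax()
print(season_probs)
print(most_likely)
```

With the mapping in one place, changing the definition of a season only requires editing the dictionary, and `idxmax` answers the question without comparing the four numbers by eye.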