# Assignment 1.4.2: Given Two Experiments, Decide Which One Is More Likely

Last updated: September 7th, 2020

Recalling what we saw in 1.4 Probability Scale, complete the following exercises:

In [1]:
import pandas as pd
import numpy as np


This data set contains booking information for a city hotel and a resort hotel.

In [2]:
dataset = pd.read_csv('hotel_bookings.csv')
dataset.head()

Out[2]:
| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | NaN | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | 304.0 | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 |

5 rows × 32 columns

We only pay attention to the hotel type and the month in which the guests arrive.

In [3]:
data = dataset[['hotel', 'arrival_date_month']]
data.head()

Out[3]:
| | hotel | arrival_date_month |
|---|---|---|
| 0 | Resort Hotel | July |
| 1 | Resort Hotel | July |
| 2 | Resort Hotel | July |
| 3 | Resort Hotel | July |
| 4 | Resort Hotel | July |

Is it more likely that guests arrive in July or in March?

In [ ]:


In [4]:
# Count the bookings per arrival month
data1 = data['arrival_date_month'].value_counts()
# Relative frequency = bookings in that month / all bookings
July = data1['July'] / data.shape[0]
March = data1['March'] / data.shape[0]
July, March

Out[4]:
(0.10604740765558254, 0.08203367116173883)
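
To answer the question, the two relative frequencies in Out[4] can be compared directly: the larger one marks the more likely arrival month. The following is a minimal sketch of that comparison. The variable names `p_july`, `p_march`, `more_likely` and `ratio` are introduced here only for illustration, and the two numbers are copied from the output above.

```python
# Relative frequencies copied from Out[4]
p_july = 0.10604740765558254
p_march = 0.08203367116173883

# The event with the larger relative frequency is the more likely one
more_likely = "July" if p_july > p_march else "March"

# How many times more likely a July arrival is than a March arrival
ratio = p_july / p_march

print(more_likely, round(ratio, 2))
```

With these values, an arrival in July comes out as more likely than an arrival in March, by a factor of roughly 1.3.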

Now try a slightly more difficult exercise:

Decide in which season guests are most likely to arrive at a City Hotel.

In [ ]:


In [6]:
# Keep only the City Hotel bookings
data2 = data[data['hotel'] == 'City Hotel']
# Relative frequency of each arrival month among City Hotel bookings
data3 = data2['arrival_date_month'].value_counts() / data2.shape[0]
# Each season is the sum of its three monthly relative frequencies
summer = data3['June'] + data3['July'] + data3['August']
winter = data3['December'] + data3['January'] + data3['February']
spring = data3['March'] + data3['April'] + data3['May']
autumn = data3['September'] + data3['October'] + data3['November']
summer, winter, spring, autumn

Out[6]:
(0.314698096558679,
0.16176730114710702,
0.2794655237615026,
0.24406907853271148)
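
The largest of the four values in Out[6] belongs to summer, so summer is the most likely arrival season for a City Hotel. One possible alternative to adding the months by hand is to map each month to its season first and then let pandas compute the relative frequencies. The sketch below illustrates this on a small made-up DataFrame, not on the real data set. The dictionary `MONTH_TO_SEASON` and the names `toy`, `season_freq`, `total` and `most_likely` are assumptions introduced for this example.

```python
import pandas as pd

# Hypothetical lookup table: month name -> season (same grouping as above)
MONTH_TO_SEASON = {
    'December': 'winter', 'January': 'winter', 'February': 'winter',
    'March': 'spring', 'April': 'spring', 'May': 'spring',
    'June': 'summer', 'July': 'summer', 'August': 'summer',
    'September': 'autumn', 'October': 'autumn', 'November': 'autumn',
}

# Toy stand-in for the City Hotel rows (made-up values, not the real data)
toy = pd.DataFrame({
    'hotel': ['City Hotel'] * 6,
    'arrival_date_month': ['July', 'August', 'July', 'March', 'October', 'January'],
})

# Map every arrival month to its season, then take relative frequencies
season_freq = toy['arrival_date_month'].map(MONTH_TO_SEASON).value_counts(normalize=True)

# The relative frequencies of all seasons should add up to 1
total = season_freq.sum()

# idxmax returns the label (season) with the highest relative frequency
most_likely = season_freq.idxmax()
print(most_likely)
```

Applied to `data2['arrival_date_month']` instead of the toy column, the same steps should give the four season frequencies in one Series, which makes it easy to check that they sum to 1.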