Bayes

Last updated: September 6th, 2020
In [1]:
import pandas as pd
import numpy as np

This dataset records the day, outlook, humidity, and wind conditions for ten days. The final column, 'Play', indicates whether we can play outside.

In [16]:
toy_dataset = pd.read_csv('Play.csv')
toy_dataset
Out[16]:
   Day   Outlook Humidity    Wind Play
0   D1     sunny     high    weak   no
1   D2     sunny     high  strong   no
2   D3  overcast     high    weak  yes
3   D4      rain     high    weak  yes
4   D5      rain   normal    weak  yes
5   D6      rain   normal  strong   no
6   D7  overcast   normal  strong  yes
7   D8     sunny     high    weak   no
8   D9     sunny   normal    weak  yes
9  D10      rain   normal    weak  yes

We can compute the conditional probabilities of each outlook given the value of 'Play', such as $P(sunny|yes)$, and organize them in a table.

In [60]:
# Days on which we play; every other day is a "no" day
data_Y = toy_dataset[(toy_dataset["Play"] == "yes")]
# P(sunny|yes) and P(sunny|no)
data_SY = toy_dataset[(toy_dataset["Outlook"] == "sunny") & (toy_dataset["Play"] == "yes")]
data_SN = toy_dataset[(toy_dataset["Outlook"] == "sunny") & (toy_dataset["Play"] == "no")]
SY = data_SY.shape[0]/data_Y.shape[0]
SN = data_SN.shape[0]/(toy_dataset.shape[0]-data_Y.shape[0])
# P(overcast|yes) and P(overcast|no)
data_OY = toy_dataset[(toy_dataset["Outlook"] == "overcast") & (toy_dataset["Play"] == "yes")]
data_ON = toy_dataset[(toy_dataset["Outlook"] == "overcast") & (toy_dataset["Play"] == "no")]
OY = data_OY.shape[0]/data_Y.shape[0]
ON = data_ON.shape[0]/(toy_dataset.shape[0]-data_Y.shape[0])
# P(rain|yes) and P(rain|no)
data_RY = toy_dataset[(toy_dataset["Outlook"] == "rain") & (toy_dataset["Play"] == "yes")]
data_RN = toy_dataset[(toy_dataset["Outlook"] == "rain") & (toy_dataset["Play"] == "no")]
RY = data_RY.shape[0]/data_Y.shape[0]
RN = data_RN.shape[0]/(toy_dataset.shape[0]-data_Y.shape[0])
In [61]:
Outlook = {
    "sunny": {
        "yes": SY,
        "no": SN
    },
    "overcast": {
        "yes": OY,
        "no": ON
    },
    "rain": {
        "yes": RY,
        "no": RN
    }
}
In [62]:
df = pd.DataFrame([key for key in Outlook.keys()], columns=['Outlook'])
df['yes'] = [value['yes'] for value in Outlook.values()]
df['no'] = [value['no'] for value in Outlook.values()]
df
Out[62]:
    Outlook       yes    no
0     sunny  0.166667  0.75
1  overcast  0.333333  0.00
2      rain  0.500000  0.25
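
The same table of likelihoods can be obtained in one step with `pd.crosstab`. The following is a minimal sketch, assuming the ten rows shown above are typed in by hand instead of being read from 'Play.csv'; the name `likelihood` is introduced here for illustration only.

```python
import pandas as pd

# The Outlook and Play columns of the toy dataset, typed in from the table above.
toy_dataset = pd.DataFrame({
    "Outlook": ["sunny", "sunny", "overcast", "rain", "rain",
                "rain", "overcast", "sunny", "sunny", "rain"],
    "Play": ["no", "no", "yes", "yes", "yes",
             "no", "yes", "no", "yes", "yes"],
})

# normalize="columns" divides each count by its column total,
# so each column holds P(outlook | play) and sums to 1.
likelihood = pd.crosstab(toy_dataset["Outlook"], toy_dataset["Play"],
                         normalize="columns")
print(likelihood)
```

Note that `crosstab` sorts the row and column labels alphabetically, so the rows appear as overcast, rain, sunny and the columns as no, yes.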

We want to know the probability that we can play outside given the outlook. For a sunny day, Bayes' theorem gives:

$P(yes|sunny)=\dfrac{P(sunny|yes)\times P(yes)}{P(sunny)}$

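Note that each column of the table sums to 1, because the table holds the likelihoods $P(outlook|play)$ and not joint probabilities. The prior $P(yes)$ is the fraction of days on which we play (6 of 10), and the evidence follows from the law of total probability: $P(sunny)=P(sunny|yes)\times P(yes)+P(sunny|no)\times P(no)$.

In [75]:
p_yes = data_Y.shape[0]/toy_dataset.shape[0]  # prior P(yes)
round((df.loc[0, 'yes']*p_yes)/(df.loc[0, 'yes']*p_yes + df.loc[0, 'no']*(1-p_yes)), 4)
Out[75]:
0.25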
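
The prior and the evidence in Bayes' formula can also be estimated straight from the rows, which gives an independent check on $P(yes|sunny)$: it should equal the share of sunny days on which we play. The following is a minimal sketch, assuming the ten rows of the dataset are typed in by hand; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# The Outlook and Play columns of the toy dataset, typed in from the table above.
toy_dataset = pd.DataFrame({
    "Outlook": ["sunny", "sunny", "overcast", "rain", "rain",
                "rain", "overcast", "sunny", "sunny", "rain"],
    "Play": ["no", "no", "yes", "yes", "yes",
             "no", "yes", "no", "yes", "yes"],
})

is_yes = toy_dataset["Play"] == "yes"
is_sunny = toy_dataset["Outlook"] == "sunny"

p_yes = is_yes.mean()                      # prior P(yes)
p_sunny = is_sunny.mean()                  # evidence P(sunny)
p_sunny_given_yes = is_sunny[is_yes].mean()  # likelihood P(sunny | yes)

# Posterior from Bayes' theorem.
bayes = p_sunny_given_yes * p_yes / p_sunny

# Posterior by direct counting: the share of sunny days with Play == "yes".
direct = is_yes[is_sunny].mean()

print(bayes, direct)
```

Up to floating-point rounding, both values agree: one of the four sunny days is a "yes" day.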

Exercise 1: Calculate $P(no|rain)$

In [ ]:
 

$P(no|rain)=\dfrac{P(rain|no)\times P(no)}{P(rain)}$

In [74]:
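p_no = 1 - data_Y.shape[0]/toy_dataset.shape[0]  # prior P(no); the denominator below is P(rain) by total probability
round((df.loc[2, 'no']*p_no)/(df.loc[2, 'yes']*(1-p_no) + df.loc[2, 'no']*p_no), 4)
Out[74]:
0.25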

Exercise 2: Calculate $P(yes|overcast)$

In [ ]:
 

$P(yes|overcast)=\dfrac{P(overcast|yes)\times P(yes)}{P(overcast)}$

In [76]:
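p_yes = data_Y.shape[0]/toy_dataset.shape[0]  # prior P(yes); the denominator below is P(overcast) by total probability
round((df.loc[1, 'yes']*p_yes)/(df.loc[1, 'yes']*p_yes + df.loc[1, 'no']*(1-p_yes)), 4)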
Out[76]:
1.0

You can continue in the same way to compute the probability of playing outside given the other weather conditions, humidity and wind.
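
One way to repeat the calculation for any column is to wrap the three ingredients of Bayes' theorem in a small helper. The following is a minimal sketch, assuming the ten rows of the dataset are typed in by hand; the function `posterior` and its parameter names are hypothetical and not part of the original notebook.

```python
import pandas as pd

# The toy dataset, typed in from the table at the top of the notebook.
toy_dataset = pd.DataFrame({
    "Outlook": ["sunny", "sunny", "overcast", "rain", "rain",
                "rain", "overcast", "sunny", "sunny", "rain"],
    "Humidity": ["high", "high", "high", "high", "normal",
                 "normal", "normal", "high", "normal", "normal"],
    "Wind": ["weak", "strong", "weak", "weak", "weak",
             "strong", "strong", "weak", "weak", "weak"],
    "Play": ["no", "no", "yes", "yes", "yes",
             "no", "yes", "no", "yes", "yes"],
})


def posterior(data, feature, value, outcome="yes", target="Play"):
    """Return P(target == outcome | feature == value) via Bayes' theorem.

    Hypothetical helper: it assumes `value` occurs at least once in
    data[feature]; otherwise the evidence is 0 and the result is undefined.
    """
    has_outcome = data[target] == outcome
    has_value = data[feature] == value
    prior = has_outcome.mean()                   # P(outcome)
    likelihood = has_value[has_outcome].mean()   # P(value | outcome)
    evidence = has_value.mean()                  # P(value)
    return likelihood * prior / evidence


p_yes_high = posterior(toy_dataset, "Humidity", "high")
p_yes_strong = posterior(toy_dataset, "Wind", "strong")
p_no_rain = posterior(toy_dataset, "Outlook", "rain", outcome="no")
print(p_yes_high, p_yes_strong, p_no_rain)
```

Each result can be checked by counting rows: two of the five high-humidity days, one of the three strong-wind days, and one of the four rainy days match the requested outcome.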

In [ ]:
 