Formalising Independence

Last updated: September 10th, 2020
In [3]:
import pandas as pd
import numpy as np
In [5]:
toy_dataset = pd.read_csv('Churn Modeling.csv')
toy_dataset.head() 
Out[5]:
   RowNumber  CustomerId   Surname  CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
0          1    15634602  Hargrave          619     France  Female   42       2       0.00              1          1               1        101348.88       1
1          2    15647311      Hill          608      Spain  Female   41       1   83807.86              1          0               1        112542.58       0
2          3    15619304      Onio          502     France  Female   42       8  159660.80              3          1               0        113931.57       1
3          4    15701354      Boni          699     France  Female   39       1       0.00              2          0               0         93826.63       0
4          5    15737888  Mitchell          850      Spain  Female   43       2  125510.82              1          1               1         79084.10       0

In the previous assignment we calculated the conditional probability that a customer is French given that they have closed their account, $P(French | Exited)$:

In [6]:
data_AB = toy_dataset[(toy_dataset["Geography"] == "France") & (toy_dataset["Exited"]==1)]
data_A= toy_dataset[toy_dataset["Exited"]==1]
data_AB.shape[0]/data_A.shape[0]
Out[6]:
0.39764359351988215

Now, is $P(Exited | French)=P(Exited)$?

In [9]:
data_AB = toy_dataset[(toy_dataset["Geography"] == "France") & (toy_dataset["Exited"]==1)]
data_A= toy_dataset[toy_dataset["Geography"] == "France"]
data_B= toy_dataset[toy_dataset["Exited"]==1]
data_AB.shape[0]/data_A.shape[0], data_B.shape[0]/toy_dataset.shape[0]
Out[9]:
(0.16154766653370561, 0.2037)

$P(Exited | French) \neq P(Exited) \Rightarrow \text{the events are not independent.}$
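
The comparison above can be wrapped in a small helper. The sketch below is illustrative only: the function name `conditional_probability` and the eight-row `sample` frame are assumptions made for this example, not part of the assignment or of 'Churn Modeling.csv'. It estimates $P(A | B)$ from two boolean masks and compares it with $P(A)$ using a tolerance, since exact float equality is fragile.

```python
import numpy as np
import pandas as pd


def conditional_probability(event, given):
    """Estimate P(event | given) from two boolean Series over the same rows."""
    return (event & given).sum() / given.sum()


# Hypothetical eight-row sample (not taken from 'Churn Modeling.csv').
sample = pd.DataFrame({
    "Geography": ["France"] * 4 + ["Spain"] * 4,
    "Exited": [1, 1, 0, 0, 1, 0, 0, 0],
})

exited = sample["Exited"] == 1
french = sample["Geography"] == "France"

p_exited = exited.mean()  # P(Exited) = 3/8
p_exited_given_french = conditional_probability(exited, french)  # 2/4

# The events are independent only if the two estimates agree (up to float tolerance).
independent = bool(np.isclose(p_exited_given_french, p_exited))
print(p_exited_given_french, p_exited, independent)
```

With real data the two estimates will rarely match exactly, so the tolerance passed to `np.isclose` (left at its default here) is a judgment call rather than a fixed rule.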

Using the conditional probabilities calculated in the previous assignment, check whether the following events are independent.

Exercise 1: Is $P(Exited | Male)=P(Exited)$? Is $P(Exited | Female)=P(Exited)$?

In [15]:
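data_FE = toy_dataset[(toy_dataset["Gender"] == "Female") & (toy_dataset["Exited"] == 1)]
data_F = toy_dataset[toy_dataset["Gender"] == "Female"]
data_ME = toy_dataset[(toy_dataset["Gender"] == "Male") & (toy_dataset["Exited"] == 1)]
# P(Exited | Female)
PFE = data_FE.shape[0] / data_F.shape[0]
# P(Exited | Male): the male count is taken as every row that is not in data_F
PME = data_ME.shape[0] / (toy_dataset.shape[0] - data_F.shape[0])
# Compare both with P(Exited); data_B (customers who exited) comes from cell In [9]
PFE, PME, data_B.shape[0] / toy_dataset.shape[0]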
Out[15]:
(0.2507153863086066, 0.16455928165658787, 0.2037)
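
The same per-group exit rates can be obtained in one step with `groupby`, because the mean of a 0/1 column within a group is the conditional probability of a 1 in that group. This is a minimal sketch on a hypothetical eight-row `sample` frame (an assumption for illustration, not the churn data); only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical eight-row sample (not taken from 'Churn Modeling.csv').
sample = pd.DataFrame({
    "Gender": ["Female"] * 4 + ["Male"] * 4,
    "Exited": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Mean of the 0/1 'Exited' column per gender = P(Exited | Gender).
by_gender = sample.groupby("Gender")["Exited"].mean()

# Mean over all rows = P(Exited).
overall = sample["Exited"].mean()

print(by_gender)
print(overall)
```

This avoids building one filtered frame per group and scales to columns with more than two values, such as Geography.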

Exercise 2: Is $P(Exited | ES<100000)=P(Exited)$, where $ES$ is EstimatedSalary?

In [16]:
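data_LE = toy_dataset[(toy_dataset["EstimatedSalary"] < 100000) & (toy_dataset["Exited"] == 1)]
data_L = toy_dataset[toy_dataset["EstimatedSalary"] < 100000]
data_PE = toy_dataset[(toy_dataset["EstimatedSalary"] >= 100000) & (toy_dataset["Exited"] == 1)]
# P(Exited | ES < 100000)
PLE = data_LE.shape[0] / data_L.shape[0]
# P(Exited | ES >= 100000): rows outside data_L (assumes no missing salaries)
PPE = data_PE.shape[0] / (toy_dataset.shape[0] - data_L.shape[0])
# Output order: P(Exited | ES >= 100000), P(Exited | ES < 100000), P(Exited)
PPE, PLE, data_B.shape[0] / toy_dataset.shape[0]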
Out[16]:
(0.20838323353293414, 0.19899799599198398, 0.2037)
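
One way to sanity-check these figures is the law of total probability: the overall exit rate must be a weighted average of the two conditional rates. Writing $w = P(ES<100000)$, which is not printed above and is treated here as an unknown:

$$P(Exited) = w \, P(Exited | ES<100000) + (1-w) \, P(Exited | ES \geq 100000)$$

$$0.2037 \approx w \cdot 0.1990 + (1-w) \cdot 0.2084 \Rightarrow w \approx 0.50$$

So 0.2037 falls between the two conditional values, as it must. The two conditional rates differ by less than one percentage point, which suggests that in this sample exiting depends only weakly on whether the estimated salary is below 100000, although the rates are not exactly equal.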

Exercise 3: Is $P(Exited | CS>750)=P(Exited)$, where $CS$ is CreditScore?

In [17]:
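data_CP750 = toy_dataset[(toy_dataset["CreditScore"] > 750) & (toy_dataset["Exited"] == 1)]
data_P = toy_dataset[toy_dataset["CreditScore"] > 750]
data_CL750 = toy_dataset[(toy_dataset["CreditScore"] <= 750) & (toy_dataset["Exited"] == 1)]
# P(Exited | CS <= 750): the denominator is every row outside data_P (credit score, not salary)
CL750 = data_CL750.shape[0] / (toy_dataset.shape[0] - data_P.shape[0])
# P(Exited | CS > 750)
CP750 = data_CP750.shape[0] / data_P.shape[0]
# Output order: P(Exited | CS > 750), P(Exited | CS <= 750), P(Exited), rounded to 4 decimals
round(CP750, 4), round(CL750, 4), round(data_B.shape[0] / toy_dataset.shape[0], 4)
Out[17]:
(0.1959, 0.2052, 0.2037)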
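
An equivalent way to state independence is $P(A \cap B) = P(A)P(B)$. The sketch below checks that form by comparing a joint probability table with the product of its marginals. It runs on a hypothetical eight-row `sample` frame, and the names `high_score`, `HighScore` and `max_gap` are assumptions introduced for this illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical eight-row sample (not taken from 'Churn Modeling.csv').
sample = pd.DataFrame({
    "CreditScore": [800, 760, 790, 700, 650, 600, 720, 500],
    "Exited": [1, 0, 0, 1, 0, 1, 0, 0],
})

# Boolean event: credit score above 750.
high_score = (sample["CreditScore"] > 750).rename("HighScore")

# Joint probabilities P(HighScore = i, Exited = j); all cells sum to 1.
joint = pd.crosstab(high_score, sample["Exited"], normalize="all")

# Marginals P(HighScore = i) and P(Exited = j).
p_high = joint.sum(axis=1)
p_exit = joint.sum(axis=0)

# Under independence every joint cell equals the product of its marginals.
expected = pd.DataFrame(
    np.outer(p_high, p_exit), index=joint.index, columns=joint.columns
)

# Largest absolute deviation from the independent case (0 means independent).
max_gap = (joint - expected).abs().to_numpy().max()
print(max_gap)
```

A `max_gap` close to zero indicates that the sample is consistent with independence. How close is close enough depends on the sample size and is not settled by this check alone.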