In [32]:

```
import pandas as pd
import numpy as np
```

In [33]:

```
toy_dataset = pd.read_csv('toy_dataset.csv')
toy_dataset.head()
```

Out[33]:

**Example 1:** Let's estimate the probability of picking a person who lives in Dallas by calculating the proportion of times we get this result.
We select, for example, 120 rows at random:

In [34]:

```
data = toy_dataset.sample(120)
data
```

Out[34]:

We keep the rows where the city is Dallas:

In [35]:

```
data1 = data[data["City"]=="Dallas"]
```

In [36]:

```
data1.shape
```

Out[36]:

There are `data1.shape[0]` rows that meet this condition.

In [37]:

```
proportion = data1.shape[0]/120
proportion
```

Out[37]:

In [38]:

```
round(proportion, 2)  # round the answer to 2 decimal places
```

Out[38]:

Alternatively, using `len`:

In [39]:

```
proportion_ = len(data1)/120
round(proportion_, 2)
```

Out[39]:

**Example 2:** Now we can calculate the proportion of people (rows of *data*) who are older than 30:

In [40]:

```
data2 = data[data["Age"]>30]
```

In [41]:

```
Proportion = len(data2)/120
round(Proportion, 2)  # round the answer to 2 decimal places
```

Out[41]:
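The two proportions above can also be computed without building an intermediate DataFrame, because the mean of a boolean Series is the fraction of `True` values. The sketch below uses a small made-up frame named `demo` (a hypothetical stand-in for `data`, not the real sample) so that it runs on its own:

```python
import pandas as pd

# Hypothetical miniature stand-in for the sampled rows; the real `data` has more rows and columns.
demo = pd.DataFrame({
    "City": ["Dallas", "Boston", "Dallas", "Austin", "Dallas"],
    "Age": [25, 41, 33, 29, 52],
})

# A comparison yields a boolean Series; its mean is the share of rows that are True.
dallas_share = (demo["City"] == "Dallas").mean()
over_30_share = (demo["Age"] > 30).mean()

print(round(dallas_share, 2))   # 3 of the 5 made-up rows are from Dallas
print(round(over_30_share, 2))  # 3 of the 5 made-up rows are older than 30
```

With the real sample, the same pattern would be `(data["City"] == "Dallas").mean()`, which also avoids hard-coding the sample size in the denominator.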

**Exercise 1:** Using 1000 trials, estimate the probability of picking a person who earns less than $50000. Round your answer to 1 decimal place.

In [ ]:

```
```

In [59]:

```
data_1000 = toy_dataset.sample(1000)
data3 = data_1000[data_1000["Income"]<50000]
proportion = data3.shape[0]/1000
print("Empirical probability: ",round(proportion, 1))
```

In [58]:

```
def people_earning_less_than_50000(sample):
    # Count the rows whose first (and only) column, Income, is below 50000
    income_50000 = 0
    for v in sample.values:
        if v[0] < 50000:
            income_50000 += 1
    return income_50000

data4 = toy_dataset.sample(1000)['Income'].to_frame()
proportion = people_earning_less_than_50000(data4)/1000
print("Empirical probability: ", round(proportion, 1))
```
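Because `sample` draws different rows on every run, the estimate changes each time the cell is executed. A minimal sketch of making it reproducible with the `random_state` argument of `DataFrame.sample` follows; the `incomes` frame and the `empirical_probability` helper are hypothetical names introduced only for this illustration:

```python
import pandas as pd

# Hypothetical stand-in for toy_dataset, with only an Income column.
incomes = pd.DataFrame(
    {"Income": [30000, 45000, 52000, 61000, 48000, 75000, 39000, 88000]}
)

def empirical_probability(df, n, seed):
    """Share of n sampled rows with Income below 50000, for a fixed seed."""
    sample = df.sample(n, random_state=seed)
    return (sample["Income"] < 50000).mean()

first = empirical_probability(incomes, 4, seed=0)
second = empirical_probability(incomes, 4, seed=0)
print(first == second)  # the same seed selects the same rows
```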

**Exercise 2:** Using 1000 trials, estimate the probability that, when picking three people, at least one of them is from Boston. Round your answer to 1 decimal place.

In [ ]:

```
```

In [57]:

```
at_least_one_from_Boston = 0
for n in range(1000):
    # One trial: draw three people and check whether any of them lives in Boston
    if 'Boston' in toy_dataset.sample(3)['City'].values:
        at_least_one_from_Boston += 1
print("Number of experiments that have at least one from Boston:", at_least_one_from_Boston)
proportion_Boston = at_least_one_from_Boston/1000
print("Empirical probability: ", proportion_Boston)
```

In [46]:

```
# Using methods that we haven't seen yet
data6 = toy_dataset.sample(1000)['City'].value_counts().to_frame()
data6
```

Out[46]:

In [47]:

```
proportion_Boston = data6.loc['Boston'].iloc[0]/1000  # .iloc[0] selects the count by position
proportion_Boston
```

Out[47]:

In [48]:

```
# Sum the counts of every city except Boston
proportion_NoBoston = data6.drop('Boston').iloc[:, 0].sum()/1000
proportion_NoBoston
```

Out[48]:

In [49]:

```
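# P(at least one of three from Boston) = P(exactly 1) + P(exactly 2) + P(exactly 3);
# one or two Boston residents can each occur in 3 different orders
3*(proportion_Boston)*(proportion_NoBoston)**2 + 3*(proportion_Boston)**2*(proportion_NoBoston) + (proportion_Boston)**3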
```

Out[49]:
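The probability of at least one Boston resident among three people can also be obtained from the complement rule, 1 minus the probability that none of the three is from Boston. The sketch below checks that this agrees with summing the binomial terms. It assumes the three draws are independent, which is only approximately true when sampling without replacement from a large dataset, and `p_boston` is a hypothetical value, not one measured from the data:

```python
import math

def at_least_one(p, k=3):
    """P(at least one success in k independent draws), via the complement rule."""
    return 1 - (1 - p) ** k

def at_least_one_expanded(p, k=3):
    """Same probability, summing the binomial terms for 1..k successes."""
    q = 1 - p
    return sum(math.comb(k, j) * p**j * q ** (k - j) for j in range(1, k + 1))

p_boston = 0.05  # hypothetical value, not taken from the dataset
print(math.isclose(at_least_one(p_boston), at_least_one_expanded(p_boston)))
```

The complement form is shorter and needs only the Boston proportion.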
