# Missing Data

## Hands on!

```
import numpy as np
import pandas as pd
```

What does "missing data" mean? What is a missing value? It depends on the origin of the data and the context in which it was generated. For example, in a survey, a *Salary* field with an empty value, a number 0, or an invalid value (a string, for example) can be considered "missing data". These concepts are related to the values that Python considers "falsy":

```
falsy_values = (0, False, None, '', [], {})
```

For Python, all the values above are considered "falsy", so `any` finds no truthy element among them:

```
any(falsy_values)
```
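To see this value by value, a minimal sketch (the name `truthiness` is only illustrative) applies `bool` to each element of the tuple:

```python
falsy_values = (0, False, None, '', [], {})

# bool() converts each value to its truth value; every one of them is False
truthiness = [bool(value) for value in falsy_values]
print(truthiness)

# any() is True only if at least one element is truthy
print(any(falsy_values))
```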

NumPy has a special "nullable" value for numbers, which is `np.nan`. It's *NaN*: "Not a Number".

```
np.nan
```

The `np.nan` value is kind of a virus. Everything that it touches becomes `np.nan`:

```
3 + np.nan
```

```
a = np.array([1, 2, 3, np.nan, np.nan, 4])
```

```
a.sum()
```

```
a.mean()
```

This is better than regular `None` values, which in the previous examples would have raised an exception:

```
3 + None
```

For a numeric array (one created with a float `dtype`), the `None` value is replaced by `np.nan`:

```
a = np.array([1, 2, 3, np.nan, None, 4], dtype='float')
```

```
a
```
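The explicit `dtype` matters here. As a small sketch (the names `obj_arr` and `float_arr` are illustrative), compare what NumPy does with and without it:

```python
import numpy as np

# Without an explicit dtype, None makes NumPy fall back to a generic object array
obj_arr = np.array([1, 2, 3, None, 4])
print(obj_arr.dtype)

# With a float dtype, None is converted to nan
float_arr = np.array([1, 2, 3, None, 4], dtype=float)
print(float_arr)
```

An object array keeps the original `None`, so arithmetic on it would hit the same exception as `3 + None`.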

As we said, `np.nan` is like a virus. If an array contains any `nan` value, aggregate operations on it will also return `nan`:

```
a = np.array([1, 2, 3, np.nan, np.nan, 4])
```

```
a.mean()
```

```
a.sum()
```

NumPy also supports an "infinite" value:

```
np.inf
```

It also behaves like a virus:

```
3 + np.inf
```

```
np.inf / 3
```

```
np.inf / np.inf
```

```
b = np.array([1, 2, 3, np.inf, np.nan, 4], dtype=float)
```

```
b.sum()
```
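When both values meet, `nan` wins. A short sketch (the arrays `only_inf` and `with_nan` are illustrative) shows how the two interact:

```python
import numpy as np

# Subtracting infinity from infinity is undefined, so the result is nan
print(np.inf - np.inf)

# An array containing inf but no nan sums to inf
only_inf = np.array([1, 2, 3, np.inf, 4], dtype=float)
print(only_inf.sum())

# Once a nan is present, the sum becomes nan instead
with_nan = np.array([1, 2, 3, np.inf, np.nan, 4], dtype=float)
print(with_nan.sum())
```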

### Checking for `nan` or `inf`

There are two functions, `np.isnan` and `np.isinf`, that will perform the desired checks:

```
np.isnan(np.nan)
```

```
np.isinf(np.inf)
```
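A dedicated function is needed because `nan` never compares equal to anything, not even to itself. The following sketch (with an illustrative array `values`) contrasts a plain `==` comparison with `np.isnan`:

```python
import numpy as np

# nan is not equal to itself, so == cannot be used to detect it
print(np.nan == np.nan)

values = np.array([1.0, np.nan, 3.0])

# Comparing with == finds nothing
eq_mask = values == np.nan
print(eq_mask)

# np.isnan correctly flags the missing entry
isnan_mask = np.isnan(values)
print(isnan_mask)
```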

The joint check can be performed with `np.isfinite`, which returns `False` for both `nan` and infinite values:

```
np.isfinite(np.nan), np.isfinite(np.inf)
```

`np.isnan` and `np.isinf` also take arrays as inputs, and return boolean arrays as results:

```
np.isnan(np.array([1, 2, 3, np.nan, np.inf, 4]))
```

```
np.isinf(np.array([1, 2, 3, np.nan, np.inf, 4]))
```

```
np.isfinite(np.array([1, 2, 3, np.nan, np.inf, 4]))
```
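Because `True` counts as 1 when summed, these boolean arrays can also be used to count missing values. A minimal sketch, with illustrative variable names:

```python
import numpy as np

values = np.array([1, 2, 3, np.nan, np.inf, 4])

# Boolean mask marking the nan entries
nan_mask = np.isnan(values)

# Summing a boolean mask counts its True entries
nan_count = int(nan_mask.sum())
print(nan_count)

# any() tells us whether at least one nan is present
has_nan = bool(nan_mask.any())
print(has_nan)

# Entries that are either nan or infinite
non_finite_count = int((~np.isfinite(values)).sum())
print(non_finite_count)
```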

*Note: It's not so common to find infinite values. From now on, we'll keep working with only `np.nan`.*

### Filtering them out

Whenever you perform an operation on a NumPy array that might contain missing values, you'll need to filter them out before proceeding, to avoid `nan` propagation. We'll combine the previous `np.isnan` with boolean indexing for this purpose:

```
a = np.array([1, 2, 3, np.nan, np.nan, 4])
```

```
a[~np.isnan(a)]
```

For this array, which contains no infinite values, that is equivalent to:

```
a[np.isfinite(a)]
```
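The two filters give the same result for `a` only because it holds no infinite values. A small sketch with a hypothetical array `c` that does contain one shows where they differ:

```python
import numpy as np

c = np.array([1, 2, np.nan, np.inf])

# ~np.isnan removes only nan, so inf survives
not_nan = c[~np.isnan(c)]
print(not_nan)

# np.isfinite removes both nan and inf
finite = c[np.isfinite(c)]
print(finite)
```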

And with that result, all the operations can now be performed:

```
a[np.isfinite(a)].sum()
```

```
a[np.isfinite(a)].mean()
```
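As an alternative to filtering by hand, NumPy also provides aggregation functions that skip `nan` values, such as `np.nansum` and `np.nanmean`. This sketch assumes the same array `a` used above and compares both approaches:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, np.nan, 4])

# Manual filtering followed by the regular aggregations
filtered_sum = a[~np.isnan(a)].sum()
filtered_mean = a[~np.isnan(a)].mean()

# nan-aware aggregations ignore the nan entries directly
nan_aware_sum = np.nansum(a)
nan_aware_mean = np.nanmean(a)

print(filtered_sum, nan_aware_sum)
print(filtered_mean, nan_aware_mean)
```

Note that these functions ignore only `nan`; an infinite value would still propagate into the result.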