# Creating categories with Cutting and Binning

Sometimes we need to divide a field with a continuous range of data into discrete categories. For example, you might divide the ages of users into `0-14`, `15-35`, `36-60`, and `+60`.

Although it does not directly use grouping constructs, the discretization of continuous data is worth explaining. Discretization slices continuous data into a set of "bins", where each bin represents a range of the continuous sample and each item is placed into the bin whose range contains it, hence the term "binning".

Discretization in pandas is performed using the `pd.cut()` and `pd.qcut()` functions.

## Hands on!

```
import pandas as pd
import numpy as np
```

### Define our sample data

```
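# 16 random ages in [0, 99) plus the values 14, 35 and 60, which sit exactly
# on the bin edges defined below, so the boundary behaviour is visible.
ages = np.append(np.random.randint(0, 99, size=16), [14, 35, 60])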
```

```
ages
```

```
bins = [0, 14, 35, 60, 100]
```

### cut

When you ask `cut` for a number of bins, the bins will be **evenly spaced according to the values** themselves and not the frequency of those values. When you pass explicit bin edges, as we do here, `cut` uses those edges as given.

This function is also useful for going from a continuous variable to a categorical variable. For example, `cut` could convert ages to groups of age ranges. It supports binning into a given number of equal-width bins or into a pre-specified array of bins.
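To make the "evenly spaced according to the values" behaviour concrete, here is a minimal sketch that passes an integer instead of a list of edges. The `sample` values are hypothetical and deliberately skewed; `retbins=True` asks `cut` to also return the edges it computed.

```python
import pandas as pd

# Hypothetical, deliberately skewed sample (not the `ages` array used in this notebook).
sample = [1, 2, 3, 4, 50, 100]

# Ask for 4 equal-width bins; retbins=True also returns the computed edges.
width_cats, edges = pd.cut(sample, bins=4, retbins=True)

print(edges)                     # the lowest edge is nudged just below min(sample)
print(width_cats.value_counts()) # bins are equally wide, not equally populated
print(width_cats.codes)          # [0 0 0 0 1 3]
```

The bins all span the same width, so most values land in the first bin and one bin stays empty. This is the main difference from `qcut`.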

```
categories = pd.cut(ages, bins)
```

```
categories
```

```
categories.codes
```

```
categories.categories
```

```
categories.value_counts()
```

```
np.sort(ages)
```

Mathematically speaking, each category is an interval that includes its right edge (for example, age `14` is included in the first category `(0, 14]`) and excludes its left edge. As a consequence, an age of exactly `0` falls outside every bin and becomes `NaN` unless you pass `include_lowest=True`. You can change which side is inclusive with the `right` parameter. By default, `right` is `True`, which makes the right side inclusive.

```
lefty_cats = pd.cut(ages, bins, right=False)
```

```
lefty_cats.value_counts()
```

You can also pass labels to give better names to your bins:

```
categories = pd.cut(ages, bins, labels=['Age 0-14', 'Age 15-35', 'Age 36-60', '+60'])
```

```
categories
```

```
categories.value_counts()
```

### qcut

But what happens if you don't know where the bin edges should go, and you just need to split the data into similar-sized bins?

`qcut` discretizes a given variable into **equal-size bins**, using quantiles computed from the distribution of the data. So, when you ask for quantiles with `qcut`, the bin edges are chosen so that each bin holds the same number of values, or as close to that as ties in the data allow.

```
pd.qcut(ages, 4).value_counts()
```

In this case, `qcut` has chosen the bin "categories" for us, based on the distribution of the data.
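To see how the two functions differ, the sketch below applies both to the same skewed data. The `skewed` list is a hypothetical sample, not the `ages` array; `retbins=True` returns the edges that `qcut` derived from the quantiles.

```python
import pandas as pd

# Hypothetical skewed sample: many small values and a few very large ones.
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100, 200, 1000]

# qcut: edges come from the quartiles, so each bin gets the same number of values.
by_frequency, q_edges = pd.qcut(skewed, 4, retbins=True)
print(q_edges)
print(by_frequency.codes)  # [0 0 0 1 1 1 2 2 2 3 3 3]

# cut: edges are evenly spaced between min and max, so the counts are very uneven.
by_width = pd.cut(skewed, 4)
print(by_width.codes)      # [0 0 0 0 0 0 0 0 0 0 0 3]
```

With `qcut` every bin holds three values, while with `cut` eleven of the twelve values end up in the first bin.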