
Intro to Pandas Series

Last updated: May 14th, 2019

rmotr


A Series is a one-dimensional array-like object containing a typed sequence of values and an associated array of data labels, called its index.

 Hands on!

In [1]:
import numpy as np
import pandas as pd

Series creation

The pd.Series constructor accepts the following parameters:

  • data: (required) the values to store in the Series. It can be a scalar value, a Python sequence, a dictionary, or a one-dimensional NumPy ndarray.
  • index: (optional) the labels to assign to the data values. It can be a Python sequence or a one-dimensional NumPy ndarray. Default: the integer positions 0 to len(data) - 1 (a RangeIndex).
  • dtype: (optional) any NumPy data type. If omitted, it is inferred from the data.
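To make the three parameters concrete, here is a minimal sketch. The variable names (scalar_series, default_series, typed_series) are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# data as a scalar: the value is repeated once per index label.
scalar_series = pd.Series(7, index=['x', 'y', 'z'])

# No index given: pandas assigns the integer positions 0..n-1.
default_series = pd.Series([10, 20, 30])

# dtype forces the element type instead of letting pandas infer it.
typed_series = pd.Series([1, 2, 3], dtype=np.float32)

print(scalar_series.tolist())
print(default_series.index)
print(typed_series.dtype)
```

Note that a scalar only makes sense together with an explicit index, since the index is what tells pandas how many times to repeat the value.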
In [2]:
series = pd.Series([1, 2, 3, 4, 5])
series
Out[2]:
0    1
1    2
2    3
3    4
4    5
dtype: int64

Series have an associated type:

In [3]:
# Show first values of our Series
series.head()
Out[3]:
0    1
1    2
2    3
3    4
4    5
dtype: int64
In [4]:
series.dtype
Out[4]:
dtype('int64')
In [5]:
series = pd.Series([1, 2, 3, 4, 5], dtype=np.float64)
series
Out[5]:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
In [6]:
series.dtype
Out[6]:
dtype('float64')
In [7]:
series = pd.Series(['a', 'b', 'c', 'd', 'e'])
series
Out[7]:
0    a
1    b
2    c
3    d
4    e
dtype: object
In [8]:
# Using a NumPy ndarray
array = np.array([2, 4, 6, 8, 10])
series = pd.Series(array)
series
Out[8]:
0     2
1     4
2     6
3     8
4    10
dtype: int64
In [9]:
# With predefined index
series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
series
Out[9]:
a    1
b    2
c    3
d    4
e    5
dtype: int64
In [10]:
# Using a dictionary (index will be defined using keys)
series = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}, dtype=np.float64)
series
Out[10]:
a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
dtype: float64

Series attributes

These are the most common attributes to get information about a Series:

In [11]:
series = pd.Series(data=[1, 2, 3, 4, 5],
                   index=['a', 'b', 'c', 'd', 'e'],
                   dtype=np.float64)
series
Out[11]:
a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
dtype: float64
In [12]:
# Type of our Series
series.dtype
Out[12]:
dtype('float64')
In [13]:
# Values of a series
series.values
Out[13]:
array([1., 2., 3., 4., 5.])
In [14]:
type(series.values)
Out[14]:
numpy.ndarray
In [15]:
# Index of a series
series.index
Out[15]:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
In [16]:
# Dimension of the Series
series.ndim
Out[16]:
1
In [17]:
# Shape of the Series
series.shape
Out[17]:
(5,)
In [18]:
# Number of Series elements
series.size
Out[18]:
5
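As a quick recap of the attributes above, the sketch below rebuilds the same Series and checks how they relate to each other. It also uses Series.to_numpy(), which is not used elsewhere in this notebook; it is shown here as an alternative way to get the underlying array, on the assumption that a reasonably recent pandas version is installed.

```python
import numpy as np
import pandas as pd

series = pd.Series(data=[1, 2, 3, 4, 5],
                   index=['a', 'b', 'c', 'd', 'e'],
                   dtype=np.float64)

# The underlying data as a NumPy array (same content as series.values).
values = series.to_numpy()

# The index labels as a plain Python list.
labels = series.index.tolist()

# For a one-dimensional object, size, len() and shape[0] all agree.
same_length = len(series) == series.size == series.shape[0]

print(type(values).__name__, values.shape)
print(labels)
print(same_length)
```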

The Group of Seven

We'll start by analyzing "The Group of Seven" (G7), a political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom and the United States. We'll look at population first, and for that we'll use a pandas.Series object.

In [19]:
# In millions
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

g7_pop
Out[19]:
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

Someone might not know we're representing population in millions of inhabitants. Series can have a name, to better document the purpose of the Series:

In [20]:
g7_pop.name = 'G7 Population in millions'

g7_pop
Out[20]:
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

Series are pretty similar to NumPy arrays:

In [21]:
g7_pop.dtype
Out[21]:
dtype('float64')
In [22]:
type(g7_pop.values)
Out[22]:
numpy.ndarray
In [23]:
g7_pop.ndim
Out[23]:
1
In [24]:
g7_pop.shape
Out[24]:
(7,)
In [25]:
g7_pop.size
Out[25]:
7

They look like simple Python lists or NumPy arrays, but they're actually more similar to Python dicts: every value is associated with a label in the index.

In [26]:
g7_pop
Out[26]:
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64
In [27]:
g7_pop.index
Out[27]:
RangeIndex(start=0, stop=7, step=1)

Assigning Series indexes

In contrast to lists, we can explicitly define the index:

In [28]:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]
In [29]:
g7_pop
Out[29]:
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
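With labels in place, the dict-like behaviour of a Series is easier to see. The following is a small sketch on a three-country subset; g7_sample is a name made up for this example so that it does not overwrite g7_pop.

```python
import pandas as pd

g7_sample = pd.Series(
    [35.467, 63.951, 80.940],
    index=['Canada', 'France', 'Germany'],
    name='G7 Population in millions',
)

# Look up a value by its label, as with dict[key].
canada = g7_sample['Canada']

# `in` tests the index labels (the "keys"), not the values.
has_france = 'France' in g7_sample
has_value = 63.951 in g7_sample

# .items() yields (label, value) pairs, like dict.items().
pairs = list(g7_sample.items())

print(canada, has_france, has_value)
print(pairs[0])
```

Unlike a dict, the Series also keeps positional order and a single dtype for all its values, which is what makes it behave like an array as well.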

Compare it with the following table:

image

Removing indexes

We can also discard the current index labels and go back to the default integer index. To do that, we call the reset_index() method with the drop=True parameter:

In [30]:
g7_pop
Out[30]:
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
In [31]:
g7_pop.reset_index(drop=True)
Out[31]:
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64
In [32]:
g7_pop
Out[32]:
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

Note that reset_index() returns a new Series and leaves the original untouched. To keep the result, either assign it to a variable or pass inplace=True to modify the original Series.

In [33]:
g7_pop.reset_index(drop=True, inplace=True)
In [34]:
g7_pop
Out[34]:
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64
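The cell above uses inplace=True. The assignment form can be sketched as follows, together with what happens when drop=True is left out. The names labelled, renumbered and as_frame are hypothetical and exist only for this example.

```python
import pandas as pd

labelled = pd.Series([35.467, 63.951],
                     index=['Canada', 'France'],
                     name='population')

# Assignment form: the original keeps its labels, the result is numbered 0..n-1.
renumbered = labelled.reset_index(drop=True)

# Without drop=True the old labels are kept as a column,
# so the result is a DataFrame instead of a Series.
as_frame = labelled.reset_index()

print(list(labelled.index))
print(list(renumbered.index))
print(type(as_frame).__name__, list(as_frame.columns))
```

Assigning the result is usually the easier form to follow, because the original Series remains available if the labels are needed again.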

Creating a Series with its index already defined

We can create a new Series together with its index labels in a single step:

In [35]:
values = [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523]
indexes = ['Canada', 'France', 'Germany', 'Italy',
           'Japan', 'United Kingdom', 'United States']

pd.Series(values,
          index=indexes,
          name='G7 Population in millions')
Out[35]:
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64

 Creating a Series from a data dictionary

We can say that Series look like "ordered dictionaries". We can actually create Series out of dictionaries:

In [36]:
data_dic = {
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}

g7_pop = pd.Series(data_dic,
                   name='G7 Population in millions')
In [37]:
g7_pop
Out[37]:
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
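The "ordered dictionary" comparison can be checked directly. This sketch assumes a Python and pandas combination in which dicts preserve insertion order (Python 3.7 or later with a recent pandas); from_dict and reordered are illustrative names.

```python
import pandas as pd

data_dic = {'Canada': 35.467, 'France': 63.951, 'Germany': 80.94}

# Keys become index labels, in the dict's insertion order.
from_dict = pd.Series(data_dic)

# to_dict() converts back, so the round trip gives an equal dict.
round_trip = from_dict.to_dict()

# An explicit index selects keys and fixes their order.
reordered = pd.Series(data_dic, index=['Germany', 'Canada'])

print(list(from_dict.index))
print(round_trip == data_dic)
print(reordered.tolist())
```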

Creating a Series out of other Series

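You can also create a Series out of another Series by specifying which index labels to keep. Labels that don't exist in the source Series (such as 'Spain' below) get NaN as their value: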

In [38]:
pd.Series(g7_pop,
          index=['France', 'Germany', 'Italy', 'Spain'])
Out[38]:
France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
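Building a Series from another Series with an explicit index behaves like a reindexing operation. The sketch below compares it with Series.reindex() and lists the labels that came back as NaN; g7_sample and the label list are made up for the example.

```python
import pandas as pd

g7_sample = pd.Series({'France': 63.951, 'Germany': 80.94, 'Italy': 60.665})
labels = ['France', 'Italy', 'Spain']

# Constructor form, as in the cell above.
subset = pd.Series(g7_sample, index=labels)

# reindex() aligns the existing values to the new labels in the same way.
reindexed = g7_sample.reindex(labels)

# Labels that had no match in the source are marked as missing.
missing = subset[subset.isna()].index.tolist()

print(subset.equals(reindexed))
print(missing)
```

Checking isna() after this kind of selection is a cheap way to catch misspelled or unexpected labels before they propagate into later calculations.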
