Profile picture

Pandas Indexes

Last updated: October 31st, 20192019-10-31Project preview

rmotr


Pandas Indexes

The Index object in Pandas is an structure in itself, and it can be thought of either as an immutable array or as an ordered set.

Those views have some interesting consequences in the operations available on Index objects.

purple-divider

Hands on!

In [104]:
import numpy as np
import pandas as pd

green-divider

What is a Pandas Index?

A pandas Index is an immutable NumPy array implementing an ordered, sliceable set. It's used to store axis labels for all pandas objects (Series and DataFrames).

As the Index operates like an array, we can use standard Python indexing notation to to retrieve values or slices:

In [105]:
my_index = pd.Index([4, 6, 7, 10, 14])
In [106]:
my_index
Out[106]:
Int64Index([4, 6, 7, 10, 14], dtype='int64')
In [107]:
my_index[0]
Out[107]:
4
In [108]:
my_index[:-1]
Out[108]:
Int64Index([4, 6, 7, 10], dtype='int64')

Index objects also have many of the attributes familiar from NumPy arrays:

In [109]:
my_index.size
Out[109]:
5
In [110]:
my_index.shape
Out[110]:
(5,)
In [111]:
my_index.dtype
Out[111]:
dtype('int64')

green-divider

Index as Immutable Array

One difference between Index objects and NumPy arrays is that indices are immutable: that is, they cannot be modified via the normal means:

In [112]:
my_index[3]
Out[112]:
10
In [113]:
my_index[3] = 11
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-113-46d1b59afee0> in <module>
----> 1 my_index[3] = 11

~/.virtualenvs/jupyter-ext/lib/python3.6/site-packages/pandas/core/indexes/base.py in __setitem__(self, key, value)
   4258 
   4259     def __setitem__(self, key, value):
-> 4260         raise TypeError("Index does not support mutable operations")
   4261 
   4262     def __getitem__(self, key):

TypeError: Index does not support mutable operations

An error was raised when we tried to modify our index.

This immutability makes it safter to share indices between multiple DataFrames and arrays, without the potential for nasty side-effects from inadvertent index modification.

green-divider

Index as Ordered Set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.

Recall that Python has a built-in set object. The Index object follows many of the conventions of this built-in set object, so that unions, intersections, differences, and other combinations can be computed in a familiar way:

In [114]:
my_index
Out[114]:
Int64Index([4, 6, 7, 10, 14], dtype='int64')
In [115]:
my_index + my_index
Out[115]:
Int64Index([8, 12, 14, 20, 28], dtype='int64')
In [116]:
my_index - my_index
Out[116]:
Int64Index([0, 0, 0, 0, 0], dtype='int64')

green-divider

 Indexes in action

The first thing we'll do is load in a DataFrame the data of some orders made in an Australian shop we have in the orders.csv file.

In [117]:
df = pd.read_csv('orders.csv',
                 parse_dates=['order_date', 'ship_date'])
In [118]:
df.head()
Out[118]:
order_date city product_name product_category ship_date cost_price retail_price order_quantity shipping_price
0 2013-08-09 Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
1 2013-08-15 Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
2 2014-09-04 Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
3 2013-03-23 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
4 2013-06-20 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63

Rows and columns both have indexes, rows indices are called as index and for columns its general column names.

In pandas, both Series and DataFrames have indexes. As you can see in the "table" above, pandas has assigned a numeric, autoincremental index automatically to each "row" in our DataFrame. In our case, we know that each row represents an order.

We use indexes for three main reasons:

  • Identification: index will identify each element uniquely, so we can identify what data we are working with.
  • Selection: indexes will help us find and select data from pandas data structures.
  • Alignment: we can easily order our data based on our indexes.
In [119]:
# row index
df.index
Out[119]:
RangeIndex(start=0, stop=292, step=1)
In [120]:
# column index
df.columns
Out[120]:
Index(['order_date', 'city', 'product_name', 'product_category', 'ship_date',
       'cost_price', 'retail_price', 'order_quantity', 'shipping_price'],
      dtype='object')

green-divider

 Changing DataFrames column index

In [121]:
df.columns = ['Order Date', 'City', 'Product Name', 'Product Category', 'Ship Date',
              'Cost Price', 'Retail Price', 'Order Quantity', 'Shipping Price']
In [122]:
df.head()
Out[122]:
Order Date City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
0 2013-08-09 Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
1 2013-08-15 Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
2 2014-09-04 Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
3 2013-03-23 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
4 2013-06-20 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63

green-divider

 Changing DataFrames row index

Let's go ahead and reassign the index to a more useful one:

In [123]:
df.head()
Out[123]:
Order Date City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
0 2013-08-09 Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
1 2013-08-15 Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
2 2014-09-04 Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
3 2013-03-23 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
4 2013-06-20 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63
In [124]:
df.index
Out[124]:
RangeIndex(start=0, stop=292, step=1)

We can use the values of a column as the new row index:

In [125]:
df = df.set_index(['Order Date'])
In [126]:
df.head()
Out[126]:
City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
Order Date
2013-08-09 Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
2013-08-15 Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
2014-09-04 Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
2013-03-23 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
2013-06-20 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63

Also, we can define a new index.

In this case we'll asign a list of unique UUIDs as index:

In [127]:
import uuid

uuid.uuid4()
Out[127]:
UUID('54bad24f-4b90-44f1-a57a-545904545f96')
In [143]:
df.index = [str(uuid.uuid4()) for row in range(df.shape[0])]
In [144]:
df.head()
Out[144]:
City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
41a32a83-6f05-402e-b247-3f9cd295eb2a Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
743abe7f-0d4c-409b-98db-83ac7c31bb3b Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
ac976148-d14d-4a56-ba77-c45457925b2f Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
38b35717-e24f-4cf8-9633-75f9f36bb747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
cabb205e-25a8-40da-90cd-6a0752bd5f16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63
In [145]:
df.index[0:10]
Out[145]:
Index(['41a32a83-6f05-402e-b247-3f9cd295eb2a',
       '743abe7f-0d4c-409b-98db-83ac7c31bb3b',
       'ac976148-d14d-4a56-ba77-c45457925b2f',
       '38b35717-e24f-4cf8-9633-75f9f36bb747',
       'cabb205e-25a8-40da-90cd-6a0752bd5f16',
       '0f38ce45-f449-453f-99ea-e90ea918458f',
       '4c5eacbe-c011-451e-b9af-c89ed5ba4221',
       '4088e4eb-c906-4598-8c2f-a6a8a6f112e6',
       '45708171-7630-4478-b3cf-2fc04ebd2048',
       '4bcecbd2-6aa5-4933-8e31-70d6b9ebffd6'],
      dtype='object')

green-divider

 Removing indexes

We can also discard current indexes from our DataFrame at any time including it as a new column of our data. To do that we use the reset_index() method. New index will be a numerical sequence.

Note 1: that reset_index() will return a new DataFrame, so if we want to keep it we need to assign it to a variable.

Note 2: also, if we don't want to keep the old index as a column we can drop it using the drop=True parameter.

In [146]:
df = df.reset_index()

df.head()
Out[146]:
index City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
0 41a32a83-6f05-402e-b247-3f9cd295eb2a Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
1 743abe7f-0d4c-409b-98db-83ac7c31bb3b Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
2 ac976148-d14d-4a56-ba77-c45457925b2f Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
3 38b35717-e24f-4cf8-9633-75f9f36bb747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
4 cabb205e-25a8-40da-90cd-6a0752bd5f16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63

Also, we can restore a set of columns as DataFrame index:

In [147]:
df = df.set_index(['index'])

df.head()
Out[147]:
City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
index
41a32a83-6f05-402e-b247-3f9cd295eb2a Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
743abe7f-0d4c-409b-98db-83ac7c31bb3b Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
ac976148-d14d-4a56-ba77-c45457925b2f Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
38b35717-e24f-4cf8-9633-75f9f36bb747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
cabb205e-25a8-40da-90cd-6a0752bd5f16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63

green-divider

 Changing DataFrames row and column index at once

In [148]:
df.head()
Out[148]:
City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
index
41a32a83-6f05-402e-b247-3f9cd295eb2a Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
743abe7f-0d4c-409b-98db-83ac7c31bb3b Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
ac976148-d14d-4a56-ba77-c45457925b2f Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
38b35717-e24f-4cf8-9633-75f9f36bb747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
cabb205e-25a8-40da-90cd-6a0752bd5f16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63
In [154]:
df.rename(
    columns={
        'Product Category': 'Category',
        'Order Quantity': 'Quantity'
    },
    index={
        '41a32a83-6f05-402e-b247-3f9cd295eb2a': str(uuid.uuid4()),
        '743abe7f-0d4c-409b-98db-83ac7c31bb3b': str(uuid.uuid4())
    }
)
Out[154]:
City Product Name Category Ship Date Cost Price Retail Price Quantity Shipping Price
index
cb3c3752-1dc0-46b0-a3d2-8d2ceb9cdfc0 Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
372c1a02-48d9-4474-99dd-9e20418afb6f Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
ac976148-d14d-4a56-ba77-c45457925b2f Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
38b35717-e24f-4cf8-9633-75f9f36bb747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
cabb205e-25a8-40da-90cd-6a0752bd5f16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63
... ... ... ... ... ... ... ... ...
9488a9e0-0527-443b-967a-5dacf3dcf28f Sydney Binding Machine Supplies Office Supplies 2013-12-29 18.38 29.17 37 6.27
cca8f4ba-1489-4160-bc28-52babcba7ef3 Sydney Artisan Poly Binder Pockets Office Supplies 2016-07-07 2.26 3.58 36 5.47
bf4b4256-5061-4931-b39b-96d730ef1bc5 Sydney Smiths Pen Style Liquid Stix; Assorted (yellow... Office Supplies 2015-11-30 3.88 6.47 16 1.22
b1a43ea8-fdb9-4ced-bb04-c952c9acfc0f Sydney Artisan 487 Labels Office Supplies 2016-08-13 2.29 3.69 13 0.50
941d74f7-4594-465b-a5f6-618a488a9d02 Sydney Steady Major Accent Highlighters Office Supplies 2013-11-08 3.75 7.08 29 2.35

292 rows × 8 columns

In [156]:
df.rename(index=str.upper)
Out[156]:
City Product Name Product Category Ship Date Cost Price Retail Price Order Quantity Shipping Price
index
41A32A83-6F05-402E-B247-3F9CD295EB2A Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
743ABE7F-0D4C-409B-98DB-83AC7C31BB3B Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
AC976148-D14D-4A56-BA77-C45457925B2F Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
38B35717-E24F-4CF8-9633-75F9F36BB747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
CABB205E-25A8-40DA-90CD-6A0752BD5F16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63
... ... ... ... ... ... ... ... ...
9488A9E0-0527-443B-967A-5DACF3DCF28F Sydney Binding Machine Supplies Office Supplies 2013-12-29 18.38 29.17 37 6.27
CCA8F4BA-1489-4160-BC28-52BABCBA7EF3 Sydney Artisan Poly Binder Pockets Office Supplies 2016-07-07 2.26 3.58 36 5.47
BF4B4256-5061-4931-B39B-96D730EF1BC5 Sydney Smiths Pen Style Liquid Stix; Assorted (yellow... Office Supplies 2015-11-30 3.88 6.47 16 1.22
B1A43EA8-FDB9-4CED-BB04-C952C9ACFC0F Sydney Artisan 487 Labels Office Supplies 2016-08-13 2.29 3.69 13 0.50
941D74F7-4594-465B-A5F6-618A488A9D02 Sydney Steady Major Accent Highlighters Office Supplies 2013-11-08 3.75 7.08 29 2.35

292 rows × 8 columns

In [158]:
df.rename(columns=lambda x: x.lower()).head()
Out[158]:
city product name product category ship date cost price retail price order quantity shipping price
index
41a32a83-6f05-402e-b247-3f9cd295eb2a Sydney Cando S750 Color Inkjet Printer Technology 2013-08-16 75.00 120.97 35 26.30
743abe7f-0d4c-409b-98db-83ac7c31bb3b Sydney Steady Liquid Accent Tank-Style Highlighters Office Supplies 2013-08-16 1.31 2.84 13 0.93
ac976148-d14d-4a56-ba77-c45457925b2f Sydney Apex Preferred Stainless Steel Scissors Office Supplies 2014-09-05 2.50 5.68 23 3.60
38b35717-e24f-4cf8-9633-75f9f36bb747 Sydney Smiths Gold Paper Clips Office Supplies 2013-03-26 1.82 2.98 22 1.58
cabb205e-25a8-40da-90cd-6a0752bd5f16 Sydney OIC Colored Binder Clips, Assorted Sizes Office Supplies 2013-06-21 2.29 3.58 10 1.63

purple-divider

Notebooks AI
Notebooks AI Profile20060