Profile picture

šŸ‘‰ From Excel to Pandas: Pivot Tables

Last updated: May 24th, 20192019-05-24Project preview

Pivot Tables are one of the most powerful features in MS Excel. In reality, they're just a simple summarizing (or grouping) technique. People familiar with SQL will quickly realize that a Pivot Table is just a GROUP BY operation.

Pandas has its own implementation of Pivot Tables with the function pivot_table. The same functionality can usually be replicated using a regular groupby() operation, but the function pivot_table is usually a lot more simple to use.

We recommend you to check the documentation page of pivot_table.

InĀ [1]:
import pandas as pd
import numpy as np
InĀ [2]:
df = pd.read_excel('sales.xlsx', sheet_name='Data')
InĀ [3]:
df.head()
Out[3]:
Date Month Year Customer ID Customer Age Age Group Customer Gender Country State Product_Category Sub Category Product Frame Size Order Quantity Unit Cost Unit Price Cost Revenue Profit
0 2013-11-26 November 2013 11019 19 Youth (<25) M Canada British Columbia Accessories Bike Racks Hitch Rack - 4-Bike NaN 8 45 120 360 950 590
1 2015-11-26 November 2015 11019 19 Youth (<25) M Canada British Columbia Accessories Bike Racks Hitch Rack - 4-Bike NaN 8 45 120 360 950 590
2 2014-03-23 March 2014 11039 49 Adults (35-64) M Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 23 45 120 1035 2401 1366
3 2016-03-23 March 2016 11039 49 Adults (35-64) M Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 20 45 120 900 2088 1188
4 2014-05-15 May 2014 11046 47 Adults (35-64) F Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 4 45 120 180 418 238
InĀ [4]:
df.shape
Out[4]:
(113036, 19)

Step 1 - Summary of revenueĀ¶

image

It's time to get started with our pivot tables. The first step is to make a simple aggregation that calculates the total revenue. With pandas we don't need a pivot table, we can just do:

InĀ [6]:
df['Revenue'].sum()
Out[6]:
85271008

Step 2 - Revenue per countryĀ¶

image

Now is when aggregations get a little bit more interesting. To calculate the revenue per country, we will have to group per country:

InĀ [7]:
df['Revenue'].groupby(df['Country']).sum().to_frame()
Out[7]:
Revenue
Country
Australia 21302059
Canada 7935738
France 8432872
Germany 8978596
United Kingdom 10646196
United States 27975547

As we mentioned before, we can also use the pivot_table function from pandas, which takes which values to aggregate, and what column serves as the index:

InĀ [8]:
pd.pivot_table(df, values='Revenue', index=['Country'], aggfunc=np.sum)
Out[8]:
Revenue
Country
Australia 21302059
Canada 7935738
France 8432872
Germany 8978596
United Kingdom 10646196
United States 27975547

Something important to mention here is that we're always specifying the aggregation function np.sum. Changing this function will give you other summary statistics (for example mean, max, min, etc).

Step 3 - Revenue per country, discriminated per yearĀ¶

image

We'll take the previous result, and also group it per year. As a summary, the Group By operation is by both, Country and Year. The order here matters:

InĀ [9]:
df['Revenue'].groupby([df['Country'], df['Year']]).sum().unstack()
Out[9]:
Year 2011 2012 2013 2014 2015 2016
Country
Australia 2529914 2557511 4107470 2802842 5712339 3591983
Canada 789798 835527 1443537 1471522 1731954 1663400
France 946624 962153 1431870 1411100 1877928 1803197
Germany 833603 835188 1452704 1702508 1931091 2223502
United Kingdom 823481 834885 1868796 2014044 2484181 2620809
United States 3041468 3150719 4935660 4750708 6286498 5810494

Using the pivot_table function is probably a lot simpler, in this case we just need to add the parameter columns:

InĀ [10]:
pd.pivot_table(df, values='Revenue', index=['Country'], columns=['Year'], aggfunc=np.sum)
Out[10]:
Year 2011 2012 2013 2014 2015 2016
Country
Australia 2529914 2557511 4107470 2802842 5712339 3591983
Canada 789798 835527 1443537 1471522 1731954 1663400
France 946624 962153 1431870 1411100 1877928 1803197
Germany 833603 835188 1452704 1702508 1931091 2223502
United Kingdom 823481 834885 1868796 2014044 2484181 2620809
United States 3041468 3150719 4935660 4750708 6286498 5810494

Step 4 - Revenue *and Cost* per country, discriminated per yearĀ¶

image

Let's add one more value to calculate: Cost.

InĀ [11]:
df[['Cost', 'Revenue']].groupby([df['Country'], df['Year']]).sum().unstack()
Out[11]:
Cost Revenue
Year 2011 2012 2013 2014 2015 2016 2011 2012 2013 2014 2015 2016
Country
Australia 1819448 1841420 2767650 1807369 3909993 2380149 2529914 2557511 4107470 2802842 5712339 3591983
Canada 486304 515562 727385 733386 903574 852231 789798 835527 1443537 1471522 1731954 1663400
France 669251 682302 917013 889401 1228631 1165992 946624 962153 1431870 1411100 1877928 1803197
Germany 563115 563908 897526 1017319 1214589 1362144 833603 835188 1452704 1702508 1931091 2223502
United Kingdom 519664 525968 1079965 1131345 1466754 1508647 823481 834885 1868796 2014044 2484181 2620809
United States 2025805 2094830 2891290 2709817 3771887 3408274 3041468 3150719 4935660 4750708 6286498 5810494

Using pivot_table, we just need to add the new column we want to calculate values for:

InĀ [12]:
pd.pivot_table(df, values=['Revenue', 'Cost'], index=['Country'], columns=['Year'], aggfunc=np.sum)
Out[12]:
Cost Revenue
Year 2011 2012 2013 2014 2015 2016 2011 2012 2013 2014 2015 2016
Country
Australia 1819448 1841420 2767650 1807369 3909993 2380149 2529914 2557511 4107470 2802842 5712339 3591983
Canada 486304 515562 727385 733386 903574 852231 789798 835527 1443537 1471522 1731954 1663400
France 669251 682302 917013 889401 1228631 1165992 946624 962153 1431870 1411100 1877928 1803197
Germany 563115 563908 897526 1017319 1214589 1362144 833603 835188 1452704 1702508 1931091 2223502
United Kingdom 519664 525968 1079965 1131345 1466754 1508647 823481 834885 1868796 2014044 2484181 2620809
United States 2025805 2094830 2891290 2709817 3771887 3408274 3041468 3150719 4935660 4750708 6286498 5810494

Step 5 - Revenue per country *and per state*, also discriminated per yearĀ¶

image

We can add an extra level of complexity, in which we'll group, not just per country, but also per states in those countries:

InĀ [13]:
df['Revenue'].groupby([df['Country'], df['State'], df['Year']]).sum().unstack()
Out[13]:
Year 2011 2012 2013 2014 2015 2016
Country State
Australia New South Wales 1175012.0 1171056.0 1697649.0 1229432.0 2354993.0 1575353.0
Queensland 622846.0 642944.0 975721.0 618695.0 1392224.0 813837.0
South Australia 138199.0 141107.0 307194.0 179033.0 413574.0 218167.0
Tasmania 72073.0 61490.0 134687.0 58222.0 187793.0 65919.0
Victoria 521784.0 540914.0 992219.0 717460.0 1363755.0 918707.0
Canada Alberta 16553.0 11638.0 7348.0 6161.0 7468.0 7635.0
British Columbia 773245.0 823889.0 1435633.0 1465361.0 1723997.0 1655765.0
Ontario NaN NaN 556.0 NaN 489.0 NaN
France Charente-Maritime 9431.0 9286.0 24848.0 12605.0 34984.0 14893.0
Essonne 110012.0 116489.0 167261.0 161419.0 210373.0 207978.0
Garonne (Haute) 22639.0 18047.0 23372.0 38383.0 35313.0 46152.0
Hauts de Seine 98740.0 97361.0 169771.0 149672.0 237402.0 184194.0
Loir et Cher 7811.0 7330.0 14652.0 14674.0 19184.0 18882.0
Loiret 42157.0 35924.0 60502.0 44273.0 89767.0 52549.0
Moselle 15772.0 21545.0 45622.0 58071.0 61184.0 64029.0
Nord 142918.0 141501.0 238297.0 213140.0 306578.0 282218.0
Pas de Calais 10383.0 8306.0 3380.0 11952.0 3288.0 13104.0
Seine (Paris) 164430.0 196499.0 292355.0 261186.0 384638.0 344171.0
Seine Saint Denis 135079.0 133580.0 194534.0 197854.0 242820.0 227767.0
Seine et Marne 57622.0 40888.0 46961.0 61394.0 63613.0 102807.0
Somme 20838.0 20477.0 11331.0 16607.0 11458.0 21468.0
Val d'Oise 8480.0 5653.0 32323.0 30401.0 38878.0 32560.0
Val de Marne 16445.0 12819.0 11950.0 16398.0 12393.0 23441.0
Yveline 83867.0 96448.0 94711.0 123071.0 126055.0 166984.0
Germany Bayern 117195.0 120829.0 193034.0 267226.0 278047.0 365654.0
Brandenburg 7268.0 11249.0 34355.0 33486.0 48520.0 34892.0
Hamburg 124618.0 137850.0 277548.0 281576.0 368779.0 368634.0
Hessen 230224.0 230652.0 316229.0 386423.0 409265.0 500598.0
Nordrhein-Westfalen 155142.0 141564.0 288261.0 365008.0 395595.0 459330.0
Saarland 199156.0 193044.0 343277.0 368789.0 430885.0 494394.0
United Kingdom England 823481.0 834885.0 1868796.0 2014044.0 2484181.0 2620809.0
United States Alabama NaN NaN 50.0 54.0 54.0 52.0
Arizona NaN NaN 2386.0 NaN 6993.0 NaN
California 1870178.0 1981377.0 3140474.0 2996200.0 4024256.0 3660135.0
Florida NaN NaN 2924.0 799.0 6593.0 698.0
Georgia 2208.0 1472.0 536.0 80.0 481.0 85.0
Illinois NaN NaN 5750.0 2185.0 10043.0 2163.0
Kentucky NaN NaN 108.0 1773.0 108.0 1767.0
Massachusetts NaN NaN 2042.0 NaN 2042.0 NaN
Minnesota NaN NaN NaN 758.0 NaN 708.0
Mississippi NaN NaN NaN 910.0 NaN 774.0
Missouri NaN NaN 891.0 NaN 848.0 NaN
Montana NaN NaN 1016.0 NaN 989.0 NaN
New York NaN NaN 592.0 2016.0 653.0 1996.0
North Carolina NaN NaN 213.0 NaN 192.0 NaN
Ohio NaN NaN 2337.0 2641.0 2234.0 2535.0
Oregon 384007.0 370269.0 608587.0 602854.0 739265.0 745428.0
South Carolina NaN NaN NaN 703.0 NaN 538.0
Texas 1268.0 1268.0 734.0 1440.0 778.0 1347.0
Utah NaN NaN 1782.0 NaN 2255.0 NaN
Virginia NaN NaN NaN 604.0 NaN 568.0
Washington 772644.0 790751.0 1162664.0 1137600.0 1486221.0 1391630.0
Wyoming 11163.0 5582.0 2574.0 91.0 2493.0 70.0

And using pivot_table just means adding a new item to index:

InĀ [14]:
pd.pivot_table(df, values=['Revenue'], index=['Country', 'State'], columns=['Year'], aggfunc=np.sum)
Out[14]:
Revenue
Year 2011 2012 2013 2014 2015 2016
Country State
Australia New South Wales 1175012.0 1171056.0 1697649.0 1229432.0 2354993.0 1575353.0
Queensland 622846.0 642944.0 975721.0 618695.0 1392224.0 813837.0
South Australia 138199.0 141107.0 307194.0 179033.0 413574.0 218167.0
Tasmania 72073.0 61490.0 134687.0 58222.0 187793.0 65919.0
Victoria 521784.0 540914.0 992219.0 717460.0 1363755.0 918707.0
Canada Alberta 16553.0 11638.0 7348.0 6161.0 7468.0 7635.0
British Columbia 773245.0 823889.0 1435633.0 1465361.0 1723997.0 1655765.0
Ontario NaN NaN 556.0 NaN 489.0 NaN
France Charente-Maritime 9431.0 9286.0 24848.0 12605.0 34984.0 14893.0
Essonne 110012.0 116489.0 167261.0 161419.0 210373.0 207978.0
Garonne (Haute) 22639.0 18047.0 23372.0 38383.0 35313.0 46152.0
Hauts de Seine 98740.0 97361.0 169771.0 149672.0 237402.0 184194.0
Loir et Cher 7811.0 7330.0 14652.0 14674.0 19184.0 18882.0
Loiret 42157.0 35924.0 60502.0 44273.0 89767.0 52549.0
Moselle 15772.0 21545.0 45622.0 58071.0 61184.0 64029.0
Nord 142918.0 141501.0 238297.0 213140.0 306578.0 282218.0
Pas de Calais 10383.0 8306.0 3380.0 11952.0 3288.0 13104.0
Seine (Paris) 164430.0 196499.0 292355.0 261186.0 384638.0 344171.0
Seine Saint Denis 135079.0 133580.0 194534.0 197854.0 242820.0 227767.0
Seine et Marne 57622.0 40888.0 46961.0 61394.0 63613.0 102807.0
Somme 20838.0 20477.0 11331.0 16607.0 11458.0 21468.0
Val d'Oise 8480.0 5653.0 32323.0 30401.0 38878.0 32560.0
Val de Marne 16445.0 12819.0 11950.0 16398.0 12393.0 23441.0
Yveline 83867.0 96448.0 94711.0 123071.0 126055.0 166984.0
Germany Bayern 117195.0 120829.0 193034.0 267226.0 278047.0 365654.0
Brandenburg 7268.0 11249.0 34355.0 33486.0 48520.0 34892.0
Hamburg 124618.0 137850.0 277548.0 281576.0 368779.0 368634.0
Hessen 230224.0 230652.0 316229.0 386423.0 409265.0 500598.0
Nordrhein-Westfalen 155142.0 141564.0 288261.0 365008.0 395595.0 459330.0
Saarland 199156.0 193044.0 343277.0 368789.0 430885.0 494394.0
United Kingdom England 823481.0 834885.0 1868796.0 2014044.0 2484181.0 2620809.0
United States Alabama NaN NaN 50.0 54.0 54.0 52.0
Arizona NaN NaN 2386.0 NaN 6993.0 NaN
California 1870178.0 1981377.0 3140474.0 2996200.0 4024256.0 3660135.0
Florida NaN NaN 2924.0 799.0 6593.0 698.0
Georgia 2208.0 1472.0 536.0 80.0 481.0 85.0
Illinois NaN NaN 5750.0 2185.0 10043.0 2163.0
Kentucky NaN NaN 108.0 1773.0 108.0 1767.0
Massachusetts NaN NaN 2042.0 NaN 2042.0 NaN
Minnesota NaN NaN NaN 758.0 NaN 708.0
Mississippi NaN NaN NaN 910.0 NaN 774.0
Missouri NaN NaN 891.0 NaN 848.0 NaN
Montana NaN NaN 1016.0 NaN 989.0 NaN
New York NaN NaN 592.0 2016.0 653.0 1996.0
North Carolina NaN NaN 213.0 NaN 192.0 NaN
Ohio NaN NaN 2337.0 2641.0 2234.0 2535.0
Oregon 384007.0 370269.0 608587.0 602854.0 739265.0 745428.0
South Carolina NaN NaN NaN 703.0 NaN 538.0
Texas 1268.0 1268.0 734.0 1440.0 778.0 1347.0
Utah NaN NaN 1782.0 NaN 2255.0 NaN
Virginia NaN NaN NaN 604.0 NaN 568.0
Washington 772644.0 790751.0 1162664.0 1137600.0 1486221.0 1391630.0
Wyoming 11163.0 5582.0 2574.0 91.0 2493.0 70.0

Step 6 - Filtering by some other columns using pd.queryĀ¶

image

Finally we can filter products using the standard filtering methods in pandas. I'm going to use the query method, although I could be using the regular .loc[CONDITION] one we're used to:

InĀ [15]:
df['Product_Category'].value_counts()
Out[15]:
Accessories    70120
Bikes          25982
Clothing       16934
Name: Product_Category, dtype: int64
InĀ [16]:
df.query('Product_Category == "Bikes"')['Revenue'].sum()
Out[16]:
61782134

The standard pandas method we're used to:

InĀ [17]:
df.loc[df['Product_Category'] == 'Bikes', 'Revenue'].sum()
Out[17]:
61782134

Another example with query:

InĀ [18]:
df.query('Product_Category in ["Accessories", "Clothing"]')['Revenue'].sum()
Out[18]:
23488874

And finally, we can use the filtered dataframe as the source of the pivot_table function.

In this example we show the same Pivot Table as before, but with sales of Bikes only:

InĀ [19]:
pd.pivot_table(
    df.query('Product_Category == "Bikes"'),
    values='Revenue',
    index=['Country'],
    columns=['Year'],
    aggfunc=np.sum)
Out[19]:
Year 2011 2012 2013 2014 2015 2016
Country
Australia 2529914 2557511 3050202 1655516 4677632 2482043
Canada 789798 835527 589502 471155 901419 687602
France 946624 962153 959578 810165 1422728 1222877
Germany 833603 835188 1015098 1030713 1508441 1569739
United Kingdom 823481 834885 1262663 1202055 1900942 1832968
United States 3041468 3150719 2981744 2441639 4387921 3576921

And here sales of either Accessories or Clothing, without contemplating Bikes:

InĀ [20]:
pd.pivot_table(
    df.query('Product_Category in ["Accessories", "Clothing"]'),
    values='Revenue',
    index=['Country'],
    columns=['Year'],
    aggfunc=np.sum)
Out[20]:
Year 2013 2014 2015 2016
Country
Australia 1057268 1147326 1034707 1109940
Canada 854035 1000367 830535 975798
France 472292 600935 455200 580320
Germany 437606 671795 422650 653763
United Kingdom 606133 811989 583239 787841
United States 1953916 2309069 1898577 2233573

Step 7 - Other aggregationsĀ¶

image

It's very simple to change the aggregations used for our Pivot Tables. For example, using the average (np.mean):

InĀ [21]:
pd.pivot_table(
    df,
    values='Revenue',
    index=['Country'],
    columns=['Year'],
    aggfunc=np.mean)
Out[21]:
Year 2011 2012 2013 2014 2015 2016
Country
Australia 2945.185099 2977.311991 769.910028 485.424662 1070.728960 622.096121
Canada 3494.681416 3697.022124 467.769605 389.600741 561.229423 440.402436
France 4062.763948 4129.412017 624.997818 474.319328 819.697948 606.116639
Germany 3577.695279 3584.497854 644.500444 556.011757 856.739574 726.160026
United Kingdom 3107.475472 3150.509434 630.072825 562.739313 837.552596 732.274099
United States 3532.483159 3659.371661 579.915404 464.344443 738.632123 567.930212

Or even multiple aggregations at the same time:

InĀ [22]:
pd.pivot_table(
    df,
    values='Revenue',
    index=['Country'],
    columns=['Year'],
    aggfunc=[np.sum, np.mean])
Out[22]:
sum mean
Year 2011 2012 2013 2014 2015 2016 2011 2012 2013 2014 2015 2016
Country
Australia 2529914 2557511 4107470 2802842 5712339 3591983 2945.185099 2977.311991 769.910028 485.424662 1070.728960 622.096121
Canada 789798 835527 1443537 1471522 1731954 1663400 3494.681416 3697.022124 467.769605 389.600741 561.229423 440.402436
France 946624 962153 1431870 1411100 1877928 1803197 4062.763948 4129.412017 624.997818 474.319328 819.697948 606.116639
Germany 833603 835188 1452704 1702508 1931091 2223502 3577.695279 3584.497854 644.500444 556.011757 856.739574 726.160026
United Kingdom 823481 834885 1868796 2014044 2484181 2620809 3107.475472 3150.509434 630.072825 562.739313 837.552596 732.274099
United States 3041468 3150719 4935660 4750708 6286498 5810494 3532.483159 3659.371661 579.915404 464.344443 738.632123 567.930212

Or even different aggregations for different values:

InĀ [23]:
pd.pivot_table(
    df,
    values=['Revenue', 'Cost'],
    index=['Country'],
    columns=['Year'],
    aggfunc={
        'Revenue': np.sum,
        'Cost': np.mean
    })
Out[23]:
Cost Revenue
Year 2011 2012 2013 2014 2015 2016 2011 2012 2013 2014 2015 2016
Country
Australia 2118.100116 2143.678696 518.772259 313.018531 732.894658 412.218393 2529914 2557511 4107470 2802842 5712339 3591983
Canada 2151.787611 2281.247788 235.704796 194.171565 292.797797 225.637014 789798 835527 1443537 1471522 1731954 1663400
France 2872.321888 2928.334764 400.267569 298.958319 536.285901 391.930084 946624 962153 1431870 1411100 1877928 1803197
Germany 2416.802575 2420.206009 398.192547 332.240039 538.859361 444.854344 833603 835188 1452704 1702508 1931091 2223502
United Kingdom 1960.996226 1984.784906 364.114970 316.106454 494.522589 421.527522 823481 834885 1868796 2014044 2484181 2620809
United States 2352.851336 2433.019744 339.712137 264.863356 443.177887 333.132050 3041468 3150719 4935660 4750708 6286498 5810494

Step 8 - Adding derived columns to our Pivot TablesĀ¶

image

Adding derived columns to an Excel pivot table can be confusing. But with Pandas, a Pivot Table is just a regular DataFrame, which means we can manipulate it in whatever way we need.

a) let's create a new column ProfitĀ¶

This column contains the result of Revenue - Cost. The simplest approach here is to add a new column Profit to the original DataFrame and then use it to create the Pivot Table:

InĀ [24]:
df.head()
Out[24]:
Date Month Year Customer ID Customer Age Age Group Customer Gender Country State Product_Category Sub Category Product Frame Size Order Quantity Unit Cost Unit Price Cost Revenue Profit
0 2013-11-26 November 2013 11019 19 Youth (<25) M Canada British Columbia Accessories Bike Racks Hitch Rack - 4-Bike NaN 8 45 120 360 950 590
1 2015-11-26 November 2015 11019 19 Youth (<25) M Canada British Columbia Accessories Bike Racks Hitch Rack - 4-Bike NaN 8 45 120 360 950 590
2 2014-03-23 March 2014 11039 49 Adults (35-64) M Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 23 45 120 1035 2401 1366
3 2016-03-23 March 2016 11039 49 Adults (35-64) M Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 20 45 120 900 2088 1188
4 2014-05-15 May 2014 11046 47 Adults (35-64) F Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 4 45 120 180 418 238
InĀ [25]:
df['Profit'] = df['Revenue'] - df['Cost']
InĀ [26]:
df.head()
Out[26]:
Date Month Year Customer ID Customer Age Age Group Customer Gender Country State Product_Category Sub Category Product Frame Size Order Quantity Unit Cost Unit Price Cost Revenue Profit
0 2013-11-26 November 2013 11019 19 Youth (<25) M Canada British Columbia Accessories Bike Racks Hitch Rack - 4-Bike NaN 8 45 120 360 950 590
1 2015-11-26 November 2015 11019 19 Youth (<25) M Canada British Columbia Accessories Bike Racks Hitch Rack - 4-Bike NaN 8 45 120 360 950 590
2 2014-03-23 March 2014 11039 49 Adults (35-64) M Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 23 45 120 1035 2401 1366
3 2016-03-23 March 2016 11039 49 Adults (35-64) M Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 20 45 120 900 2088 1188
4 2014-05-15 May 2014 11046 47 Adults (35-64) F Australia New South Wales Accessories Bike Racks Hitch Rack - 4-Bike NaN 4 45 120 180 418 238

and now we can create our Pivot Table:

InĀ [27]:
pd.pivot_table(df, values=['Revenue', 'Cost', 'Profit'], index=['Country'], aggfunc=np.sum)
Out[27]:
Cost Profit Revenue
Country
Australia 14526029 6776030 21302059
Canada 4218442 3717296 7935738
France 5552590 2880282 8432872
Germany 5618601 3359995 8978596
United Kingdom 6232343 4413853 10646196
United States 16901903 11073644 27975547

b) Create the column Margin (calculated as Profit/Revenue)Ā¶

To do this, we can't modify the original DataFrame. We must first get a reference to the Pivot Table:

InĀ [28]:
table = pd.pivot_table(df, values=['Revenue', 'Cost', 'Profit'], index=['Country'], aggfunc=np.sum)
table
Out[28]:
Cost Profit Revenue
Country
Australia 14526029 6776030 21302059
Canada 4218442 3717296 7935738
France 5552590 2880282 8432872
Germany 5618601 3359995 8978596
United Kingdom 6232343 4413853 10646196
United States 16901903 11073644 27975547
InĀ [29]:
table['Margin'] = table['Profit'] / table['Revenue']
InĀ [30]:
table
Out[30]:
Cost Profit Revenue Margin
Country
Australia 14526029 6776030 21302059 0.318093
Canada 4218442 3717296 7935738 0.468425
France 5552590 2880282 8432872 0.341554
Germany 5618601 3359995 8978596 0.374223
United Kingdom 6232343 4413853 10646196 0.414594
United States 16901903 11073644 27975547 0.395833

Finally, export the Pivot TableĀ¶

We can simply export the generated table to open it with Excel later.

InĀ [31]:
table = pd.pivot_table(df, values=['Revenue'], index=['Country', 'State'], columns=['Year'], aggfunc=np.sum)
table
Out[31]:
Revenue
Year 2011 2012 2013 2014 2015 2016
Country State
Australia New South Wales 1175012.0 1171056.0 1697649.0 1229432.0 2354993.0 1575353.0
Queensland 622846.0 642944.0 975721.0 618695.0 1392224.0 813837.0
South Australia 138199.0 141107.0 307194.0 179033.0 413574.0 218167.0
Tasmania 72073.0 61490.0 134687.0 58222.0 187793.0 65919.0
Victoria 521784.0 540914.0 992219.0 717460.0 1363755.0 918707.0
Canada Alberta 16553.0 11638.0 7348.0 6161.0 7468.0 7635.0
British Columbia 773245.0 823889.0 1435633.0 1465361.0 1723997.0 1655765.0
Ontario NaN NaN 556.0 NaN 489.0 NaN
France Charente-Maritime 9431.0 9286.0 24848.0 12605.0 34984.0 14893.0
Essonne 110012.0 116489.0 167261.0 161419.0 210373.0 207978.0
Garonne (Haute) 22639.0 18047.0 23372.0 38383.0 35313.0 46152.0
Hauts de Seine 98740.0 97361.0 169771.0 149672.0 237402.0 184194.0
Loir et Cher 7811.0 7330.0 14652.0 14674.0 19184.0 18882.0
Loiret 42157.0 35924.0 60502.0 44273.0 89767.0 52549.0
Moselle 15772.0 21545.0 45622.0 58071.0 61184.0 64029.0
Nord 142918.0 141501.0 238297.0 213140.0 306578.0 282218.0
Pas de Calais 10383.0 8306.0 3380.0 11952.0 3288.0 13104.0
Seine (Paris) 164430.0 196499.0 292355.0 261186.0 384638.0 344171.0
Seine Saint Denis 135079.0 133580.0 194534.0 197854.0 242820.0 227767.0
Seine et Marne 57622.0 40888.0 46961.0 61394.0 63613.0 102807.0
Somme 20838.0 20477.0 11331.0 16607.0 11458.0 21468.0
Val d'Oise 8480.0 5653.0 32323.0 30401.0 38878.0 32560.0
Val de Marne 16445.0 12819.0 11950.0 16398.0 12393.0 23441.0
Yveline 83867.0 96448.0 94711.0 123071.0 126055.0 166984.0
Germany Bayern 117195.0 120829.0 193034.0 267226.0 278047.0 365654.0
Brandenburg 7268.0 11249.0 34355.0 33486.0 48520.0 34892.0
Hamburg 124618.0 137850.0 277548.0 281576.0 368779.0 368634.0
Hessen 230224.0 230652.0 316229.0 386423.0 409265.0 500598.0
Nordrhein-Westfalen 155142.0 141564.0 288261.0 365008.0 395595.0 459330.0
Saarland 199156.0 193044.0 343277.0 368789.0 430885.0 494394.0
United Kingdom England 823481.0 834885.0 1868796.0 2014044.0 2484181.0 2620809.0
United States Alabama NaN NaN 50.0 54.0 54.0 52.0
Arizona NaN NaN 2386.0 NaN 6993.0 NaN
California 1870178.0 1981377.0 3140474.0 2996200.0 4024256.0 3660135.0
Florida NaN NaN 2924.0 799.0 6593.0 698.0
Georgia 2208.0 1472.0 536.0 80.0 481.0 85.0
Illinois NaN NaN 5750.0 2185.0 10043.0 2163.0
Kentucky NaN NaN 108.0 1773.0 108.0 1767.0
Massachusetts NaN NaN 2042.0 NaN 2042.0 NaN
Minnesota NaN NaN NaN 758.0 NaN 708.0
Mississippi NaN NaN NaN 910.0 NaN 774.0
Missouri NaN NaN 891.0 NaN 848.0 NaN
Montana NaN NaN 1016.0 NaN 989.0 NaN
New York NaN NaN 592.0 2016.0 653.0 1996.0
North Carolina NaN NaN 213.0 NaN 192.0 NaN
Ohio NaN NaN 2337.0 2641.0 2234.0 2535.0
Oregon 384007.0 370269.0 608587.0 602854.0 739265.0 745428.0
South Carolina NaN NaN NaN 703.0 NaN 538.0
Texas 1268.0 1268.0 734.0 1440.0 778.0 1347.0
Utah NaN NaN 1782.0 NaN 2255.0 NaN
Virginia NaN NaN NaN 604.0 NaN 568.0
Washington 772644.0 790751.0 1162664.0 1137600.0 1486221.0 1391630.0
Wyoming 11163.0 5582.0 2574.0 91.0 2493.0 70.0
InĀ [32]:
table.to_excel('pivot_table.xlsx')
Notebooks AI
Notebooks AI Profile20060