pandas_multiindex

Pandas MultiIndex

The below examples are my summarized notes based on the Pandas docs on advanced indexing. The documentation on Pandas is probably the best in the whole Python community. If you want to use the full power of Pandas yourself, start reading!

In [1]:
import pandas as pd
import numpy as np

Data

In [2]:
categories = ['One', 'Two', 'Three', 'Four']
years = pd.date_range(end=pd.datetime.now(), periods=4, freq='A').year
In [3]:
idx = pd.MultiIndex.from_product([categories, years], names=['category','year'])
In [4]:
df = pd.DataFrame(data=np.random.rand(16,8),
                  index=idx,
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
In [5]:
df
Out[5]:
A B C D E F G H
category year
One 2013 0.360327 0.985771 0.245880 0.685196 0.349286 0.451878 0.791817 0.729653
2014 0.564336 0.012453 0.021836 0.100025 0.566593 0.705033 0.623865 0.278955
2015 0.054157 0.404670 0.845568 0.552805 0.025161 0.837562 0.016496 0.386726
2016 0.709684 0.489236 0.892152 0.333964 0.536528 0.243267 0.686957 0.188799
Two 2013 0.570331 0.982830 0.826117 0.592185 0.148816 0.048780 0.135223 0.509763
2014 0.234321 0.395749 0.363677 0.823273 0.379848 0.425719 0.453727 0.916822
2015 0.748280 0.371602 0.066802 0.273414 0.989124 0.453888 0.383820 0.113457
2016 0.769954 0.525683 0.490095 0.035429 0.315791 0.545273 0.720569 0.818346
Three 2013 0.106577 0.380808 0.927207 0.287780 0.492644 0.086994 0.889591 0.933095
2014 0.561857 0.506371 0.962382 0.163667 0.484239 0.374857 0.094392 0.643644
2015 0.197825 0.710943 0.390355 0.811361 0.581469 0.154026 0.469202 0.669786
2016 0.667968 0.503369 0.187386 0.908971 0.998324 0.833267 0.655844 0.209545
Four 2013 0.980583 0.585293 0.011230 0.427428 0.576045 0.964909 0.971147 0.883435
2014 0.040763 0.927800 0.615672 0.731586 0.768721 0.078744 0.984370 0.565257
2015 0.862900 0.051075 0.831173 0.845419 0.022534 0.754840 0.002456 0.634279
2016 0.458775 0.018586 0.433801 0.531906 0.608046 0.599256 0.986199 0.039592

MultiIndex Columns

I recomend creating you multi-header as a list of tuples, then using pd.MultiIndex.from_tuples, set it as the DataFrame columns.

In [6]:
columns = [('Group1' if i < 4 else 'Group2',col) for i,col in enumerate(df.columns)]
In [7]:
idx = pd.MultiIndex.from_tuples(columns, names=['group','subgroup'])
In [8]:
df.columns = idx
In [9]:
df
Out[9]:
group Group1 Group2
subgroup A B C D E F G H
category year
One 2013 0.360327 0.985771 0.245880 0.685196 0.349286 0.451878 0.791817 0.729653
2014 0.564336 0.012453 0.021836 0.100025 0.566593 0.705033 0.623865 0.278955
2015 0.054157 0.404670 0.845568 0.552805 0.025161 0.837562 0.016496 0.386726
2016 0.709684 0.489236 0.892152 0.333964 0.536528 0.243267 0.686957 0.188799
Two 2013 0.570331 0.982830 0.826117 0.592185 0.148816 0.048780 0.135223 0.509763
2014 0.234321 0.395749 0.363677 0.823273 0.379848 0.425719 0.453727 0.916822
2015 0.748280 0.371602 0.066802 0.273414 0.989124 0.453888 0.383820 0.113457
2016 0.769954 0.525683 0.490095 0.035429 0.315791 0.545273 0.720569 0.818346
Three 2013 0.106577 0.380808 0.927207 0.287780 0.492644 0.086994 0.889591 0.933095
2014 0.561857 0.506371 0.962382 0.163667 0.484239 0.374857 0.094392 0.643644
2015 0.197825 0.710943 0.390355 0.811361 0.581469 0.154026 0.469202 0.669786
2016 0.667968 0.503369 0.187386 0.908971 0.998324 0.833267 0.655844 0.209545
Four 2013 0.980583 0.585293 0.011230 0.427428 0.576045 0.964909 0.971147 0.883435
2014 0.040763 0.927800 0.615672 0.731586 0.768721 0.078744 0.984370 0.565257
2015 0.862900 0.051075 0.831173 0.845419 0.022534 0.754840 0.002456 0.634279
2016 0.458775 0.018586 0.433801 0.531906 0.608046 0.599256 0.986199 0.039592

Slicing

Pandas already has great documenation on using slicers. Below if my simplified TLDR version.

To allow slicing, you must first sort the index.

For MultiIndex-ed objects to be indexed & sliced effectively, they need to be sorted. As with any index, you can use sort_index. - Pandas docs

In [10]:
df = df.sort_index()

For the most convenient and logical slicing, use the builtin pd.IndexSlice tool

In [11]:
idx = pd.IndexSlice

Example slice on all levels avialble.

In [12]:
df.loc[idx[['One', 'Two'],[2014,2015]],idx[['Group1', 'Group2'],['C','D','E']]]
Out[12]:
group Group1 Group2
subgroup C D E
category year
One 2014 0.021836 0.100025 0.566593
2015 0.845568 0.552805 0.025161
Two 2014 0.363677 0.823273 0.379848
2015 0.066802 0.273414 0.989124

Slice on partial indices, and use : for accessing all others. Below is warning from the docs.

Warning You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be mis-interpreted as indexing both axes, rather than into say the MuliIndex for the rows. - Pandas docs

In [13]:
df.loc[idx[:,[2016]],idx['Group2',:]]
Out[13]:
group Group2
subgroup E F G H
category year
Four 2016 0.608046 0.599256 0.986199 0.039592
One 2016 0.536528 0.243267 0.686957 0.188799
Three 2016 0.998324 0.833267 0.655844 0.209545
Two 2016 0.315791 0.545273 0.720569 0.818346

Reindexing

I recommend first creating a new index from your individually altered components, then reindex all levels at one time.

In [14]:
new_categories = list(df.index.get_level_values('category').unique()) + ['Zero', 'Five']
new_categories
Out[14]:
['Four', 'One', 'Three', 'Two', 'Zero', 'Five']
In [15]:
new_years = list(range(df.index.get_level_values('year').min()-2, 
                       df.index.get_level_values('year').max()+2))
new_years
Out[15]:
[2011, 2012, 2013, 2014, 2015, 2016, 2017]
In [16]:
idx = pd.MultiIndex.from_product([new_categories, new_years])
In [17]:
df.reindex(idx, fill_value=0)
Out[17]:
group Group1 Group2
subgroup A B C D E F G H
Four 2011 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2012 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2013 0.980583 0.585293 0.011230 0.427428 0.576045 0.964909 0.971147 0.883435
2014 0.040763 0.927800 0.615672 0.731586 0.768721 0.078744 0.984370 0.565257
2015 0.862900 0.051075 0.831173 0.845419 0.022534 0.754840 0.002456 0.634279
2016 0.458775 0.018586 0.433801 0.531906 0.608046 0.599256 0.986199 0.039592
2017 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
One 2011 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2012 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2013 0.360327 0.985771 0.245880 0.685196 0.349286 0.451878 0.791817 0.729653
2014 0.564336 0.012453 0.021836 0.100025 0.566593 0.705033 0.623865 0.278955
2015 0.054157 0.404670 0.845568 0.552805 0.025161 0.837562 0.016496 0.386726
2016 0.709684 0.489236 0.892152 0.333964 0.536528 0.243267 0.686957 0.188799
2017 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Three 2011 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2012 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2013 0.106577 0.380808 0.927207 0.287780 0.492644 0.086994 0.889591 0.933095
2014 0.561857 0.506371 0.962382 0.163667 0.484239 0.374857 0.094392 0.643644
2015 0.197825 0.710943 0.390355 0.811361 0.581469 0.154026 0.469202 0.669786
2016 0.667968 0.503369 0.187386 0.908971 0.998324 0.833267 0.655844 0.209545
2017 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Two 2011 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2012 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2013 0.570331 0.982830 0.826117 0.592185 0.148816 0.048780 0.135223 0.509763
2014 0.234321 0.395749 0.363677 0.823273 0.379848 0.425719 0.453727 0.916822
2015 0.748280 0.371602 0.066802 0.273414 0.989124 0.453888 0.383820 0.113457
2016 0.769954 0.525683 0.490095 0.035429 0.315791 0.545273 0.720569 0.818346
2017 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Zero 2011 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2012 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2013 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2014 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2015 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2016 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2017 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Five 2011 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2012 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2013 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2014 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2015 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2016 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2017 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000