pandas_apply_some_function

Pandas Apply Some Function

Below is a basic tutorial on applying some function to a Pandas DataFrame in various ways. For more advanced stuff, you should really educate yourself on the Pandas concept of "split-apply-combine"

We consider two scenarios, using two different data formats. Using the Pandas definitions, we consider these as "stacked" and "unstacked". For more on this, see Pandas docs on reshaping data


Enter some function: Compound Annual Growth Rate (CAGR)

$$ {CAGR={Last/First}^{1/n}-1} $$

https://en.wikipedia.org/wiki/Compound_annual_growth_rate

In [1]:
import pandas as pd
import numpy as np

Scenario 1: Unstacked

In [2]:
df = pd.DataFrame(data=np.random.rand(36,4),
                  index=pd.date_range(end=pd.datetime.now(), periods=36, freq='MS'),
                  columns=['A', 'B', 'C', 'D'])
In [3]:
df.shape
Out[3]:
(36, 4)
In [4]:
df.index.names = ['year_month_start']
df.columns.names = ['category']
In [5]:
df.head()
Out[5]:
category A B C D
year_month_start
2014-11-01 17:59:06.080579 0.064960 0.912398 0.414499 0.703432
2014-12-01 17:59:06.080579 0.483200 0.903033 0.277916 0.951477
2015-01-01 17:59:06.080579 0.312150 0.052920 0.973052 0.227768
2015-02-01 17:59:06.080579 0.992858 0.858479 0.018013 0.207192
2015-03-01 17:59:06.080579 0.024849 0.876164 0.431758 0.621175
In [6]:
df.tail()
Out[6]:
category A B C D
year_month_start
2017-06-01 17:59:06.080579 0.939403 0.091247 0.561496 0.626564
2017-07-01 17:59:06.080579 0.209496 0.991615 0.318620 0.781343
2017-08-01 17:59:06.080579 0.564252 0.774313 0.164976 0.701979
2017-09-01 17:59:06.080579 0.914075 0.659888 0.714164 0.472263
2017-10-01 17:59:06.080579 0.270031 0.884495 0.501254 0.037834

Unstacked: Compute per Category

Compute over total history of our data.

In this case, make sure your data is sorted!

In [16]:
df = df.sort_index()
In [17]:
cagr = (df.iloc[-1] / df.iloc[0]) ** (1/len(df)) - 1
In [18]:
cagr = cagr.to_frame()
cagr.columns = ['cagr']
In [19]:
cagr
Out[19]:
cagr
category
A 0.040371
B -0.000862
C 0.005293
D -0.077980

Unstacked: Compute per Category and Time Period

Group by year and compute growth over months.

In [10]:
gb = df.groupby(pd.TimeGrouper(freq='A'))
In [24]:
def_cagr = lambda d: (d.iloc[-1] / d.iloc[0]) ** (1/len(d)) - 1
In [25]:
cagr = gb.apply(def_cagr)
In [26]:
cagr.index.names = ['year_end']
In [27]:
cagr
Out[27]:
category A B C D
year_end
2014-12-31 1.727354 -0.005145 -0.181168 0.163023
2015-12-31 0.071589 0.138647 -0.099803 0.015065
2016-12-31 -0.087442 -0.012366 0.013480 -0.032596
2017-12-31 -0.086492 0.100415 -0.060074 -0.214525

Scenario 2: Stacked

What if our data looked different? Below is the same data in a different format, where the category and date make up a multi-index.

In [28]:
df = df.stack().swaplevel().sort_index().to_frame()
In [29]:
df.index = df.index.set_names('category', level=0)
In [30]:
df.columns = ['data']
In [31]:
df.head()
Out[31]:
data
category year_month_start
A 2014-11-01 17:59:06.080579 0.064960
2014-12-01 17:59:06.080579 0.483200
2015-01-01 17:59:06.080579 0.312150
2015-02-01 17:59:06.080579 0.992858
2015-03-01 17:59:06.080579 0.024849
In [32]:
df.tail()
Out[32]:
data
category year_month_start
D 2017-06-01 17:59:06.080579 0.626564
2017-07-01 17:59:06.080579 0.781343
2017-08-01 17:59:06.080579 0.701979
2017-09-01 17:59:06.080579 0.472263
2017-10-01 17:59:06.080579 0.037834

Stacked: Compute per Category

In [33]:
gb = df.reset_index().groupby('category')
In [34]:
cagr = (gb.nth(-1).data / gb.nth(0).data) ** (1/gb.size()) - 1
In [35]:
cagr = cagr.to_frame()
cagr.columns = ['cagr']
In [36]:
cagr
Out[36]:
cagr
category
A 0.040371
B -0.000862
C 0.005293
D -0.077980

Stacked: Compute per Category and Time Period

In [37]:
gb = df.reset_index(level=0).groupby(['category', pd.TimeGrouper(freq='A')])
In [38]:
cagr = (gb.nth(-1).data / gb.nth(0).data) ** (1/gb.size()) - 1
In [39]:
cagr = cagr.to_frame()
cagr.columns = ['cagr']
cagr.index.names = ['category', 'year_end']
In [40]:
cagr
Out[40]:
cagr
category year_end
A 2014-12-31 1.727354
2015-12-31 0.071589
2016-12-31 -0.087442
2017-12-31 -0.086492
B 2014-12-31 -0.005145
2015-12-31 0.138647
2016-12-31 -0.012366
2017-12-31 0.100415
C 2014-12-31 -0.181168
2015-12-31 -0.099803
2016-12-31 0.013480
2017-12-31 -0.060074
D 2014-12-31 0.163023
2015-12-31 0.015065
2016-12-31 -0.032596
2017-12-31 -0.214525