Pandas Apply Some Function¶
Below is a basic tutorial on applying some function to a Pandas DataFrame in various ways. For more advanced stuff, you should really educate yourself on the Pandas concept of "split-apply-combine"
We consider two scenarios, using two different data formats. Using the Pandas definitions, we consider these as "stacked" and "unstacked". For more on this, see Pandas docs on reshaping data
Enter some function: Compound Annual Growth Rate (CAGR)
$$ {CAGR={Last/First}^{1/n}-1} $$
import pandas as pd
import numpy as np
Scenario 1: Unstacked¶
df = pd.DataFrame(data=np.random.rand(36,4),
index=pd.date_range(end=pd.datetime.now(), periods=36, freq='MS'),
columns=['A', 'B', 'C', 'D'])
df.shape
df.index.names = ['year_month_start']
df.columns.names = ['category']
df.head()
df.tail()
Unstacked: Compute per Category¶
Compute over total history of our data.
In this case, make sure your data is sorted!
df = df.sort_index()
cagr = (df.iloc[-1] / df.iloc[0]) ** (1/len(df)) - 1
cagr = cagr.to_frame()
cagr.columns = ['cagr']
cagr
Unstacked: Compute per Category and Time Period¶
Group by year and compute growth over months.
gb = df.groupby(pd.TimeGrouper(freq='A'))
def_cagr = lambda d: (d.iloc[-1] / d.iloc[0]) ** (1/len(d)) - 1
cagr = gb.apply(def_cagr)
cagr.index.names = ['year_end']
cagr
Scenario 2: Stacked¶
What if our data looked different? Below is the same data in a different format, where the category and date make up a multi-index.
df = df.stack().swaplevel().sort_index().to_frame()
df.index = df.index.set_names('category', level=0)
df.columns = ['data']
df.head()
df.tail()
Stacked: Compute per Category¶
gb = df.reset_index().groupby('category')
cagr = (gb.nth(-1).data / gb.nth(0).data) ** (1/gb.size()) - 1
cagr = cagr.to_frame()
cagr.columns = ['cagr']
cagr
Stacked: Compute per Category and Time Period¶
gb = df.reset_index(level=0).groupby(['category', pd.TimeGrouper(freq='A')])
cagr = (gb.nth(-1).data / gb.nth(0).data) ** (1/gb.size()) - 1
cagr = cagr.to_frame()
cagr.columns = ['cagr']
cagr.index.names = ['category', 'year_end']
cagr