pandas_dataframes_to_excel_with_toc

Multiple DataFrames to Excel with Table of Contents

In [1]:
import pandas as pd
import numpy as np

Create Sample Database of DataFrames and Titles

First store the dataframes in a dictionary accessable by the DataFrame title to be used in table of contents.

In [2]:
database = {}
for i in range(5):
    df = pd.DataFrame(data=np.random.rand(10,5),
                      index=pd.date_range(end=pd.datetime.now(), periods=10, freq='AS'),
                      columns=list(range(5)))
    df = df.sort_index(ascending=False)
    database['DataFrame {}'.format(i)] = df

preview your database is as expected

In [3]:
len(database)
Out[3]:
5
In [4]:
database.keys()
Out[4]:
dict_keys(['DataFrame 0', 'DataFrame 1', 'DataFrame 2', 'DataFrame 3', 'DataFrame 4'])

and the first DataFrame in our database looks like:

In [5]:
database['DataFrame 0']
Out[5]:
0 1 2 3 4
2018-01-01 15:26:49.656255 0.159358 0.092781 0.123526 0.382781 0.258993
2017-01-01 15:26:49.656255 0.717533 0.169581 0.956188 0.531756 0.556561
2016-01-01 15:26:49.656255 0.586399 0.897709 0.820273 0.997370 0.906336
2015-01-01 15:26:49.656255 0.458850 0.407505 0.877071 0.472491 0.974023
2014-01-01 15:26:49.656255 0.156699 0.322385 0.566123 0.760208 0.577940
2013-01-01 15:26:49.656255 0.740037 0.851457 0.243506 0.242941 0.927949
2012-01-01 15:26:49.656255 0.651608 0.817087 0.927313 0.783384 0.671474
2011-01-01 15:26:49.656255 0.970842 0.564952 0.298710 0.045744 0.342489
2010-01-01 15:26:49.656255 0.570517 0.930760 0.209710 0.633520 0.833400
2009-01-01 15:26:49.656255 0.948057 0.683936 0.945394 0.635590 0.415048

In the output excel we want to have the first sheet as a table of contents, and each subsequent sheet corresponding to our DataFrames in the database.

The table of contents should be simply a list of our predefined DataFrame titles, which will link to the appropriate sheet.

And finally, at the top of each Excel sheet with a DataFrame, we want to add a link back to the table of contents for convenience.

IMPORTANT: First note that a hyperlink in an Excel Cell looks like =HYPERLINK("#0!A1","[TEXT]") where #0!A1 corresponds to the sheet and cell being linked to, and [TEXT] represents the text to be displayed in the Excel cell.

These links will be created for all DataFrames, saved by their sheet number when writing the DataFrame to excel, then compiled and written to the first sheet as TOC all at once.

In [6]:
sheet_name = 'DataFrame 0'
sheet_num = 1
xls_toc_label = '=HYPERLINK("#{}!A1","{}")'.format(str(sheet_num),sheet_name)
print(xls_toc_label)
=HYPERLINK("#1!A1","DataFrame 0")

The below code may be more efficient in a different way - alternative solutions are welcomed!

In [7]:
df = database['DataFrame 0']
In [8]:
# transpose dataframe
df = df.T
# add link as dummy column
df['dummy'] = '=HYPERLINK("#0!A1","Go Back To Table of Contents")'
# set link to index of dataframe
df = df.set_index('dummy', append=True)
# move link to first level of index
df = df.reorder_levels([df.index.nlevels-1] + list(range(df.index.nlevels-1)), axis=0)
# remove index names by setting all to None
df.index.names = [None]*len(df.index.names)
# transpose dataframe back to original where link becomes first level in header
df = df.T
In [9]:
df
Out[9]:
=HYPERLINK("#0!A1","Go Back To Table of Contents")
0 1 2 3 4
2018-01-01 15:26:49.656255 0.159358 0.092781 0.123526 0.382781 0.258993
2017-01-01 15:26:49.656255 0.717533 0.169581 0.956188 0.531756 0.556561
2016-01-01 15:26:49.656255 0.586399 0.897709 0.820273 0.997370 0.906336
2015-01-01 15:26:49.656255 0.458850 0.407505 0.877071 0.472491 0.974023
2014-01-01 15:26:49.656255 0.156699 0.322385 0.566123 0.760208 0.577940
2013-01-01 15:26:49.656255 0.740037 0.851457 0.243506 0.242941 0.927949
2012-01-01 15:26:49.656255 0.651608 0.817087 0.927313 0.783384 0.671474
2011-01-01 15:26:49.656255 0.970842 0.564952 0.298710 0.045744 0.342489
2010-01-01 15:26:49.656255 0.570517 0.930760 0.209710 0.633520 0.833400
2009-01-01 15:26:49.656255 0.948057 0.683936 0.945394 0.635590 0.415048

Apply Method to All DataFrames - Write to Excel

In [21]:
# instantiate an Excel writer
xls_writer = pd.ExcelWriter('data/pandas_dataframes_to_excel_with_toc.xlsx')

# save toc data as dictionary with first entry as TOC
xls_toc = {'0': 'Table of Contents'}

# create toc placeholder as first sheet in Excel
df = pd.DataFrame()
df.to_excel(xls_writer, '0')

# iterate database, write DataFrames, and save toc data
for i,(title,df) in enumerate(database.items()):
    # save DataFrame title and sheet link in TOC dictionary
    sheet_name = title
    sheet_num = i+1
    xls_toc[sheet_num] = '=HYPERLINK("#{}!A1","{}")'.format(str(sheet_num),sheet_name)

    # add header level to DataFrame as link to TOC
    df = df.T
    df['dummy'] = '=HYPERLINK("#0!A1","Go Back To Table of Contents")'
    df = df.set_index('dummy', append=True)
    df = df.reorder_levels([df.index.nlevels-1] + list(range(df.index.nlevels-1)), axis=0)
    df.index.names = [None]*len(df.index.names)
    df = df.T

    # write DataFrame to Excel
    df.to_excel(xls_writer, str(sheet_num))

# compile TOC and write as first sheet
df_toc = pd.DataFrame.from_dict(xls_toc, orient='index')
df_toc.columns = ['Title']
df_toc.index.naes = ['Sheet']
df_toc.to_excel(xls_writer, '0', index=True)

# save and close Excel
xls_writer.save()
xls_writer.close()

Result

As you can see below, the links are correctly applied and working.

Excel TOC

Excel DataFrame