Multiple DataFrames to Excel with Table of Contents¶
import pandas as pd
import numpy as np
Create Sample Database of DataFrames and Titles¶
First store the dataframes in a dictionary accessable by the DataFrame title to be used in table of contents.
database = {}
for i in range(5):
df = pd.DataFrame(data=np.random.rand(10,5),
index=pd.date_range(end=pd.datetime.now(), periods=10, freq='AS'),
columns=list(range(5)))
df = df.sort_index(ascending=False)
database['DataFrame {}'.format(i)] = df
preview your database is as expected¶
len(database)
database.keys()
and the first DataFrame in our database looks like:¶
database['DataFrame 0']
Example Method of Creating Links in Excel¶
In the output excel we want to have the first sheet as a table of contents, and each subsequent sheet corresponding to our DataFrames in the database.
The table of contents should be simply a list of our predefined DataFrame titles, which will link to the appropriate sheet.
And finally, at the top of each Excel sheet with a DataFrame, we want to add a link back to the table of contents for convenience.
IMPORTANT: First note that a hyperlink in an Excel Cell looks like =HYPERLINK("#0!A1","[TEXT]")
where #0!A1
corresponds to the sheet and cell being linked to, and [TEXT]
represents the text to be displayed in the Excel cell.
example link to first DataFrame in TOC¶
These links will be created for all DataFrames, saved by their sheet number when writing the DataFrame to excel, then compiled and written to the first sheet as TOC all at once.
sheet_name = 'DataFrame 0'
sheet_num = 1
xls_toc_label = '=HYPERLINK("#{}!A1","{}")'.format(str(sheet_num),sheet_name)
print(xls_toc_label)
example link to TOC in first DataFrame¶
The below code may be more efficient in a different way - alternative solutions are welcomed!
df = database['DataFrame 0']
# transpose dataframe
df = df.T
# add link as dummy column
df['dummy'] = '=HYPERLINK("#0!A1","Go Back To Table of Contents")'
# set link to index of dataframe
df = df.set_index('dummy', append=True)
# move link to first level of index
df = df.reorder_levels([df.index.nlevels-1] + list(range(df.index.nlevels-1)), axis=0)
# remove index names by setting all to None
df.index.names = [None]*len(df.index.names)
# transpose dataframe back to original where link becomes first level in header
df = df.T
confirm we now have a link as the first level in the DataFrame's header¶
df
Apply Method to All DataFrames - Write to Excel¶
# instantiate an Excel writer
xls_writer = pd.ExcelWriter('data/pandas_dataframes_to_excel_with_toc.xlsx')
# save toc data as dictionary with first entry as TOC
xls_toc = {'0': 'Table of Contents'}
# create toc placeholder as first sheet in Excel
df = pd.DataFrame()
df.to_excel(xls_writer, '0')
# iterate database, write DataFrames, and save toc data
for i,(title,df) in enumerate(database.items()):
# save DataFrame title and sheet link in TOC dictionary
sheet_name = title
sheet_num = i+1
xls_toc[sheet_num] = '=HYPERLINK("#{}!A1","{}")'.format(str(sheet_num),sheet_name)
# add header level to DataFrame as link to TOC
df = df.T
df['dummy'] = '=HYPERLINK("#0!A1","Go Back To Table of Contents")'
df = df.set_index('dummy', append=True)
df = df.reorder_levels([df.index.nlevels-1] + list(range(df.index.nlevels-1)), axis=0)
df.index.names = [None]*len(df.index.names)
df = df.T
# write DataFrame to Excel
df.to_excel(xls_writer, str(sheet_num))
# compile TOC and write as first sheet
df_toc = pd.DataFrame.from_dict(xls_toc, orient='index')
df_toc.columns = ['Title']
df_toc.index.naes = ['Sheet']
df_toc.to_excel(xls_writer, '0', index=True)
# save and close Excel
xls_writer.save()
xls_writer.close()
Result¶
As you can see below, the links are correctly applied and working.