Pandas Explode Column¶
This notebook demonstrates how to explode a column with nested values, either in CSV format or a dictionary (e.g. JSON).
NOTE: Method 3 of the CSV explosdion is the most efficient, and skip down to the Explode Dict Column for a super efficient way of exploding a dictionary of values in a Pandas DataFrame.
In [1]:
import pandas as pd
Data¶
In [2]:
df = pd.DataFrame({'csv' : ['value1',
'value1,value2',
'value1,value2,value3,value4,value5,value6',
'value1',
'value1,value2,value3,value4'],
'json' : [{'key1':'value1', 'key2':'value2.1', 'key3':'value3.1', 'key4':'value4.1'},
{'key1':'value2', 'key2':'value2.2', 'key3':'value3.2', 'key4':'value4.2'},
{'key1':'value3', 'key2':'value2.3', 'key3':'value3.3'},
{'key1':'value4', 'key2':'value2.4'},
{'key1':'value5', 'key2':'value2.5'}]
})
In [3]:
df.index.names = ['id']
In [4]:
df
Out[4]:
Explode CSV Column¶
Method 1¶
In [5]:
data = {i:row.csv.split(',') for i,row in df.iterrows()}
In [6]:
df_exploded = pd.DataFrame.from_dict(data, orient='index').stack().to_frame()
In [7]:
df_exploded.columns = ['value']
df_exploded.index.names = ['id', 'csv_sequence']
In [8]:
df_exploded
Out[8]:
Method 2¶
In [9]:
df_exploded = df['csv'].str.split(pat=',', expand=True).stack().to_frame()
In [10]:
df_exploded.columns = ['value']
df_exploded.index.names = ['id', 'csv_sequence']
In [11]:
df_exploded
Out[11]:
Method 3¶
In [12]:
df_exploded = df.csv.apply(lambda x: pd.Series(x.split(','))).stack().to_frame()
In [13]:
df_exploded.columns = ['value']
df_exploded.index.names = ['id', 'csv_sequence']
In [14]:
df_exploded
Out[14]:
Join¶
In [15]:
df.join(df_exploded, how='left').reset_index(level='csv_sequence')
Out[15]:
Explode Dict Column¶
In [16]:
df
Out[16]:
In [17]:
df.json.apply(pd.Series)
Out[17]:
In [18]:
df.join(df.json.apply(pd.Series), how='left')
Out[18]: