visualize_ns_data_in_tableau

Introduction

As I commute daily with the train, I thought a simple data project would be to visualize my travel history. This can be used as a template or inspiration for others.

NS is the train company in the netherlands. You can download your data by going to https://www.ns.nl/mijnns#/reishistorie and selecting Periode > CSV > Download.

We will visualize the trajectories in this data quickly in Tabelau. To map the trips geographically we use the methodology outline by Tableau here https://onlinehelp.tableau.com/current/pro/desktop/en-us/maps_howto_origin_destination.htm

To get the data in this format, follow this notebook.

Lastly, to make things simple, I am not going to try and visualize the specific routes I took. However, I have found a usable geographic dataset (shape file) of Netherlands train routes (from here), and therefor we will plot this next to my trajectories simply for reference. For more on how to import spatial files into Tableau see here.

Dependencies

In [1]:
import pandas as pd

Import NS Dataset

In [2]:
df = pd.read_csv('data/reistransacties-a21a7c18-b5e2-4f67-8a68-1c76c3ebdda8-1555156489-25d6adc3-1680-49a1-bea7-4f7d15eedf35.csv')
In [3]:
df.head()
Out[3]:
Datum Check in Vertrek Check uit Bestemming Af Bij Transactie Kl Product Prive/ Zakelijk Opmerking
0 15-01-2019 17:47 Amsterdam Sloterdijk 18:29 Leiden Centraal €0,00 €0,00 Reis 2.0 Traject Vrij (maand) zakelijk NaN
1 16-01-2019 07:58 Leiden Centraal 08:36 Amsterdam Sloterdijk €0,00 €0,00 Reis 2.0 Traject Vrij (maand) zakelijk NaN
2 16-01-2019 18:09 Amsterdam Sloterdijk 18:43 Leiden Centraal €0,00 €0,00 Reis 2.0 Traject Vrij (maand) zakelijk NaN
3 17-01-2019 08:25 Leiden Centraal 08:27 Leiden Centraal €0,00 €0,00 Reis 2.0 Traject Vrij (maand) zakelijk NaN
4 17-01-2019 08:30 Leiden Centraal 09:14 Amsterdam Sloterdijk €0,00 €0,00 Reis 2.0 Traject Vrij (maand) zakelijk NaN

Create Unique Path ID for each Trip

In [4]:
df.loc[:,'path_id'] = df.Vertrek+'-'+df.Bestemming

Split Dataset in Two, one for each Location in Path

In [5]:
df1 = df.copy()

df1 = df1[df1.Transactie == 'Reis'][['Datum','Check in','Vertrek','Check uit','path_id']]

df1 = df1.rename(columns={'Vertrek':'Location'})

df1.loc[:,'path_order'] = 1
In [6]:
df1.head()
Out[6]:
Datum Check in Location Check uit path_id path_order
0 15-01-2019 17:47 Amsterdam Sloterdijk 18:29 Amsterdam Sloterdijk-Leiden Centraal 1
1 16-01-2019 07:58 Leiden Centraal 08:36 Leiden Centraal-Amsterdam Sloterdijk 1
2 16-01-2019 18:09 Amsterdam Sloterdijk 18:43 Amsterdam Sloterdijk-Leiden Centraal 1
3 17-01-2019 08:25 Leiden Centraal 08:27 Leiden Centraal-Leiden Centraal 1
4 17-01-2019 08:30 Leiden Centraal 09:14 Leiden Centraal-Amsterdam Sloterdijk 1
In [7]:
df2 = df.copy()

df2 = df2[df2.Transactie == 'Reis'][['Datum','Check in','Bestemming','Check uit','path_id']]

df2 = df2.rename(columns={'Bestemming':'Location'})

df2.loc[:,'path_order'] = 2
In [8]:
df2.head()
Out[8]:
Datum Check in Location Check uit path_id path_order
0 15-01-2019 17:47 Leiden Centraal 18:29 Amsterdam Sloterdijk-Leiden Centraal 2
1 16-01-2019 07:58 Amsterdam Sloterdijk 08:36 Leiden Centraal-Amsterdam Sloterdijk 2
2 16-01-2019 18:09 Leiden Centraal 18:43 Amsterdam Sloterdijk-Leiden Centraal 2
3 17-01-2019 08:25 Leiden Centraal 08:27 Leiden Centraal-Leiden Centraal 2
4 17-01-2019 08:30 Amsterdam Sloterdijk 09:14 Leiden Centraal-Amsterdam Sloterdijk 2

Create Coordinates of Station Locations

In [9]:
list(df.Bestemming.replace(' ',pd.np.nan).dropna().unique())
Out[9]:
['Leiden Centraal',
 'Amsterdam Sloterdijk',
 'Eindhoven',
 'Den Haag Centraal',
 'Amsterdam Zuid',
 'Castricum',
 'Amsterdam Centraal',
 'Schiphol Airport']
In [10]:
# create this list in Excel

Import Station Coordinates

In [11]:
df_coordinates = pd.read_excel('data/NS_Master.xlsx', sheet_name='Coordinates')
In [12]:
# here I print all coordinates i have already collected for your reference
df_coordinates
Out[12]:
Location Latitude Longitude
0 Castricum 52.545611 4.659252
1 Beverwijk 52.478269 4.655210
2 Amsterdam Centraal 52.379128 4.900272
3 Amsterdam Zuid 52.339015 4.874336
4 Groningen 53.211030 6.564066
5 Duivendrecht 52.323604 4.936506
6 Leiden Centraal 52.166404 4.482089
7 Rotterdam Centraal 51.923955 4.470000
8 Diemen Zuid 52.330255 4.954847
9 Delft 52.007545 4.356530
10 Rotterdam Blaak 51.919852 4.489156
11 Amsterdam Sloterdijk 52.386785 4.846802
12 Den Haag Centraal 52.081261 4.323975
13 Amsterdam Lelylaan 52.357870 4.834040
14 Den Haag HS 52.070164 4.322235
15 Eindhoven 51.443391 5.479171

JOIN Coordinates and UNION

In [13]:
df1 = pd.merge(left=df1, right=df_coordinates, how='left', on='Location')
In [14]:
df2 = pd.merge(left=df2, right=df_coordinates, how='left', on='Location')
In [15]:
df_out = pd.concat([df1,df2], axis=0)
In [16]:
df_out.head()
Out[16]:
Datum Check in Location Check uit path_id path_order Latitude Longitude
0 15-01-2019 17:47 Amsterdam Sloterdijk 18:29 Amsterdam Sloterdijk-Leiden Centraal 1 52.386785 4.846802
1 16-01-2019 07:58 Leiden Centraal 08:36 Leiden Centraal-Amsterdam Sloterdijk 1 52.166404 4.482089
2 16-01-2019 18:09 Amsterdam Sloterdijk 18:43 Amsterdam Sloterdijk-Leiden Centraal 1 52.386785 4.846802
3 17-01-2019 08:25 Leiden Centraal 08:27 Leiden Centraal-Leiden Centraal 1 52.166404 4.482089
4 17-01-2019 08:30 Leiden Centraal 09:14 Leiden Centraal-Amsterdam Sloterdijk 1 52.166404 4.482089

Export

In [17]:
df_out.to_csv("data/ns_tableau.csv", sep=',')

Tableau Dashboard

In [1]:
%%html
<div class="d-flex justify-content-center">
    <div class='tableauPlaceholder' id='viz1556362972158' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;NS&#47;NSVisualizeMijnReisHistorie&#47;NSReisHistorieDashboard&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='NSVisualizeMijnReisHistorie&#47;NSReisHistorieDashboard' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;NS&#47;NSVisualizeMijnReisHistorie&#47;NSReisHistorieDashboard&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1556362972158');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='2027px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>
</div>