Strava API¶
The official Strava API docs are here https://developers.strava.com/docs/
First, create a free application to generate your access token at https://www.strava.com/settings/api
I have already done this and stored my access token in an environment variable called STRAVA_ACCESS_TOKEN.
import os
strava_access_token = os.environ['STRAVA_ACCESS_TOKEN']
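If the variable is missing, `os.environ['STRAVA_ACCESS_TOKEN']` raises a bare `KeyError`. A small helper can fail with a clearer hint instead. This is a minimal sketch; `get_strava_token` is a hypothetical name and not part of any Strava library.

```python
import os


def get_strava_token(var_name="STRAVA_ACCESS_TOKEN"):
    """Return the access token stored in the given environment variable.

    Raises RuntimeError with a hint, instead of a bare KeyError,
    when the variable is missing or empty.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; create an application "
            "at https://www.strava.com/settings/api and export its access token."
        )
    return token
```

Usage: `strava_access_token = get_strava_token()`.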
Swagger Client¶
You interface with Strava's API through a client generated with Swagger, from the API's OpenAPI Specification.
Generating the client has a few prerequisites, including the Swagger Codegen command line interface (CLI) and the Java Runtime Environment (JRE).
To get the Swagger CLI on Windows, I did the following:
- create a folder called swagger in the same location as this notebook
- from there, run the following command in Windows PowerShell:
Invoke-WebRequest -OutFile swagger-codegen-cli.jar http://central.maven.org/maven2/io/swagger/swagger-codegen-cli/2.3.1/swagger-codegen-cli-2.3.1.jar
After you have Swagger Codegen and the JRE installed, test that everything is working with the following:
# (Windows)
java -jar swagger/swagger-codegen-cli.jar help
Now you need to generate the Strava API client locally to be able to use it. From within the swagger folder, run the following command:
# (Windows)
java -jar swagger-codegen-cli.jar generate -i https://developers.strava.com/swagger/swagger.json -l python
This downloads the API specification and generates a Python library from it in a folder called swagger_client.
We temporarily add the swagger folder to our Python path (sys.path) to make the library available during the execution of this notebook only.
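import sys
# make the generated swagger_client package importable for this session only
sys.path.append(os.path.abspath('swagger'))
import swagger_client
# as described in Issue 3, this module-level setting does not seem to reach the API classes
swagger_client.configuration.access_token = strava_access_token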
Getting Over the Humps¶
Surprisingly, figuring everything out took longer than I had hoped. Here are a few issues I ran into and how to solve them.
Issue 1¶
The examples on the Strava website assume your code runs from the same directory in which you generated the Swagger client, so that swagger_client can be imported as a local package. Otherwise, you need to add the folder that contains swagger_client to your Python path.
Issue 2¶
There is a bug in the Strava Python examples: they contain the following line, which is not a valid future import and raises a SyntaxError.
from __future__ import print_statement
If you are using Python 2, change it to the following. In Python 3 you can simply delete the line, since print is already a function.
from __future__ import print_function
Issue 3¶
Another bug in the Strava Python examples.
You must instantiate each API class before using it, and it appears that every time we instantiate a new API class, its access token is reset to an empty string.
Setting the access token at the client (module) level and then passing the client to the API class, as below, also does not seem to work. Note that this passes the swagger_client module itself rather than an ApiClient instance, which is likely part of the problem.
swagger_client.configuration.access_token = strava_access_token
api_instance = swagger_client.AthletesApi(api_client=swagger_client)
You can try to hardcode your access token in the generated swagger configuration.py, and maybe this is what Strava assumes you do, but this is probably not a good idea.
Alternatively, it may be possible to provide the access token in a config.json when generating the client with the -c option, but I could not get this to work.
Test Working¶
Okay, we got it working by setting the token on each API instance's own client configuration.
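from swagger_client.rest import ApiException
api_instance = swagger_client.AthletesApi()
# set the token on this instance's own client configuration
api_instance.api_client.configuration.access_token = strava_access_token
try:
    r = api_instance.get_logged_in_athlete()
except ApiException as e:
    print(e)
    raise  # without a response there is nothing to inspect below
athlete_id = r.id
print(athlete_id)
print(r.follower_count)
print(r.friend_count)
# follower-to-following ratio; guard against division by zero
if r.friend_count:
    print(round(r.follower_count / r.friend_count, 2))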
[(x.name, x.country) for x in r.clubs]
r.shoes
r.bikes
List Activities¶
Using the method at https://developers.strava.com/docs/reference/#api-Activities-getLoggedInAthleteActivities
Let's get all my activities using pagination, at 30 activities per page. If you know you have A LOT of activities, you should probably set a limit here.
api_instance = swagger_client.ActivitiesApi()
api_instance.api_client.configuration.access_token = strava_access_token
results = []
i = 1
while True:
    print('page:', i)
    activities = api_instance.get_logged_in_athlete_activities(page=i, per_page=30)
    if activities:
        results.append(activities)
        i += 1
    else:
        # an empty page means there are no more activities
        break
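The loop above requests pages until an empty one comes back. Below is a sketch of the same idea with the suggested upper limit, written against a generic `fetch_page(page, per_page)` callable (a hypothetical interface, so the logic can be tested without the API). It also stops as soon as a page is shorter than `per_page`, which saves the final empty request, assuming the API only returns a partial page at the end.

```python
def fetch_all_pages(fetch_page, per_page=30, max_pages=None):
    """Collect items from a paginated endpoint into one flat list.

    fetch_page(page, per_page) must return a list (possibly empty).
    Pages are 1-indexed, like the Strava `page` parameter.
    """
    items = []
    page = 1
    while max_pages is None or page <= max_pages:
        batch = fetch_page(page, per_page)
        if not batch:
            break  # empty page: nothing left
        items.extend(batch)
        if len(batch) < per_page:
            break  # short page: assume this was the last one
        page += 1
    return items
```

With the generated client, something like `fetch_all_pages(lambda p, n: api_instance.get_logged_in_athlete_activities(page=p, per_page=n), max_pages=20)` would return a flat list, so the flattening step below would no longer be needed.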
Conversion¶
# flatten the list of lists
activities = [a for r in results for a in r]
# convert each Strava Activity Object to Python dictionary for more general use
records = [a.to_dict() for a in activities]
len(activities)
len(records)
Data Dump¶
We should back up the data to disk, in case we want to use it later without calling the API again.
NOTE: The data contains properties of type datetime, which the json module cannot serialize when converting a dict to a JSON string. We therefore define two helpers that serialize and deserialize these values as ISO 8601 strings during write and read.
Write¶
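import json
import os
from datetime import datetime
class datetime2iso(json.JSONEncoder):
    """JSON encoder that writes datetime objects as ISO 8601 strings."""
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)
# write one JSON object per line, creating the data folder if needed
os.makedirs('data', exist_ok=True)
with open('data/strava_activites.json', 'w') as f:
    for d in records:
        f.write(json.dumps(d, cls=datetime2iso))
        f.write('\n')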
Read¶
import json
import dateutil.parser  # importing only "dateutil" does not reliably load the parser submodule
def iso2datetime(obj):
    """object_pairs_hook: try to parse every string value as a datetime."""
    d = {}
    for k, v in obj:
        if isinstance(v, str):
            try:
                d[k] = dateutil.parser.parse(v)
            except (ValueError, OverflowError):
                d[k] = v
        else:
            d[k] = v
    return d
records = []
with open('data/strava_activites.json', 'r') as f:
    for line in f:
        records.append(json.loads(line, object_pairs_hook=iso2datetime))
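Because `dateutil.parser.parse` is lenient, the hook above can also turn ordinary strings that merely look like dates into datetimes. A stricter alternative, sketched below, only converts keys assumed to hold timestamps (`start_date` and `start_date_local`; adjust the hypothetical `DATE_KEYS` set if you need others) and uses the standard library's `datetime.fromisoformat` (Python 3.7+), which reads back exactly what `isoformat()` writes. The helper names are illustrative, not from the original notebook.

```python
import json
from datetime import datetime

# Keys assumed to hold timestamps in the Strava activity records;
# extend this set if other fields should be parsed.
DATE_KEYS = {"start_date", "start_date_local"}


class DatetimeEncoder(json.JSONEncoder):
    """Serialize datetime objects as ISO 8601 strings."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


def parse_known_dates(pairs):
    """object_pairs_hook that parses only the keys listed in DATE_KEYS."""
    result = {}
    for key, value in pairs:
        if key in DATE_KEYS and isinstance(value, str):
            value = datetime.fromisoformat(value)
        result[key] = value
    return result


def dump_records(records, path):
    """Write one JSON object per line."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record, cls=DatetimeEncoder) + "\n")


def load_records(path):
    """Read a file written by dump_records, restoring DATE_KEYS as datetimes."""
    with open(path) as f:
        return [json.loads(line, object_pairs_hook=parse_known_dates)
                for line in f if line.strip()]
```

Usage: `records = load_records('data/strava_activites.json')`. An activity named, say, "May" stays a string, because only the listed keys are parsed.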
Preview¶
len(records)
records[0]
Data Analysis¶
Let's quickly bring this into Pandas to summarize the data.
The field definitions come from Strava's Detailed Activity Model: distances are in meters, times in seconds, and speeds in meters per second.
import pandas as pd
df = pd.DataFrame.from_records(records)
df.shape
Preview¶
df.head(2)
Conversions¶
# convert distance in meters to kilometers
df['distance'] = df.distance / 1000
# convert moving time in seconds to minutes
df['moving_time'] = df.moving_time / 60
# convert average speed in meters per second to pace in minutes per kilometer
# (1000 m / 60 s per minute, approx. 16.667 / speed)
df['average_speed_mpk'] = (1000 / 60) / df.average_speed
# create new column average speed in kilometers per hour
df['average_speed_kph'] = 60 / df.average_speed_mpk
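The pace conversion follows from the units: with `average_speed` v in meters per second, one kilometer takes 1000 / v seconds, so the pace is 1000 / (60 · v) ≈ 16.667 / v minutes per kilometer, and 60 divided by the pace gives kilometers per hour (equivalently v · 3.6). A small sketch to sanity-check the arithmetic; the function names are hypothetical.

```python
def mps_to_min_per_km(speed_mps):
    """Pace in minutes per kilometer from speed in meters per second."""
    return 1000 / (60 * speed_mps)


def mps_to_kph(speed_mps):
    """Speed in kilometers per hour: 3600 s/h divided by 1000 m/km is 3.6."""
    return speed_mps * 3.6


def min_per_km_to_kph(pace):
    """Inverse relation used in the notebook: 60 minutes / (minutes per km)."""
    return 60 / pace
```

For example, 3 m/s corresponds to a pace of about 5.56 min/km and a speed of 10.8 km/h. Activities with an average speed of zero would produce an infinite pace.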
Formatting¶
# format date to be used in plot labels
df.loc[:,'start_date_formatted'] = df.start_date.dt.strftime('%b %Y')
df.loc[:,'start_date_year'] = df.start_date.dt.year
df.loc[:,'start_date_month'] = df.start_date.dt.month
df.loc[:,'start_date_weekday'] = df.start_date.dt.weekday
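`dt.weekday` numbers the days Monday=0 through Sunday=6, so the weekday charts further down show integers on the x-axis. A sketch of mapping them to names with the standard library's `calendar` module (names follow the current locale); `weekday_label` is a hypothetical helper, which you could apply with e.g. `df.start_date_weekday.map(weekday_label)`. Keep in mind that a chart would then sort the names alphabetically unless you give it an explicit sort order.

```python
import calendar


def weekday_label(weekday_number, abbreviated=True):
    """Map Python's weekday numbering (Monday=0 ... Sunday=6) to a name."""
    names = calendar.day_abbr if abbreviated else calendar.day_name
    return names[weekday_number]
```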
Statistics¶
Counts¶
df.type.value_counts()
df.groupby('timezone').type.value_counts().to_frame()
Describe¶
df.groupby('type')\
[['average_speed_mpk',
'average_speed_kph',
'distance','moving_time',
'total_elevation_gain',
'achievement_count']]\
.describe()\
.T
Correlations¶
df[['average_speed_mpk',
'average_speed_kph',
'distance','moving_time',
'total_elevation_gain',
'achievement_count']]\
.corr()
Plots¶
For visualizations in this notebook I am using a library called Altair. For more on that, see https://altair-viz.github.io/
import altair as alt
print(alt.__version__)
alt.renderers.enable('default')
print(alt.renderers.active)
# .copy() avoids pandas' SettingWithCopyWarning when modifying the subset
data = df.loc[df.type=='Run', ['id','start_date_formatted','average_speed_mpk']].copy()
data['average_speed_mpk'] = data.average_speed_mpk.round(1)
bar = alt.Chart(data).mark_bar().encode(
alt.X('average_speed_mpk:Q', bin=alt.Bin(step=.25)),
alt.Y('count()'),
tooltip=['average_speed_mpk:Q','count()']
)
bar.title = 'Average Speed (min. per km.) by Count of Runs'
rule = alt.Chart(data).mark_rule(color='orange').encode(
x='mean(average_speed_mpk):Q',
size=alt.value(2)
)
alt.layer(
bar,
rule
)
# .copy() avoids pandas' SettingWithCopyWarning when modifying the subset
data = df.loc[df.type=='Run', ['id','start_date_formatted','distance']].copy()
data['distance'] = data.distance.round()
bar = alt.Chart(data).mark_bar().encode(
alt.X('distance:Q', bin=alt.Bin(step=1)),
alt.Y('count()'),
tooltip=['distance:Q','count()']
)
bar.title = 'Distance (km.) by Count of Runs'
rule = alt.Chart(data).mark_rule(color='orange').encode(
x='mean(distance):Q',
size=alt.value(2)
)
alt.layer(
bar,
rule
)
data = df.loc[df['type'].isin(['Run','Ride']),:]
data = data[['id','start_date_formatted','type','distance','achievement_count']]
chart = alt.Chart(data).transform_calculate(
url='https://www.strava.com/activities/' + alt.datum.id
).mark_circle().encode(
x='distance',
y='achievement_count',
color='type',
href='url:N',
tooltip=['start_date_formatted','distance','achievement_count','url:N']
).facet(
column='type'
).resolve_scale(
x='independent',
y='independent'
)
chart.title = 'Activities by Achievement Count and Distance (click circle to go to activity)'
chart
data_run = df.loc[df['type'] == 'Run',:]
data_run = data_run[['id','start_date_formatted','type','distance','average_speed_mpk']]
chart_run = alt.Chart(data_run).transform_calculate(
url='https://www.strava.com/activities/' + alt.datum.id
).mark_circle().encode(
x='distance',
y='average_speed_mpk',
color='type',
href='url:N',
tooltip=['start_date_formatted','distance','average_speed_mpk','url:N']
).interactive()
chart_run.title = 'Runs'
data_ride = df.loc[df['type'] == 'Ride',:]
data_ride = data_ride[['id','start_date_formatted','type','distance','average_speed_kph']]
chart_ride = alt.Chart(data_ride).transform_calculate(
url='https://www.strava.com/activities/' + alt.datum.id
).mark_circle().encode(
x='distance',
y='average_speed_kph',
color='type',
href='url:N',
tooltip=['start_date_formatted','distance','average_speed_kph','url:N']
).interactive()
chart_ride.title = 'Rides'
chart = alt.hconcat(chart_run, chart_ride)
chart.title = 'Activity Type by Average Speed and Distance (click circle to go to activity)'
chart
data = df[['id','type','start_date','distance','moving_time']].copy()
data.loc[:,'month'] = data.loc[:,'start_date'].dt.strftime('%Y-%m-01')
chart = alt.Chart(data).mark_bar().encode(
x='month',
y='count(id)',
color='type',
tooltip=['month','type','count(id)']
).interactive()
chart.title = 'Activities over Time'
chart
data = df[['id','type','start_date_month','moving_time','distance']].copy()
chart_activities = alt.Chart(data).mark_bar().encode(
x='start_date_month',
y='count(id)',
color='type',
tooltip=['start_date_month','type','count(id)']
).interactive()
chart_time = alt.Chart(data).mark_bar().encode(
x='start_date_month',
y='sum(moving_time)',
color='type',
tooltip=['start_date_month','type','sum(moving_time)']
).interactive()
chart_distance = alt.Chart(data).mark_bar().encode(
x='start_date_month',
y='sum(distance)',
color='type',
tooltip=['start_date_month','type','sum(distance)']
).interactive()
chart_activities.title = 'Number of Activities by Month'
chart_time.title = 'Total Moving Time by Month'
chart_distance.title = 'Total Distance by Month'
chart = alt.hconcat(chart_activities, chart_time, chart_distance)
chart
data = df[['id','type','start_date_weekday','moving_time','distance']].copy()
chart_activities = alt.Chart(data).mark_bar().encode(
x='start_date_weekday',
y='count(id)',
color='type',
tooltip=['start_date_weekday','type','count(id)']
).interactive()
chart_time = alt.Chart(data).mark_bar().encode(
x='start_date_weekday',
y='sum(moving_time)',
color='type',
tooltip=['start_date_weekday','type','sum(moving_time)']
).interactive()
chart_distance = alt.Chart(data).mark_bar().encode(
x='start_date_weekday',
y='sum(distance)',
color='type',
tooltip=['start_date_weekday','type','sum(distance)']
).interactive()
chart_activities.title = 'Number of Activities by Weekday'
chart_time.title = 'Total Moving Time by Weekday'
chart_distance.title = 'Total Distance by Weekday'
chart = alt.hconcat(chart_activities, chart_time, chart_distance)
chart
Geographic Data¶
Strava provides a summarized version of each activity here. Likewise, the route map is summarized as a simple list of latitude and longitude coordinates, encoded using Google's Encoded Polyline Algorithm.
Handily, there is a polyline library for Python which we can use to decode this. Get it with pip install polyline
import polyline
coo = polyline.decode(records[0]['map']['summary_polyline'])
coo[:10]
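For intuition about what `polyline.decode` does, here is a minimal pure-Python decoder for Google's Encoded Polyline Algorithm at the default precision of 5 decimal places. It is an illustrative sketch, not the `polyline` package's own implementation: each point is stored as a delta from the previous one, split into 5-bit chunks offset by 63 to form printable ASCII, with the sign folded into the lowest bit.

```python
def decode_polyline(encoded, precision=5):
    """Decode a Google encoded polyline into a list of (lat, lng) tuples."""
    factor = 10 ** precision
    coordinates = []
    index = lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta, then longitude delta
            result = 0
            shift = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if chunk < 0x20:  # continuation bit (0x20) not set: last chunk
                    break
            # the lowest bit holds the sign; negative values were bit-inverted
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coordinates.append((lat / factor, lng / factor))
    return coordinates
```

Google's documentation example `_p~iF~ps|U_ulLnnqC_mqNvxq`@` decodes to `[(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]`. Note the `(latitude, longitude)` order of the output.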
Now we can apply basic plotting techniques to represent the coordinates geographically and get something that looks similar to https://www.strava.com/heatmap.
Below is a personal heatmap generated in Strava. Just an FYI, viewing personal heatmaps in Strava is a paid feature(!)
A simple way to do this in Python is to start with a basic Matplotlib canvas.
Since the activities are so far unfiltered, and I know I have recorded activities in various parts of the world, let's apply some basic filtering to keep only activities from geographic areas of interest (by rounding the coordinates).
I'm not going to show too many details here, and adding backgrounds to maps is complicated. However, it can be very interesting to see at a glance which neighborhoods or routes you favor over others.
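The rounding-based filters in the sections below are one way to select a region; an explicit bounding box is an alternative that is easier to read and adjust. A sketch, assuming each record's `start_latlng` is either `None`, an empty list, or a `[lat, lng]` pair; `in_bbox` and `DES_MOINES_BBOX` are hypothetical names, and the box only roughly mirrors the Des Moines filter (latitude 41.2 to 41.6, longitude within 0.5 degrees of -94).

```python
def in_bbox(latlng, south, west, north, east):
    """True if a [lat, lng] pair falls inside the bounding box.

    Missing coordinates (None or an empty list) return False.
    """
    if not latlng:
        return False
    lat, lng = latlng[0], latlng[1]
    return south <= lat <= north and west <= lng <= east


# Rough box around Des Moines, IA (latitude 41.2-41.6, longitude -94.5 to -93.5).
DES_MOINES_BBOX = dict(south=41.2, west=-94.5, north=41.6, east=-93.5)
```

Usage: `local = [r for r in records if in_bbox(r['start_latlng'], **DES_MOINES_BBOX)]`.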
Matplotlib¶
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Des Moines, IA USA¶
# keep activities that start roughly in the Des Moines area
coo_list = [polyline.decode(x['map']['summary_polyline']) for x in records
            if x['timezone'] == '(GMT-06:00) America/Chicago'
            and x['start_latlng']  # skips missing coordinates (None or empty)
            and 41.2 <= round(x['start_latlng'][0], 1) <= 41.6
            and round(x['start_latlng'][1]) == -94]
print('Activities:', len(coo_list))
fig = plt.figure(figsize=(12,12))
fig.suptitle('Strava Activity in Des Moines, Iowa')
ax = plt.Axes(fig, [0., 0., 1., 1.])
ax.set_aspect('equal')
ax.set_axis_off()
fig.add_axes(ax)
for coo in coo_list:
    if not coo:
        continue  # skip empty routes
    # decoded pairs are (lat, lon); plot longitude on x and latitude on y
    lat, lon = map(list, zip(*coo))
    ax.plot(lon, lat, lw=0.5, alpha=.9)
Altair¶
(Vega Lite / D3)¶
To do the same thing in Altair, we need to get the data into a format D3 can read, preferably the compact geographic format TopoJSON.
The steps taken are then the following:
- Transform the data in Python:
  - decode the polylines into lists of (latitude, longitude) pairs
  - convert each list of pairs into a list of Shapely Points
  - convert each list of Points into a Shapely LineString
  - convert the Pandas DataFrame with LineStrings and metadata to a GeoPandas GeoDataFrame
- Export as GeoJSON
- Convert GeoJSON to TopoJSON using geo2topo:
  geo2topo -q 1e6 line=geojson.json > topojson.json
- Import the TopoJSON into Altair and plot
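One detail in these steps is easy to miss: the polyline decoder returns `(latitude, longitude)` pairs, while GeoJSON positions (and Shapely's `Point(x, y)`) are `[longitude, latitude]`, which is why the GeoPandas code below builds `Point(y, x)`. As a sketch of what the exported file contains, the hypothetical `to_geojson` function below builds a GeoJSON FeatureCollection of LineStrings with plain dictionaries, without GeoPandas; it assumes the input is an iterable of `(properties, decoded_points)` tuples.

```python
def to_geojson(activities):
    """Build a GeoJSON FeatureCollection of LineStrings.

    `activities` is an iterable of (properties, points), where points is a
    list of (lat, lng) tuples as returned by a polyline decoder.
    Activities with fewer than two points are skipped, since a
    LineString needs at least two positions.
    """
    features = []
    for properties, points in activities:
        if len(points) < 2:
            continue
        features.append({
            "type": "Feature",
            "properties": properties,
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [longitude, latitude]
                "coordinates": [[lng, lat] for lat, lng in points],
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

The resulting dictionary could be written with `json.dump` to a geojson.json file and then passed to geo2topo as above.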
Requirements:
- GeoPandas and dependencies http://geopandas.org/install.html
- TopoJSON Server https://github.com/topojson/topojson-server#installing
Caveats:
- at time of writing we cannot add hover tooltips in Altair because Vega Lite does not yet support interactive geoshapes (follow the issue here https://github.com/altair-viz/altair/issues/679)
Background Reading:
- for more on how D3 uses TopoJSON read Mike Bostock's Command Line Cartography (formerly Let's Make a Map)
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString
def decode_map(x):
    # x is an activity's 'map' dict; return None when there is no route
    if x['summary_polyline']:
        return polyline.decode(x['summary_polyline'])
    return None
Des Moines, IA USA¶
df_tmp = df.loc[df.timezone == '(GMT-06:00) America/Chicago',:]
# keep activities starting roughly in Des Moines; skip missing coordinates
df_tmp = df_tmp.loc[df_tmp.start_latlng.apply(lambda x: bool(x) and 41.2 <= round(x[0],1) <= 41.6 and round(x[1]) == -94),:].copy()
df_tmp['map_decoded'] = df_tmp['map'].apply(decode_map)
# Shapely expects (x, y) = (longitude, latitude), so swap the decoded (lat, lon) pairs;
# a LineString needs at least two points
df_tmp['geometry_list'] = df_tmp['map_decoded'].apply(lambda d: [Point(y,x) for x,y in d] if d and len(d) > 1 else None)
df_map = df_tmp.loc[df_tmp.geometry_list.notnull(),:].copy()
df_map['geo_line'] = df_map['geometry_list'].apply(LineString)
gdf = GeoDataFrame(df_map[['id','start_date_formatted','type','geo_line']], geometry='geo_line')
gdf.head()
gdf.to_file('public_data/geojson.json', driver="GeoJSON")
Now run the command below (from within the public_data folder) to generate the TopoJSON file¶
geo2topo -q 1e6 line=geojson.json > topojson.json
data = alt.topo_feature('https://knanne.github.io/notebooks/jupyter/public_data/topojson.json', 'line')
data
chart = alt.Chart(data).mark_geoshape(
strokeWidth=.7,
opacity=.9,
filled=False
).properties(
title='Strava Activity in Des Moines, Iowa',
width=900,
height=700
)
chart