visualize_strava_data_in_python

Strava API

The official Strava API documentation is at https://developers.strava.com/docs/

First, create a free API application to generate your access token, at https://www.strava.com/settings/api

I have already done this and defined my access token as an environment variable, called STRAVA_ACCESS_TOKEN.

In [1]:
import os
In [2]:
strava_access_token = os.environ['STRAVA_ACCESS_TOKEN']
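
If the variable is not set, the lookup above fails with a bare KeyError. A minimal sketch of a friendlier lookup follows. The helper name read_token and its env parameter are my own illustration, not part of the original notebook or of any Strava library.

```python
import os

def read_token(name="STRAVA_ACCESS_TOKEN", env=None):
    """Return the token stored under `name`, or raise a descriptive error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {name} is not set; create an application "
            "at https://www.strava.com/settings/api first."
        )
    return token
```

In the notebook this would be used as strava_access_token = read_token().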

Swagger Client

Strava describes its API with a Swagger (OpenAPI Specification) definition, and you interface with it through a client generated from that definition.

Generating a Swagger client comes with a few prerequisites, including the Swagger CodeGen Command Line Interface (CLI) and the Java Runtime Environment (JRE).

To get the Swagger CLI on Windows, I did the following:

  • create a folder in the same location as this notebook, called swagger
  • from there, run the following command via Windows PowerShell
    Invoke-WebRequest -OutFile swagger-codegen-cli.jar http://central.maven.org/maven2/io/swagger/swagger-codegen-cli/2.3.1/swagger-codegen-cli-2.3.1.jar
    

After you have Swagger CodeGen and the JRE installed, test that everything is working with the following:

# (Windows)
java -jar swagger/swagger-codegen-cli.jar help

Now you need to generate the Strava Swagger API locally to be able to use it. From within the swagger folder, run the following command:

# (Windows)
java -jar swagger-codegen-cli.jar generate -i https://developers.strava.com/swagger/swagger.json -l python

This downloads the API specification and generates a Python library from it, in a folder called swagger_client.

We will temporarily add the swagger folder to the Python module search path (sys.path), which makes swagger_client importable for the duration of this notebook session only.

In [3]:
import sys
sys.path.append(os.path.abspath('swagger'))
In [4]:
import swagger_client
In [5]:
swagger_client.configuration.access_token = strava_access_token

Getting Over the Humps

Surprisingly, this took a bit longer than I hoped to figure everything out. Here are a few issues I ran into and how to solve them.

Issue 1

The examples on the Strava website assume your code runs from the directory in which you generated the Swagger API, so that swagger_client can be imported directly. If your code runs from anywhere else, you need to add the folder that contains swagger_client to your Python path (sys.path).
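
A small sketch of how to check for this before importing: it only touches sys.path when the package cannot be found already. The function name ensure_importable is hypothetical, and the folder layout (a swagger folder next to the notebook) is the one assumed in this notebook.

```python
import importlib
import importlib.util
import os
import sys

def ensure_importable(package, folder):
    """Append `folder` to sys.path if `package` cannot be found yet.

    Returns True when the package is importable afterwards.
    """
    if importlib.util.find_spec(package) is None:
        path = os.path.abspath(folder)
        if path not in sys.path:
            sys.path.append(path)
        # make sure the import system notices newly created folders
        importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None
```

For this notebook the call would be ensure_importable('swagger_client', 'swagger'). Like the sys.path.append used earlier, the change lasts only for the current session.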

Issue 2

Bug in the Strava Python examples. They contain the following import, which names a __future__ feature that does not exist. On Python 3 the import is not needed and can simply be removed; on Python 2 it has to be corrected.

from __future__ import print_statement

Change it to:

from __future__ import print_function

Issue 3

Another bug in the Strava Python examples.

You must instantiate every API class before using it, and it appears that every time we instantiate a new API class, the access token is reset to an empty string.

Setting the access token at the module level and then passing the module in as the client, as follows, does not seem to work either.

swagger_client.configuration.access_token = strava_access_token
api_instance = swagger_client.AthletesApi(api_client=swagger_client)

You could hardcode your access token into the generated configuration.py, and maybe this is what Strava assumes you do, but this is probably not a good idea, since the token would then sit in a source file in plain text.

Alternatively, it may be possible to provide the access token in a config.json passed with the -c option when generating the Swagger client, but I could not get this to work.

Test Working

Okay, we got it working.

In [6]:
from swagger_client.rest import ApiException
In [7]:
api_instance = swagger_client.AthletesApi()
In [8]:
api_instance.api_client.configuration.access_token = strava_access_token
In [9]:
try: 
    r = api_instance.get_logged_in_athlete()
except ApiException as e:
    print(e)
In [10]:
athlete_id = r.id
print(athlete_id)
2164216
In [11]:
print(r.follower_count)
print(r.friend_count)
print(round(r.follower_count / r.friend_count,2))
44
60
0.73
In [12]:
[(x.name, x.country) for x in r.clubs]
Out[12]:
[('Monkeys On Bikes', 'United States'),
 ('Ride for Nokor Tep', 'Singapore'),
 ('The Strava Club', 'United States'),
 ('Relive Running', 'Netherlands'),
 ('Reddit Running', 'United States')]
In [13]:
r.shoes
Out[13]:
[{'distance': 982220.0,
  'id': 'g187719',
  'name': 'Nike',
  'primary': True,
  'resource_state': 2}]
In [14]:
r.bikes
Out[14]:
[{'distance': 858703.0,
  'id': 'b892090',
  'name': "GF 29'er",
  'primary': True,
  'resource_state': 2}, {'distance': 855093.0,
  'id': 'b892097',
  'name': 'Sweet Felt',
  'primary': False,
  'resource_state': 2}]
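
The pattern that worked above (instantiate the API class first, then set the token on that instance's own client) has to be repeated for every API class. A minimal sketch of a wrapper follows. The name authenticated is hypothetical; the only thing it assumes about the generated classes is the api_client.configuration.access_token attribute path already used in this notebook.

```python
def authenticated(api_class, access_token):
    """Instantiate a generated API class and set the token on its own client."""
    api = api_class()
    # set the token after instantiation, on the instance rather than the module
    api.api_client.configuration.access_token = access_token
    return api
```

Usage would then look like api_instance = authenticated(swagger_client.ActivitiesApi, strava_access_token).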

List Activities

Using the method at https://developers.strava.com/docs/reference/#api-Activities-getLoggedInAthleteActivities

Let's get all my activities using pagination, at 30 activities per page. If you know you have a lot of activities, you should probably cap the number of pages requested.

In [15]:
api_instance = swagger_client.ActivitiesApi()
api_instance.api_client.configuration.access_token = strava_access_token

results = []
i = 1
while True:
    print('page:',i)
    activities = api_instance.get_logged_in_athlete_activities(page=i, per_page=30)
    if activities:
        results.append(activities)
        i+=1
    else:
        break
page: 1
page: 2
page: 3
page: 4
page: 5
page: 6
page: 7
page: 8
page: 9

Conversion

In [16]:
# flatten the list of lists
activities = [a for r in results for a in r]
In [17]:
# convert each Strava Activity Object to Python dictionary for more general use
records = [a.to_dict() for a in activities]
In [18]:
len(activities)
Out[18]:
214
In [19]:
len(records)
Out[19]:
214

Data Dump

We should back up the data on disk, in case we want to use it later without calling the API again.

NOTE: The data contains properties of type datetime, which Python's json module cannot serialize by default. We therefore define an encoder class that serializes datetime values to ISO 8601 strings on write, and a hook function that deserializes such strings back to datetime on read.

Write

In [20]:
import json
from datetime import datetime

class datetime2iso(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        else:
            return json.JSONEncoder.default(self, obj)
In [21]:
with open('data/strava_activites.json', 'w+') as f:
    for d in records:
        f.write(json.dumps(d, cls=datetime2iso))
        f.write('\n')

Read

In [1]:
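import json
# import the parser submodule explicitly; a bare "import dateutil" does not load it in every version
import dateutil.parser

def iso2datetime(obj):
    # obj is the list of (key, value) pairs that json passes to object_pairs_hook
    d = {}
    for k,v in obj:
        if isinstance(v, str):
            try:
                # strings that parse as a date become datetime objects
                d[k] = dateutil.parser.parse(v)
            except (ValueError, OverflowError):
                # everything else is kept as a plain string
                d[k] = v
        else:
            d[k] = v
    return d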
In [2]:
records = []
with open('data/strava_activites.json', 'r') as f:
    for d in f.readlines():
        records.append(json.loads(d, object_pairs_hook=iso2datetime))
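
One caveat with parsing every string: a value that merely looks like a date (an activity named 'May 2019', say) would be converted as well. A stricter sketch, using only the standard library, converts just the fields known to hold timestamps. The set DATE_KEYS and the function name decode_known_dates are assumptions for illustration; the sample line is made up in the shape of the records above.

```python
import json
from datetime import datetime

# fields assumed to hold ISO 8601 timestamps in the dumped records
DATE_KEYS = {"start_date", "start_date_local"}

def decode_known_dates(pairs):
    """object_pairs_hook that only converts the fields listed in DATE_KEYS."""
    record = {}
    for key, value in pairs:
        if key in DATE_KEYS and isinstance(value, str):
            record[key] = datetime.fromisoformat(value)
        else:
            record[key] = value
    return record

line = '{"name": "May 2019", "start_date": "2019-09-04T06:46:08+00:00"}'
record = json.loads(line, object_pairs_hook=decode_known_dates)
```

Here the name stays a string while start_date becomes a timezone-aware datetime. This relies on the file having been written with isoformat(), as the encoder above does.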

Preview

In [3]:
len(records)
Out[3]:
214
In [4]:
records[0]
Out[4]:
{'id': 2678355068,
 'external_id': None,
 'upload_id': 2840271607,
 'athlete': {'id': 2164216},
 'name': 'Morning pick me up to Castle',
 'distance': 4687.1,
 'moving_time': 1535,
 'elapsed_time': 1711,
 'total_elevation_gain': 77.9,
 'elev_high': 161.5,
 'elev_low': 117.4,
 'type': 'Run',
 'start_date': datetime.datetime(2019, 9, 4, 6, 46, 8, tzinfo=tzutc()),
 'start_date_local': datetime.datetime(2019, 9, 4, 8, 46, 8, tzinfo=tzutc()),
 'timezone': '(GMT+01:00) Europe/Budapest',
 'start_latlng': [47.49, 19.03],
 'end_latlng': [47.49, 19.03],
 'achievement_count': 3,
 'kudos_count': 2,
 'comment_count': 0,
 'athlete_count': 1,
 'photo_count': 0,
 'total_photo_count': 3,
 'map': {'id': 'a2678355068',
  'polyline': None,
  'summary_polyline': 'eh{`HsjcsBk@aACb@@LQdB_@zBC`AEPMBMQIi@Nc@b@m@`@_A`@sB@e@Qs@CSBg@GMKO]SMQcAc@[?m@QGMI[EWMe@?OmAm@EG@[Ea@IW_@q@GW@S^q@BM?MUs@Ei@JcAA}@CUGQWQ_@KCEYm@EUDWL]t@}Ap@gAP_@`@e@f@aAFi@P{@Vu@d@g@Ta@p@{@Ns@d@aAVeAVyBNm@CCI@o@VGFERE@BWNe@Vi@j@q@Py@FKASB[Vs@d@gBN[Ni@d@oARmAB_ACk@c@i@WEu@@OJCFAFJN\\P^\\Rb@@b@YbAO`@EDE@IGE@GPED[Dg@QKAKKo@sAILs@l@HLBEAIW_@GAOB@FcAj@s@h@q@XUPQHEASj@I|@Ml@a@fAkAzAGDC?BF?HWx@QfABn@V`A?RCNMP{@l@k@XiAv@SFAJB\\GhA@pBCRQl@i@zCW`@Yj@S`AQb@?HFERk@tA_CFGZGNBDJ?LENq@p@k@r@AJ?TFf@U\\[XWb@Qf@Kn@?XDXTp@h@bARv@tA~BFFVj@TT^n@LHz@\\Ne@Jk@Pe@NkAPOHW@HJ[Bm@J_@FLCDVFPCTAn@\\N@DBVr@NRh@dBRRFPBVT~@?JP`@P@LHPZPhAFj@r@rBFJTAXb@DJCRKLMHEFWJc@v@Ql@DR'},
 'trainer': False,
 'commute': False,
 'manual': False,
 'private': False,
 'flagged': False,
 'workout_type': 3,
 'average_speed': 3.053,
 'max_speed': 10.5,
 'has_kudoed': False,
 'gear_id': 'g187719',
 'kilojoules': None,
 'average_watts': None,
 'device_watts': None,
 'max_watts': None,
 'weighted_average_watts': None}

Data Analysis

Let's quickly bring this into Pandas to summarize the data.

The field definitions are provided by Strava in its Detailed Activity Model.

In [5]:
import pandas as pd
In [6]:
df = pd.DataFrame.from_records(records)
In [7]:
df.shape
Out[7]:
(214, 39)

Preview

In [8]:
df.head(2)
Out[8]:
   achievement_count  athlete          athlete_count  average_speed  average_watts  comment_count  commute  device_watts  distance  elapsed_time  ...  start_date_local           start_latlng    timezone                     total_elevation_gain  total_photo_count  trainer  type  upload_id     weighted_average_watts  workout_type
0  3                  {'id': 2164216}  1              3.053          NaN            0              False    None          4687.1    1711          ...  2019-09-04 08:46:08+00:00  [47.49, 19.03]  (GMT+01:00) Europe/Budapest  77.9                  3                  False    Run   2.840272e+09  None                    3.0
1  2                  {'id': 2164216}  1              3.051          NaN            2              False    None          6748.7    2344          ...  2019-09-02 19:03:44+00:00  [47.49, 19.03]  (GMT+01:00) Europe/Budapest  117.9                 0                  False    Run   2.835674e+09  None                    3.0

2 rows × 39 columns

Conversions

In [9]:
# convert distance from meters to kilometers
df.loc[:,'distance'] = df.distance / 1000

# convert moving time from seconds to minutes
df.loc[:,'moving_time'] = df.moving_time / 60

# convert average speed in meters per second to pace in minutes per kilometer
# (16.666 approximates 1000 / 60)
df.loc[:,'average_speed_mpk'] = 16.666 / df.average_speed

# create new column average speed in kilometers per hour
df.loc[:,'average_speed_kph'] = 60 / df.average_speed_mpk
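
To make the arithmetic behind these conversions explicit, here is a small standalone sketch. The function names are my own. It uses the exact factor 1000 / 60 rather than the rounded 16.666 above, so results can differ from the notebook's in the fourth decimal place.

```python
def pace_min_per_km(speed_mps):
    """Pace in minutes per kilometer: (1000 m / speed) seconds, divided by 60."""
    return (1000 / 60) / speed_mps

def speed_kph(speed_mps):
    """Speed in kilometers per hour: 3600 s per hour / 1000 m per km = 3.6."""
    return speed_mps * 3.6

def format_pace(pace):
    """Render a decimal pace such as 5.5 as minutes:seconds, i.e. '5:30'."""
    minutes = int(pace)
    seconds = int(round((pace - minutes) * 60))
    if seconds == 60:  # rounding can push the seconds up to a full minute
        minutes, seconds = minutes + 1, 0
    return f"{minutes}:{seconds:02d}"

# average speed of the first record shown earlier, in meters per second
pace = pace_min_per_km(3.053)
```

For the first record (3.053 m/s) this gives a pace of roughly 5.46 minutes per kilometer and a speed of roughly 10.99 km/h. Pace and speed in km/h always multiply to 60, which is the relation the last conversion above relies on.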

Formatting

In [10]:
# format date to be used in plot labels
df.loc[:,'start_date_formatted'] = df.start_date.dt.strftime('%b %Y')
df.loc[:,'start_date_year'] = df.start_date.dt.year
df.loc[:,'start_date_month'] = df.start_date.dt.month
df.loc[:,'start_date_weekday'] = df.start_date.dt.weekday
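
Note that the weekday column holds integers with Monday as 0 and Sunday as 6, and the month column holds 1 to 12. If readable labels are wanted in the plots later on, a sketch with the standard library's calendar module could look like this (the helper names are hypothetical, and the labels follow the current locale):

```python
import calendar

def weekday_name(weekday):
    """Map 0..6 (Monday = 0) to a day name such as 'Monday'."""
    return calendar.day_name[weekday]

def month_abbr(month):
    """Map 1..12 to an abbreviated month name such as 'Sep'."""
    return calendar.month_abbr[month]
```

These could be applied with, for example, df.start_date_weekday.map(weekday_name).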

Statistics

Counts

In [11]:
df.type.value_counts()
Out[11]:
Run          167
Ride          43
Hike           2
AlpineSki      1
Walk           1
Name: type, dtype: int64
In [26]:
df.groupby('timezone').type.value_counts().to_frame()
Out[26]:
                                          type
timezone                       type
(GMT+01:00) Europe/Amsterdam   Run          94
                               Ride         12
                               Hike          1
(GMT+01:00) Europe/Budapest    Run           7
                               Ride          1
(GMT+01:00) Europe/Vienna      AlpineSki     1
(GMT+08:00) Asia/Kuala_Lumpur  Hike          1
                               Run           1
(GMT+08:00) Asia/Singapore     Ride          9
                               Run           1
(GMT-06:00) America/Chicago    Run          64
                               Ride         21
                               Walk          1

Describe

In [24]:
df.groupby('type')\
    [['average_speed_mpk',
      'average_speed_kph',
      'distance','moving_time',
      'total_elevation_gain',
      'achievement_count']]\
    .describe()\
    .T
Out[24]:
type                           AlpineSki        Hike        Ride         Run        Walk
average_speed_mpk     count     1.000000    2.000000   43.000000  167.000000    1.000000
                      mean      2.662300   11.130909    3.026470    5.821886   16.212062
                      std            NaN    4.283397    0.869522    1.100746         NaN
                      min       2.662300    8.102090    2.163011    4.444267   16.212062
                      25%       2.662300    9.616500    2.499694    5.223633   16.212062
                      50%       2.662300   11.130909    2.818059    5.498515   16.212062
                      75%       2.662300   12.645319    3.369700    6.047209   16.212062
                      max       2.662300   14.159728    6.864086   13.934783   16.212062
average_speed_kph     count     1.000000    2.000000   43.000000  167.000000    1.000000
                      mean     22.536901    5.821433   20.934419   10.543916    3.700948
                      std            NaN    2.240204    4.241072    1.353559         NaN
                      min      22.536901    4.237369    8.741150    4.305772    3.700948
                      25%      22.536901    5.029401   17.806312    9.921997    3.700948
                      50%      22.536901    5.821433   21.291252   10.912036    3.700948
                      75%      22.536901    6.613465   24.003960   11.486259    3.700948
                      max      22.536901    7.405496   27.739110   13.500540    3.700948
distance              count     1.000000    2.000000   43.000000  167.000000    1.000000
                      mean      6.348000    7.263100   48.064842    6.019595    7.400000
                      std            NaN    3.112543   30.471867    1.803985         NaN
                      min       6.348000    5.062200    4.750600    1.057500    7.400000
                      25%       6.348000    6.162650   23.090150    4.983600    7.400000
                      50%       6.348000    7.263100   40.012500    5.754600    7.400000
                      75%       6.348000    8.363550   61.853650    7.000000    7.400000
                      max       6.348000    9.464000  121.092000   12.309700    7.400000
moving_time           count     1.000000    2.000000   43.000000  167.000000    1.000000
                      mean     16.900000   74.175000  135.685659   35.127345  120.000000
                      std            NaN    3.523749   79.699345   13.295977         NaN
                      min      16.900000   71.683333   20.300000    4.700000  120.000000
                      25%      16.900000   72.929167   79.808333   27.583333  120.000000
                      50%      16.900000   74.175000  114.783333   32.283333  120.000000
                      75%      16.900000   75.420833  171.475000   40.608333  120.000000
                      max      16.900000   76.666667  341.583333  112.000000  120.000000
total_elevation_gain  count     1.000000    2.000000   43.000000  167.000000    1.000000
                      mean      3.600000  209.700000  245.886047   25.802994    0.000000
                      std            NaN  276.337330  320.976514   38.220239         NaN
                      min       3.600000   14.300000    0.000000    0.000000    0.000000
                      25%       3.600000  112.000000   51.150000    0.000000    0.000000
                      50%       3.600000  209.700000  113.700000   18.800000    0.000000
                      75%       3.600000  307.400000  250.400000   33.000000    0.000000
                      max       3.600000  405.100000 1370.300000  393.000000    0.000000
achievement_count     count     1.000000    2.000000   43.000000  167.000000    1.000000
                      mean      0.000000    0.000000    7.767442    1.305389    0.000000
                      std            NaN    0.000000    9.841209    2.081799         NaN
                      min       0.000000    0.000000    0.000000    0.000000    0.000000
                      25%       0.000000    0.000000    0.000000    0.000000    0.000000
                      50%       0.000000    0.000000    4.000000    0.000000    0.000000
                      75%       0.000000    0.000000   12.500000    2.000000    0.000000
                      max       0.000000    0.000000   40.000000   11.000000    0.000000

Correlations

In [27]:
df[['average_speed_mpk',
    'average_speed_kph',
    'distance','moving_time',
    'total_elevation_gain',
    'achievement_count']]\
    .corr()
Out[27]:
                      average_speed_mpk  average_speed_kph   distance  moving_time  total_elevation_gain  achievement_count
average_speed_mpk              1.000000          -0.835719  -0.541654    -0.348793             -0.325337          -0.404554
average_speed_kph             -0.835719           1.000000   0.768040     0.610007              0.494665           0.543474
distance                      -0.541654           0.768040   1.000000     0.958207              0.869368           0.448066
moving_time                   -0.348793           0.610007   0.958207     1.000000              0.865119           0.367788
total_elevation_gain          -0.325337           0.494665   0.869368     0.865119              1.000000           0.225868
achievement_count             -0.404554           0.543474   0.448066     0.367788              0.225868           1.000000

Plots

For the visualizations in this notebook I am using a library called Altair. For more on that, see https://altair-viz.github.io/

In [29]:
import altair as alt
print(alt.__version__)
alt.renderers.enable('default')
print(alt.renderers.active)
3.2.0
default
In [30]:
# take an explicit copy so the assignment below does not write into a view of df
data = df.loc[df.type=='Run',:][['id','start_date_formatted','average_speed_mpk']].copy()
data.loc[:,'average_speed_mpk'] = data.average_speed_mpk.apply(lambda d: round(d,1))

bar = alt.Chart(data).mark_bar().encode(
    alt.X('average_speed_mpk:Q', bin=alt.Bin(step=.25)),
    alt.Y('count()'),
    tooltip=['average_speed_mpk:Q','count()']
)

bar.title = 'Average Speed (min. per km.) by Count of Runs'

rule = alt.Chart(data).mark_rule(color='orange').encode(
    x='mean(average_speed_mpk):Q',
    size=alt.value(2)
)

alt.layer(
    bar,
    rule
)
Out[30]:
In [31]:
# take an explicit copy so the assignment below does not write into a view of df
data = df.loc[df.type=='Run',:][['id','start_date_formatted','distance']].copy()
data.loc[:,'distance'] = data.distance.apply(lambda d: round(d))

bar = alt.Chart(data).mark_bar().encode(
    alt.X('distance:Q', bin=alt.Bin(step=1)),
    alt.Y('count()'),
    tooltip=['distance:Q','count()']
)

bar.title = 'Distance (km.) by Count of Runs'

rule = alt.Chart(data).mark_rule(color='orange').encode(
    x='mean(distance):Q',
    size=alt.value(2)
)

alt.layer(
    bar,
    rule
)
Out[31]:
In [33]:
data = df.loc[df['type'].isin(['Run','Ride']),:]
data = data[['id','start_date_formatted','type','distance','achievement_count']]

chart = alt.Chart(data).transform_calculate(
    url='https://www.strava.com/activities/' + alt.datum.id
).mark_circle().encode(
    x='distance',
    y='achievement_count',
    color='type',
    href='url:N',
    tooltip=['start_date_formatted','distance','achievement_count','url:N']
).facet(
    column='type'
).resolve_scale(
    x='independent',
    y='independent'
)

chart.title = 'Activities by Achievement Count and Distance (click circle to go to activity)'

chart
Out[33]:
In [34]:
data_run = df.loc[df['type'] == 'Run',:]
data_run = data_run[['id','start_date_formatted','type','distance','average_speed_mpk']]

chart_run = alt.Chart(data_run).transform_calculate(
    url='https://www.strava.com/activities/' + alt.datum.id
).mark_circle().encode(
    x='distance',
    y='average_speed_mpk',
    color='type',
    href='url:N',
    tooltip=['start_date_formatted','distance','average_speed_mpk','url:N']
).interactive()

chart_run.title = 'Runs'

data_ride = df.loc[df['type'] == 'Ride',:]
data_ride = data_ride[['id','start_date_formatted','type','distance','average_speed_kph']]

chart_ride = alt.Chart(data_ride).transform_calculate(
    url='https://www.strava.com/activities/' + alt.datum.id
).mark_circle().encode(
    x='distance',
    y='average_speed_kph',
    color='type',
    href='url:N',
    tooltip=['start_date_formatted','distance','average_speed_kph','url:N']
).interactive()

chart_ride.title = 'Rides'

chart = alt.hconcat(chart_run, chart_ride)

chart.title = 'Activity Type by Average Speed and Distance (click circle to go to activity)'

chart
Out[34]:
In [38]:
data = df[['id','type','start_date','distance','moving_time']].copy()
data.loc[:,'month'] = data.loc[:,'start_date'].dt.strftime('%Y-%m-01')

chart = alt.Chart(data).mark_bar().encode(
    x='month',
    y='count(id)',
    color='type',
    tooltip=['month','type','count(id)']
).interactive()

chart.title = 'Activities over Time'

chart
Out[38]:
In [39]:
data = df[['id','type','start_date_month','moving_time','distance']].copy()

chart_activities = alt.Chart(data).mark_bar().encode(
    x='start_date_month',
    y='count(id)',
    color='type',
    tooltip=['start_date_month','type','count(id)']
).interactive()

chart_time = alt.Chart(data).mark_bar().encode(
    x='start_date_month',
    y='sum(moving_time)',
    color='type',
    tooltip=['start_date_month','type','sum(moving_time)']
).interactive()

chart_distance = alt.Chart(data).mark_bar().encode(
    x='start_date_month',
    y='sum(distance)',
    color='type',
    tooltip=['start_date_month','type','sum(distance)']
).interactive()

chart_activities.title = 'Number of Activities by Month'
chart_time.title = 'Total Moving Time by Month'
chart_distance.title = 'Total Distance by Month'

chart = alt.hconcat(chart_activities, chart_time, chart_distance)

chart
Out[39]:
In [40]:
data = df[['id','type','start_date_weekday','moving_time','distance']].copy()

chart_activities = alt.Chart(data).mark_bar().encode(
    x='start_date_weekday',
    y='count(id)',
    color='type',
    tooltip=['start_date_weekday','type','count(id)']
).interactive()

chart_time = alt.Chart(data).mark_bar().encode(
    x='start_date_weekday',
    y='sum(moving_time)',
    color='type',
    tooltip=['start_date_weekday','type','sum(moving_time)']
).interactive()

chart_distance = alt.Chart(data).mark_bar().encode(
    x='start_date_weekday',
    y='sum(distance)',
    color='type',
    tooltip=['start_date_weekday','type','sum(distance)']
).interactive()

chart_activities.title = 'Number of Activities by Weekday'
chart_time.title = 'Total Moving Time by Weekday'
chart_distance.title = 'Total Distance by Weekday'

chart = alt.hconcat(chart_activities, chart_time, chart_distance)

chart
Out[40]:

Geographic Data

Strava provides only a summarized version of each activity here. The route map is likewise summarized as a simple list of latitude and longitude coordinates, encoded using Google's Polyline Algorithm.

Handily, there is a polyline library for Python which we can use to decode this. Get it with pip install polyline

In [41]:
import polyline
In [42]:
coo = polyline.decode(records[0]['map']['summary_polyline'])
In [43]:
coo[:10]
Out[43]:
[(47.49459, 19.02778),
 (47.49481, 19.02811),
 (47.49483, 19.02793),
 (47.49482, 19.02786),
 (47.49491, 19.02735),
 (47.49507, 19.02673),
 (47.49509, 19.0264),
 (47.49512, 19.02631),
 (47.49519, 19.02629),
 (47.49526, 19.02638)]
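
For the curious, here is a rough sketch of what such a decoder does under the hood. This is my own minimal reading of Google's encoded polyline format, not the polyline library's actual code: each value is stored as a delta from the previous point, split into 5-bit chunks offset by 63, with the sign folded into the lowest bit.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    coords = []
    index = 0
    lat = lon = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # first the latitude delta, then the longitude delta
            shift = 0
            result = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:  # no continuation bit: last chunk of this value
                    break
            # the lowest bit carries the sign, the remaining bits the magnitude
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords
```

In practice the library call above is all you need; the sketch is only meant to show that the encoded string is a compact list of coordinate deltas, in latitude, longitude order.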

Now we can apply basic plotting techniques to represent the coordinates geographically, to get something that looks similar to https://www.strava.com/heatmap.

Below is a personal heatmap generated in Strava. Just an FYI, viewing personal heatmaps in Strava is a paid feature(!)

Strava Personal Heatmap

A simple way to do this in Python is to start with a basic Matplotlib canvas.

Since the activities are so far unfiltered, and I know I have recorded activities in various parts of the world, let's apply some basic filtering to keep only activities with valid coordinates in a general geographic area of interest (by rounding the coordinates).

I'm not going to show too many details here, and adding backgrounds to maps is complicated. However, it can be personally very interesting to see at a glance which neighborhoods or routes you favor over others.

Matplotlib

In [44]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Des Moines, IA USA

In [45]:
coo_list = [polyline.decode(x['map']['summary_polyline']) for x in records
            if x['timezone'] == '(GMT-06:00) America/Chicago'
            and x['start_latlng'] is not None
            and x['start_latlng'][0] <= 41.6
            and round(x['start_latlng'][0],1) >= 41.2
            and round(x['start_latlng'][1]) == -94]
In [46]:
print('Activities:', len(coo_list))
Activities: 56
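
The rounding trick above amounts to a rough bounding box. A more explicit sketch of the same idea follows. The function name in_bbox is hypothetical, and the bounds in the usage note are only my approximation of the rounded values used above, so the resulting count may differ slightly.

```python
def in_bbox(latlng, south, north, west, east):
    """Return True if a [lat, lon] pair lies inside the given bounding box.

    Missing or empty coordinates (None or []) count as outside the box.
    """
    if not latlng:
        return False
    lat, lon = latlng
    return south <= lat <= north and west <= lon <= east
```

For the Des Moines area this could be called as in_bbox(x['start_latlng'], 41.2, 41.6, -94.5, -93.5) inside the list comprehension, in place of the three rounding conditions.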
In [47]:
fig = plt.figure(figsize=(12,12))
fig.suptitle('Strava Activity in Des Moines, Iowa')
ax = plt.Axes(fig, [0., 0., 1., 1.], )
ax.set_aspect('equal')
ax.set_axis_off()
fig.add_axes(ax)

for coo in coo_list:
    lat,lon = map(list, zip(*coo))
    plt.plot(lon, lat, lw=0.5, alpha=.9)

Altair

(Vega Lite / D3)

To do the same thing in Altair, we need to get the data into a format that D3 can read, preferably TopoJSON, a compressed geographic file format.

The steps taken are then the following:

  1. Transform the data in Python:
    • decode the polylines into lists of latitude/longitude pairs
    • convert each list of coordinates to a list of Shapely Points
    • convert each list of Points into a Shapely LineString
    • convert the Pandas DataFrame with LineStrings and metadata to a GeoPandas GeoDataFrame
  2. Export as GeoJSON
  3. Convert GeoJSON to TopoJSON using geo2topo
    geo2topo -q 1e6 line=geojson.json > topojson.json
    
  4. Import topojson into Altair and plot
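
To show what steps 1 and 2 produce, here is a dependency-free sketch that builds a GeoJSON LineString feature by hand. It is an illustration only, not the GeoPandas route taken below, and the function names are hypothetical. The point to notice is the coordinate order: the decoded polyline gives (lat, lon), while GeoJSON expects [lon, lat].

```python
import json

def to_linestring_feature(coords_latlon, properties=None):
    """Build a GeoJSON Feature holding one LineString.

    `coords_latlon` is a list of (lat, lon) tuples as returned by the
    polyline decoder; GeoJSON stores positions as [lon, lat].
    """
    return {
        "type": "Feature",
        "properties": properties or {},
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lat, lon in coords_latlon],
        },
    }

def to_feature_collection(features):
    """Wrap a list of features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}

# the first two decoded points of the activity shown earlier
decoded = [(47.49459, 19.02778), (47.49481, 19.02811)]
feature = to_linestring_feature(decoded, {"id": 2678355068, "type": "Run"})
geojson_text = json.dumps(to_feature_collection([feature]))
```

The same swap is why the Points further down are created as Point(y, x) from the decoded (x, y) = (lat, lon) pairs.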

Requirements:

Caveats:

Background Reading:

In [48]:
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString

def decode_map(x):
    # x is the activity's 'map' dict; returns None when there is no polyline
    if x['summary_polyline'] is not None:
        return polyline.decode(x['summary_polyline'])

Des Moines, IA USA

In [49]:
df_tmp = df.loc[df.timezone == '(GMT-06:00) America/Chicago',:]
# keep activities that start near Des Moines; copy so the assignments below do not write into a view
df_tmp = df_tmp.loc[df_tmp.start_latlng.apply(lambda x: round(x[0],1) <= 41.6  and round(x[0],1) >= 41.2 and round(x[1]) == -94 if x is not None else False),:].copy()
In [50]:
# decode each summary polyline into a list of (lat, lon) tuples
df_tmp.loc[:,'map_decoded'] = df_tmp.loc[:,'map'].apply(decode_map)

# build Shapely Points as (lon, lat); None marks activities without a map (pd.np is gone in recent pandas)
df_tmp.loc[:,'geometry_list'] = df_tmp.loc[:,'map_decoded'].apply(lambda d: [Point(y,x) for x,y in d] if d is not None else None)

df_map = df_tmp.loc[df_tmp.geometry_list.notnull(),:].copy()

df_map.loc[:,'geo_line'] = df_map.loc[:,'geometry_list'].apply(LineString)

gdf = GeoDataFrame(df_map[['id','start_date_formatted','type','geo_line']], geometry='geo_line')
In [51]:
gdf.head()
Out[51]:
     id          start_date_formatted  type  geo_line
35   2060397406  Jan 2019              Run   LINESTRING (-93.76034 41.56645, -93.7587500000...
96   984934815   May 2017              Run   LINESTRING (-93.76942 41.55954, -93.7675299999...
99   817262248   Dec 2016              Run   LINESTRING (-93.76782 41.55941, -93.7664099999...
100  815054880   Dec 2016              Run   LINESTRING (-93.77070000000001 41.56226, -93.7...
104  672812022   Aug 2016              Run   LINESTRING (-93.76779000000001 41.5594, -93.76...
In [53]:
gdf.to_file('public_data/geojson.json', driver="GeoJSON")

Now run the command below to generate the TopoJSON file:

geo2topo -q 1e6 line=geojson.json > topojson.json
In [55]:
data = alt.topo_feature('https://knanne.github.io/notebooks/jupyter/public_data/topojson.json', 'line')
In [56]:
data
Out[56]:
UrlData({
  format: TopoDataFormat({
    feature: 'line',
    type: 'topojson'
  }),
  url: 'https://knanne.github.io/notebooks/jupyter/public_data/topojson.json'
})
In [57]:
chart = alt.Chart(data).mark_geoshape(
    strokeWidth=.7,
    opacity=.9,
    filled=False
).properties(
    title='Strava Activity in Des Moines, Iowa',
    width=900,
    height=700
)

chart
Out[57]: