Upsampling trajectory data with Google Directions API

Jonathan Mandt
Mar 20, 2020
Heat Map of T-Drive Data Set

Evaluating and plotting trajectory data always comes with one crucial requirement: how accurately the data can be plotted depends on how far apart two consecutive coordinates are, both in time and in space.
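To make the two kinds of distance concrete, here is a minimal Python sketch that computes both gaps for consecutive points. It assumes each point is a `(timestamp, latitude, longitude)` tuple with `datetime` timestamps, and it uses the haversine formula with a mean Earth radius of 6,371 km. The names `haversine_m` and `gaps` are illustrative, not taken from the author's code.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def gaps(points):
    """Return (seconds, metres) between each pair of consecutive points."""
    return [
        ((t2 - t1).total_seconds(), haversine_m(lat1, lon1, lat2, lon2))
        for (t1, lat1, lon1), (t2, lat2, lon2) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    # Two made-up points in Beijing, recorded ten minutes apart.
    start = datetime(2008, 2, 2, 15, 36, 8)
    pts = [(start, 39.92123, 116.51172), (start + timedelta(minutes=10), 39.93883, 116.51627)]
    print(gaps(pts))  # 600 seconds apart, about 2 km apart
```

A 10-minute gap covering about 2 km says nothing about which streets the taxi used in between, which is exactly the problem upsampling addresses.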

In this article I show how to upsample trajectory data with the Google Directions API. I use the T-Drive data set as example data, but you can reuse the same code for other data if you adjust the column names and formats to match your input (a sample of the input data is given below).

The T-Drive data set contains 17,662,250 GPS points, recorded within Beijing from February 2 to February 8, 2008, representing the trajectories of 10,357 different taxis. For each taxi there is a text file containing the whole trajectory of the vehicle.

Considering all trajectories of all taxis in the greater Beijing area, the average sampling interval is one data point every 177 seconds. As recording rates vary a lot, we filtered out points where the time gap between two consecutive points was greater than 30 seconds. The figure above shows a heat map of the remaining points, i.e. those recorded less than 30 seconds after their predecessor. Color intensity increases with the number of taxis in the same area: blue indicates areas with fewer taxis, and red indicates areas with a very high number of taxis.
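The average interval and the 30-second filter can be sketched as follows. This assumes points are `(timestamp, latitude, longitude)` tuples sorted by time, and it assumes a point is dropped when the gap *before* it exceeds the threshold (the article does not say which point of a sparse pair is removed). `time_deltas`, `mean_interval` and `dense_points` are hypothetical helper names.

```python
from datetime import datetime, timedelta
from statistics import mean

MAX_GAP_S = 30  # threshold used in the article


def time_deltas(points):
    """Seconds between consecutive (timestamp, lat, lon) points."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(points, points[1:])]


def mean_interval(points):
    """Average sampling interval in seconds."""
    return mean(time_deltas(points))


def dense_points(points, max_gap_s=MAX_GAP_S):
    """Keep only points recorded at most max_gap_s seconds after their predecessor.

    Assumption: the first point has no predecessor and is therefore dropped too.
    """
    return [
        cur
        for prev, cur in zip(points, points[1:])
        if (cur[0] - prev[0]).total_seconds() <= max_gap_s
    ]


if __name__ == "__main__":
    t0 = datetime(2008, 2, 2, 8, 0, 0)
    pts = [(t0 + timedelta(seconds=s), 39.9, 116.4) for s in (0, 10, 100, 120)]
    print(mean_interval(pts))      # 40.0
    print(len(dense_points(pts)))  # 2
```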

You can download the whole T-Drive dataset here.

One point every 177 seconds is not a very high density of data points. To simulate taxi drives through Beijing, you would need a much higher density. That's where the Google Directions API comes into play: I used it to find in-between points for two consecutive GPS coordinates whenever the time delta between them was more than 30 seconds.
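As a minimal sketch, a Directions request for one pair of consecutive points can be built with Python's standard library. It assumes points are `(latitude, longitude)` tuples and that `api_key` holds your own Google Maps Platform key. `directions_url` and `fetch_route` are hypothetical helper names, not the author's code. The endpoint and the `origin`, `destination`, `mode` and `key` parameters belong to the Directions API's JSON web service.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def directions_url(origin, destination, api_key, mode="driving"):
    """Build a Directions API request URL for two (lat, lon) tuples."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": mode,
        "key": api_key,
    }
    return f"{DIRECTIONS_URL}?{urlencode(params)}"


def fetch_route(origin, destination, api_key):
    """Call the API (a billed request) and return the parsed JSON response."""
    with urlopen(directions_url(origin, destination, api_key), timeout=10) as resp:
        data = json.load(resp)
    if data.get("status") != "OK":
        raise RuntimeError(f"Directions API returned status {data.get('status')!r}")
    return data
```

Building the URL separately from the network call makes it easy to inspect or log requests before any money is spent.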

Here is a sample of what the initial data looks like. vin stands for vehicle identification number; the other fields should be self-explanatory.
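The sample itself is not reproduced here. As a sketch, assuming the raw T-Drive line format `taxi id,date time,longitude,latitude`, one file can be read into records that carry the `vin` field described above. The `Record` class, the `parse_tdrive` function and the remaining field names are hypothetical; for other input data, only the unpacking line and the timestamp format should need adjusting.

```python
import csv
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    vin: str  # vehicle identification number (the taxi id in T-Drive)
    timestamp: datetime
    longitude: float
    latitude: float


def parse_tdrive(lines):
    """Parse lines of the form 'id,YYYY-MM-DD HH:MM:SS,longitude,latitude'."""
    records = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        vin, ts, lon, lat = row
        records.append(
            Record(vin, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), float(lon), float(lat))
        )
    return records
```

Usage, assuming a file named after the taxi id: `with open("1191.txt", newline="") as f: records = parse_tdrive(f)`.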

To upsample the trajectory data, I first calculated the time between consecutive points. For the file with vin = 1191, the mean was 483 seconds, i.e. more than 8 minutes. This file consists of 1,102 records; after the first iteration of upsampling it contained 7,974 records with a mean delta of 67 seconds. To push this to the limit, we could run another iteration.
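How the in-between points get their timestamps is not spelled out here. A minimal sketch, under the assumption that the end location of each route step becomes a new point and that the observed time gap is divided in proportion to the steps' estimated durations: `upsample_pair` and `upsample` are hypothetical names, and `fetch` is any callable you supply that returns the parsed Directions JSON for two points. The response fields used (`routes`, `legs`, `steps`, `duration.value` in seconds, `end_location.lat`/`lng`) are part of the Directions API's JSON response.

```python
from datetime import datetime, timedelta


def upsample_pair(p0, p1, route):
    """Return intermediate (timestamp, lat, lon) points between p0 and p1.

    route is a parsed Directions API response. Step end locations become the
    new points; their timestamps spread the observed gap p0 -> p1 in
    proportion to the steps' estimated durations (an assumption).
    """
    steps = route["routes"][0]["legs"][0]["steps"]
    total = sum(step["duration"]["value"] for step in steps)
    span = (p1[0] - p0[0]).total_seconds()
    points, elapsed = [], 0
    for step in steps[:-1]:  # the last step ends at the destination, i.e. p1
        elapsed += step["duration"]["value"]
        fraction = elapsed / total if total else 0.0
        loc = step["end_location"]
        points.append((p0[0] + timedelta(seconds=fraction * span), loc["lat"], loc["lng"]))
    return points


def upsample(points, fetch):
    """One upsampling iteration; fetch(p0, p1) returns a Directions response."""
    result = [points[0]]
    for p0, p1 in zip(points, points[1:]):
        result.extend(upsample_pair(p0, p1, fetch(p0, p1)))
        result.append(p1)
    return result
```

Running `upsample` again on its own output corresponds to the second iteration mentioned above. Because upsampling only inserts points inside existing gaps, the total time span stays the same, so the mean delta should fall to about 483 × 1,101 / 7,973 ≈ 66.7 seconds, which matches the reported 67 seconds.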

ATTENTION: Each call to the Google Directions API costs money.
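Since every request is billed, it can be worth guarding the fetch function. This is a hypothetical wrapper (not part of any Google library) that caches responses per point pair and refuses to exceed a request cap you choose. `max_requests` is an assumption to set according to your own budget, and points must be hashable, e.g. tuples.

```python
class RequestBudget:
    """Wrap fetch(origin, destination) with a response cache and a hard call cap."""

    def __init__(self, fetch, max_requests):
        self.max_requests = max_requests
        self.calls = 0
        self._fetch = fetch
        self._cache = {}

    def __call__(self, origin, destination):
        key = (origin, destination)
        if key in self._cache:  # a repeated pair costs nothing
            return self._cache[key]
        if self.calls >= self.max_requests:
            raise RuntimeError("request budget exhausted")
        self.calls += 1
        result = self._fetch(origin, destination)
        self._cache[key] = result
        return result
```

The cache matters most when you rerun the script or run a second iteration over points that were already queried.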

I also tried the Mapbox Map Matching API to get even better results, as this API claims to also provide in-between points so that the data can be plotted accurately onto a map. Unfortunately, the confidence value it returned was very low in all cases, or no data was returned at all. I only tested it for Beijing, so I cannot say whether it is more accurate for other locations.
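To repeat the Mapbox experiment, here is a hedged sketch of a Map Matching request and of filtering results by their confidence score. It assumes points are `(unix_timestamp, latitude, longitude)` tuples and uses the `mapbox/driving` profile. Note that Mapbox expects coordinates in `longitude,latitude` order and, at the time of writing, documents a limit of 100 coordinates per request. The 0.5 confidence cutoff and the helper names are my own assumptions.

```python
from urllib.parse import urlencode

MATCHING_URL = "https://api.mapbox.com/matching/v5/mapbox/driving/"
MAX_COORDS = 100  # per-request coordinate limit documented by Mapbox
MIN_CONFIDENCE = 0.5  # hypothetical cutoff; confidence ranges from 0 to 1


def matching_url(points, token):
    """points: list of (unix_ts, lat, lon); the URL path uses lon,lat order."""
    if not 2 <= len(points) <= MAX_COORDS:
        raise ValueError("Map Matching needs between 2 and 100 coordinates")
    coords = ";".join(f"{lon},{lat}" for _, lat, lon in points)
    params = {
        "geometries": "geojson",
        "timestamps": ";".join(str(int(ts)) for ts, _, _ in points),
        "access_token": token,
    }
    return f"{MATCHING_URL}{coords}?{urlencode(params)}"


def confident_matchings(response, min_confidence=MIN_CONFIDENCE):
    """Keep only matchings whose confidence reaches the cutoff."""
    if response.get("code") != "Ok":
        return []
    return [m for m in response.get("matchings", []) if m.get("confidence", 0) >= min_confidence]
```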

You can find my code here to test it yourself. Thanks for reading.
