Scraping Uber ride history data

This month I’ll complete one year of riding with Uber. Booking a ride has been easy and the response time has been amazing. This post, however, is on a different matter. As a data aficionado, I was curious about the various locations I had travelled to over the year and how much I had spent each month. Heading over to the Uber API docs was a disappointment, as I could not find any API for getting the ride history data. My next plan was to scrape the data from the Uber pages using PHP or Python. Just as I was about to start that project, a quick Google search turned up a nice bookmarklet by @ummjackson that scrapes the trip data and exports it to a CSV file.

Uber Data Extractor was a time saver. After you log in to your Uber account, clicking the bookmarklet scrapes the ride data and saves it to a local file. Depending on how many rides you have taken, this can take anywhere from a few seconds to a few minutes. The author also provides a simple visualization of the CSV data using a drag & drop page. The bookmarklet exports the following data fields.

    – trip_id
    – date
    – date_time
    – driver
    – car_type
    – city
    – price
    – payment_method
    – start_time
    – start_address
    – end_time
    – end_address
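
As a quick sanity check before doing anything else with the export, the file can be loaded and its columns compared against the list above. The following is only a minimal sketch using Python's standard csv module; the function name load_trips and the constant EXPECTED_FIELDS are hypothetical names chosen for illustration, and the sketch assumes the CSV header row uses exactly the field names listed above.

```python
import csv

# Column names as listed in this post (assumed to match the CSV header row).
EXPECTED_FIELDS = [
    "trip_id", "date", "date_time", "driver", "car_type", "city",
    "price", "payment_method", "start_time", "start_address",
    "end_time", "end_address",
]


def load_trips(csv_file):
    """Read trips from an open CSV file object into a list of dicts.

    Raises ValueError if any of the expected columns is missing.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return list(reader)
```

Typical usage would be to open the exported file with `open(path, newline="", encoding="utf-8")` and pass the file object to load_trips; the actual file name depends on what the bookmarklet saves.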

The simple area chart of my ride history is shown below.

[Figure: area chart of Uber rides data]

My primary interest was in the start_address, end_address and price fields. The start_address often varied for the same physical location, but adding an extra field to the CSV and consolidating the different start_address values under a single label eliminated that problem. Once the CSV was ready, I imported it into Google Sheets, which let me play with the data and build a few visualizations. Some data was missing, maybe from cancelled trips (I need to check that), but my primary purpose was served by a small bookmarklet.
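
The same consolidation and the monthly cost could also be done in a few lines of Python instead of a spreadsheet. This is a rough sketch, not what I actually ran: the alias table, the function names, the "%Y-%m-%d" date format and the price format (a number with an optional currency symbol and comma thousands separators) are all assumptions, since the exact formats depend on the exported CSV.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical alias table: address variants (lowercased) -> one label.
ADDRESS_ALIASES = {
    "12 main st": "Home",
    "12 main street": "Home",
}


def consolidate(address, aliases=ADDRESS_ALIASES):
    """Map an address variant to its label; unknown addresses pass through."""
    key = " ".join(address.lower().split())  # normalize case and whitespace
    return aliases.get(key, address)


def parse_price(text):
    """Pull the first number out of a price string; None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def monthly_totals(trips, date_format="%Y-%m-%d"):
    """Sum prices per 'YYYY-MM' month, skipping rows without a price."""
    totals = defaultdict(float)
    for trip in trips:
        price = parse_price(trip["price"])
        if price is None:
            continue  # missing price, possibly a cancelled trip
        month = datetime.strptime(trip["date"], date_format).strftime("%Y-%m")
        totals[month] += price
    return dict(totals)
```

Here trips is a list of dicts keyed by the column names, for example as produced by csv.DictReader. The consolidated label can be stored in an extra key on each row, which mirrors the extra CSV field described above.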

While working with the code, I found the following nice JavaScript libraries, which are used by the Uber visualization tool.

PapaParse – A fast and powerful CSV parser that gracefully handles large files and malformed input.

AlaSQL – A JavaScript SQL database for the browser and Node.js. Handles both traditional relational tables and nested JSON data (NoSQL). Export, store, and import data from localStorage, IndexedDB, or Excel.

The Uber scraping bookmarklet uses artoo.js to scrape pages on the client side.

I’ll cover these libraries in another post, primarily artoo.js, as I’m most interested in client-side scraping.
