Using Big Data to Improve Public Transit

Are you sick of hearing the term “big data” yet?

If so, you may be disappointed to hear that big data is here to stay. For years now, credit card companies and banks have been analyzing massive amounts of customer data to predict behaviours such as who will be most receptive to offers for new credit cards, or who is most likely to default on a loan. In the past decade, the retail sector has latched on to data analytics, with companies like Amazon analyzing millions of transactions to identify which products complement each other, so they can recommend them to shoppers.

More recently, brick-and-mortar retailers have regained some edge by using Wi-Fi and Bluetooth networks to anonymously track customer behaviours in-store, seeing which sections are visited most frequently. They can even send location-based offers directly to customers’ phones when they enter a certain part of the store.

Long a laggard in the data world, the public transit sector is finally catching up, and agencies around the world are adopting big data techniques to measure customer behaviours. Quality transit customer data has a multitude of uses, from service planning to communications to new customer acquisition. Though many data collection methods exist for transit, among the top sources today are electronic farecards, Bluetooth Low-Energy (BLE) beacons, and Wi-Fi networks.

Electronic farecard systems, like Toronto’s PRESTO card, give transit agencies access to massive amounts of transit user data. Source: Metrolinx

Upgrades to electronic farecard systems (like PRESTO in Toronto) in recent decades have unlocked vast amounts of travel data, revealing origins, destinations, concessions, mode choice and time-of-day. The transit card used for the Metro in Hong Kong, known as the Octopus Card, can even be used to pay for parking, retail, and a variety of other things, allowing for even more knowledge of customer behaviours (and more convenience to the cardholders). In 2016, in an attempt to understand different types of transit users, researchers at MIT conducted a study of the usage of over 60,000 farecards in a 4-week period in the London Tube network.

Their analysis identified 11 unique customer segments based on travel patterns, and when the researchers combined these segments with demographic data, they were able to identify unique characteristics for each user group. For example, those who used transit exclusively for Monday-to-Friday commuting were more likely to be older, of higher income, own more vehicles, and live in the outer suburbs. Users who primarily used transit for commuting but also for weekend travel and secondary purposes were more likely to be of lower income, own fewer vehicles, and live in the inner suburbs.

Other unique segments include those dominated by students, retirees, disabled persons, and many more. The study is difficult to summarize in a paragraph, but in short, this level of knowledge can significantly improve transit agencies’ understanding of their customers, helping them tailor their communications and the services they offer.
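To make the idea of segmenting riders by travel pattern more concrete, here is a minimal sketch in Python. It is not the method used in the MIT study; it is a simple rule-based stand-in that splits cards by the share of their taps that fall on weekends. The tap records, the card IDs, the 20% threshold, and the segment labels are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical tap records: (card_id, ISO timestamp of a tap-in).
taps = [
    ("card_a", "2016-03-07T08:10"),  # Monday
    ("card_a", "2016-03-08T08:05"),  # Tuesday
    ("card_a", "2016-03-09T17:40"),  # Wednesday
    ("card_b", "2016-03-07T08:30"),  # Monday
    ("card_b", "2016-03-12T14:00"),  # Saturday
    ("card_b", "2016-03-13T11:15"),  # Sunday
]

def weekend_share(records):
    """Return the fraction of each card's taps that occur on a weekend."""
    counts = defaultdict(lambda: [0, 0])  # card_id -> [weekend taps, all taps]
    for card_id, stamp in records:
        is_weekend = datetime.fromisoformat(stamp).weekday() >= 5
        counts[card_id][0] += is_weekend
        counts[card_id][1] += 1
    return {card: weekend / total for card, (weekend, total) in counts.items()}

def label_segment(share, threshold=0.2):
    """Assumed rule: under 20% weekend taps counts as a weekday-only commuter."""
    return "weekday commuter" if share < threshold else "commuter plus weekend"

segments = {card: label_segment(s) for card, s in weekend_share(taps).items()}
print(segments)
```

A real analysis would use many more features (time of day, stations visited, trip frequency) and a proper clustering method, but the basic step is the same: turn each card's raw taps into a small set of behavioural measures, then group cards that look alike.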

One limitation of farecard data is that it fails to capture occasional users who might pay a cash fare instead of using a farecard. In many situations, such as major sporting events, these groups can form the majority of ridership, so there is a need for data which captures these riders.

In 2015, Google and TriMet launched a pilot project on the transit system in Portland, Oregon, where they temporarily installed hundreds of Bluetooth Low-Energy “Beacon” devices at nearly 90 LRT stops across the network. These passive Bluetooth devices, no larger than a deck of cards, could register any phone within range that had Bluetooth turned on.

The network of Beacons could identify where unique devices entered and exited the network, as well as when they returned, without collecting any personally-identifiable information. Transit users benefited as well, as the Beacons allowed the delivery of hyper-local information, such as the next train arrivals and service delays, to users standing on the platform through the Google Maps app.
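As a rough illustration of how entry and exit points might be derived from this kind of data, the Python sketch below takes a list of sightings and treats each device's first and last stop as its entry and exit. The record layout, device tokens, and stop names are hypothetical, and this is not a description of how the Portland pilot actually processed its data.

```python
from collections import Counter, defaultdict

# Hypothetical sightings: (anonymous device token, stop, minutes since midnight).
sightings = [
    ("dev1", "Stop A", 480),
    ("dev1", "Stop C", 505),
    ("dev2", "Stop A", 482),
    ("dev2", "Stop B", 490),
    ("dev2", "Stop C", 507),
    ("dev3", "Stop B", 515),
]

def entry_exit_pairs(records):
    """Return (first stop, last stop) for each device whose first and last stops differ."""
    by_device = defaultdict(list)
    for device, stop, minute in records:
        by_device[device].append((minute, stop))
    pairs = {}
    for device, visits in by_device.items():
        visits.sort()  # order each device's sightings by time
        first_stop, last_stop = visits[0][1], visits[-1][1]
        if first_stop != last_stop:
            pairs[device] = (first_stop, last_stop)
    return pairs

pairs = entry_exit_pairs(sightings)
flows = Counter(pairs.values())  # how many devices made each entry-exit journey
print(flows)
```

Counting the pairs gives a simple origin-destination table, which is the kind of output a service planner would want from a beacon network.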

A Beacon Bluetooth device

Similar to Beacon devices, Wi-Fi networks can also be used to collect data on customers. When a phone’s Wi-Fi is turned on, a Wi-Fi network can passively collect the unique MAC address of that device, along with those of all the other devices passing by (without collecting any personally-identifiable information). The added benefit of Wi-Fi is that these networks are already installed in many places across transit networks, so little additional effort is needed for agencies to begin collecting data.

Just last month, Transport for London completed a 4-week pilot in which it collected anonymous Wi-Fi data from customers in a subset of its Tube stations, in an effort to better understand ridership patterns. Since a significant portion of Tube ridership is from non-locals, farecard data alone fails to paint the full picture of ridership, and Wi-Fi offers an advantage because it is used by tourists and locals alike. The agency is currently in the analysis phase of the pilot, and hopes to be able to use the data to improve customer communications and service planning, and to optimize the placement of assets such as advertising.


Holborn Station, London Underground. Pic: Shutterstock

Here in Toronto, countless opportunities exist to start leveraging big data in transit. With PRESTO devices now fully installed across the TTC network and tokens soon being phased out, PRESTO will quickly grow to account for a majority of ridership, bringing with it massive never-before-seen usage data. The TTC and GO Transit have both also invested in Wi-Fi networks in their stations, and the TTC has just announced that it will begin expanding Wi-Fi service into subway tunnels in 2018.

With PRESTO now on the TTC, ridership data can be collected from farecard taps. Source: Wikimedia

Imagine being able to know, in real time, the approximate number of people on a crowded platform. Or to know the variation in first-time transit users by time of day at each station. Or to see how station construction affects passenger flows. The possibilities are endless!
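The first of these ideas is simple enough to sketch. The Python snippet below estimates platform crowding by counting the distinct devices seen at a station within the last few minutes. The record layout, the device tokens, the station labels, and the five-minute window are all assumptions for illustration, and the result would only ever be an approximation, since not every rider carries a phone with Wi-Fi or Bluetooth switched on.

```python
# Hypothetical probe records: (anonymous device token, station, minutes since midnight).
probes = [
    ("d1", "Union", 600),
    ("d2", "Union", 602),
    ("d1", "Union", 603),  # same device seen again, counted once
    ("d3", "Bloor", 603),
    ("d4", "Union", 590),  # too old to count at minute 604
]

def platform_count(records, station, now, window=5):
    """Count distinct devices seen at `station` in the last `window` minutes."""
    recent = {
        device
        for device, seen_at, minute in records
        if seen_at == station and now - window < minute <= now
    }
    return len(recent)

print(platform_count(probes, "Union", now=604))
```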

A note on privacy:

It is essential that privacy be a central element of the design of customer data collection initiatives. Agencies should be open and transparent with the public about the purpose of data collection, and ensure customers’ personal information is well-protected. While partnering with a third party can allow access to more project funding, third-party organizations can have very different interests in how the collected data is used.

In 2015, Boston’s MBTA partnered with third-party advertising provider Intersection to install Beacon devices at a number of its rapid transit stations, primarily for the purpose of advertising and customer data collection. The MBTA took a hands-off, passive approach to the project, and as a result there were significant privacy concerns raised by the public. By comparison, in the Transport for London Wi-Fi pilot, unique phone IDs are being de-personalized and encrypted to remove any possibility of linking the data back to an individual, and the TfL website contains full information and FAQs on the project.
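For readers curious what de-personalizing a phone ID can look like in practice, here is a minimal Python sketch that replaces each MAC address with a keyed hash, so the same device always maps to the same token while the original address is never stored. This is one common approach, not a description of TfL's actual process; the function name, the key handling, and the sample addresses are assumptions.

```python
import hashlib
import hmac
import secrets

# Assumption: a random key generated per collection period and never stored with the data.
PILOT_KEY = secrets.token_bytes(32)

def pseudonymize(mac_address, key=PILOT_KEY):
    """Replace a MAC address with a keyed SHA-256 token (HMAC)."""
    # Normalize case and separators so one device always yields one token.
    normalized = mac_address.lower().replace("-", ":").encode("ascii")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()

token_1 = pseudonymize("AA:BB:CC:11:22:33")
token_2 = pseudonymize("aa-bb-cc-11-22-33")  # same device, different formatting
token_3 = pseudonymize("AA:BB:CC:11:22:34")  # a different device

print(token_1 == token_2, token_1 == token_3)
```

The secret key matters because the set of possible MAC addresses is small enough that a plain, unkeyed hash could be reversed by trying them all. Even with a key, tokens of this kind reduce the risk of re-identification rather than eliminate it, which is why transparency about the purpose of collection remains so important.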
