A New Way to Analyze CaBi Data
A version of this entry was posted at Greater Greater Washington on June 13, 2012.
Bike sharing has emerged in US cities over the last five years to wide praise from bicycle advocates, and from transit supporters as well, because of its natural ability to complement transit services. Many bike share stations are co-located with transit stops to help address the last-mile problem and to create links where gaps exist in the transit network. Until recently, the data and tools needed to really assess travel patterns and alternatives have eluded our industry. This was an opportunity for us to start using OpenTripPlanner (OTP) as more than a user-facing tool and to begin using it as a planning tool.
We wanted to look at bikeshare trips and see how those trips would have been taken if someone were to ride transit instead. Are riders saving time by using a bike rather than taking transit? Is there a time of day when transit is more competitive than biking because of wait time? We also wanted to see whether people used bikeshare for direct trips or rambling leisure trips. Are they taking shortest-path trips or something much longer? To get some insight into these questions, we played around with data from Capital Bikeshare (affectionately called CaBi) and ran it through OTP to see how biking compared to transit and walking for the trips that DC region residents are taking.
This post is about the basics of our method so others can think about it too and find new ways to interpret, analyze and visualize the data!
This was largely a project to explore data to see what kinds of questions could be answered. In short, we compared the trip details of bike share trips taken in DC to an equivalent trip planned using transit or walking. By knowing where trips began and ended, we could plug that into OTP to generate the transit/walking alternative.
The CaBi data includes every trip taken on the system since late 2010. It has many fields of useful data, but we were mostly interested in the duration of the bike trip, the time and date it began, and the start and end locations. Because it provided bike station IDs, we matched those with the latitude and longitude from a live feed of station information. The bike share stations in DC are extremely modular and are frequently moved around from one side of an intersection to another or a block away to improve visibility and immediate access to them; this means that there may be slight discrepancies between where trips started in the past and where we project them starting based on today’s feed, but that difference is likely small.
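The station-matching step can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (start_station, end_station, id, lat, lon) and the sample values are made up for the example, not the actual column names in the CaBi data or the station feed.

```python
def attach_coordinates(trips, stations):
    """Add start/end coordinates to each trip by looking up its station IDs.

    Trips that reference a station missing from the feed are skipped.
    """
    by_id = {s["id"]: (s["lat"], s["lon"]) for s in stations}
    located = []
    for trip in trips:
        start = by_id.get(trip["start_station"])
        end = by_id.get(trip["end_station"])
        if start is None or end is None:
            continue  # station not in today's feed
        located.append({**trip, "start_latlon": start, "end_latlon": end})
    return located


# Made-up sample records (hypothetical field names and values).
stations = [
    {"id": 1, "lat": 38.90, "lon": -77.04},
    {"id": 2, "lat": 38.91, "lon": -77.03},
]
trips = [
    {"start_station": 1, "end_station": 2, "duration_s": 600},
    {"start_station": 1, "end_station": 99, "duration_s": 300},
]
located = attach_coordinates(trips, stations)
```

In this sketch a trip whose station no longer appears in the feed is simply dropped; a real analysis might prefer to log those trips instead.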
Once we had origin and destination coordinates and the time the trip began, we were able to use the data to plan trips. This is a good place to point out an important caveat in the methodology. The CaBi source dataset only identifies the bike share portion of a trip. A person actually walks to a bike share station, makes a transaction of some sort to retrieve the bike, rides to another station, returns the bike and walks to their ultimate destination. When we use the station coordinates to plan transit trips, OTP returns an itinerary for an entire transit trip, which includes walking access time. This would be like having a bikeshare station at your front door and in your office building and then comparing that bike share trip to the time it takes to walk to a Metro station, take Metro and walk to the office. This simplifying assumption presents a challenge for us that we’d love to get some feedback on. How can we synthesize a set of origin-destination coordinates based on the actual station-to-station bike share trips?
Moving ahead with that assumption, we used a Python script to request itineraries from OTP for a 1 percent sample of the 2012 Q1 data (which is still several thousand trips). The resulting dataset includes the origin and destination, walk/wait/in-transit times, CaBi rental times, planned bike trip times and distances traveled by each mode. This rich dataset very quickly showed us some interesting information.
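For readers who want to try something similar, here is a rough sketch of the sampling and request-building steps. It is not the script we ran. The endpoint URL is a placeholder, the query parameter names (fromPlace, toPlace, date, time, mode) are assumptions that should be checked against your OTP version, and the trip fields are hypothetical. The sketch only builds the request URL; it does not contact a server.

```python
import random
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the plan URL of your own OTP server.
OTP_PLAN_URL = "http://localhost:8080/otp/plan"


def sample_trips(trips, fraction=0.01, seed=0):
    """Return a reproducible random sample of roughly `fraction` of the trips."""
    rng = random.Random(seed)
    k = min(len(trips), max(1, round(len(trips) * fraction)))
    return rng.sample(trips, k)


def plan_url(trip, mode="TRANSIT,WALK"):
    """Build a trip-plan request URL for one located bike share trip."""
    params = {
        "fromPlace": "%.5f,%.5f" % trip["start_latlon"],
        "toPlace": "%.5f,%.5f" % trip["end_latlon"],
        "date": trip["start_date"],  # assumed format, e.g. "2012-03-01"
        "time": trip["start_time"],  # assumed format, e.g. "08:15"
        "mode": mode,
    }
    return OTP_PLAN_URL + "?" + urlencode(params)


# One made-up trip in the shape assumed above.
example_trip = {
    "start_latlon": (38.9, -77.04),
    "end_latlon": (38.91, -77.03),
    "start_date": "2012-03-01",
    "start_time": "08:15",
}
example_url = plan_url(example_trip)
```

Fixing the random seed keeps the sample the same from run to run, which makes it easier to compare results after changing something else in the analysis.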
Our initial curiosity was whether or not bike share trips are replacing transit trips. To give us some insight into the question, we plotted the actual trip time for each CaBi trip on the vertical axis against the predicted transit trip time for that same O-D pair on the horizontal axis. The 45-degree red line is an approximate indifference line. When a trip takes very long by bike but is very short by transit (like long-haul trips), it shows up above the indifference line; conversely, if it takes long by transit but is short by bike (such as a trip between two radial transit lines), the point falls below the line. If individuals chose their mode solely based on travel time, they would be indifferent between transit and bike sharing at points along that line, where travel times are equal.
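The same comparison can be made without a plot by checking which side of the line each trip falls on. The sketch below uses made-up pairs of (actual bike minutes, predicted transit minutes); the function name and the data are illustrative only.

```python
from collections import Counter


def side_of_line(bike_min, transit_min):
    """Locate a trip relative to the 45-degree line (bike time on the vertical axis)."""
    if bike_min > transit_min:
        return "above"  # transit would have been faster
    if bike_min < transit_min:
        return "below"  # the bike was faster
    return "on"  # equal travel times


# Made-up (actual bike minutes, predicted transit minutes) pairs.
pairs = [(12, 25), (8, 20), (45, 15), (30, 30)]
counts = Counter(side_of_line(b, t) for b, t in pairs)
```

Counting the trips on each side gives a quick numeric summary to go alongside the scatter plot.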
There are a few regions of interest in this graphic. For starters, the vast majority of trips fall in the area below the line, where the bike share trip is faster than the equivalent transit trip would be. This might suggest a tendency for CaBi riders to use the service when it's faster than transit, but we would need more data from non-CaBi riders and non-CaBi/transit trips to say something more concrete. In all of the analyses we’ve done with CaBi data, we keep seeing stragglers who tend to have bikes out for inordinate amounts of time. Those people who take trips that are significantly longer by bike than by transit are probably taking a leisure ride (or are lost). Also notice the triangular white space on the bottom right, which shows the limits of bike speeds in beating transit. For example, a bike will not make a 40-minute transit trip in 5 minutes or less. (In a separate analysis we found that the average CaBi speed is just under 8 mph.)
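Two small calculations sit behind these observations: a trip's average speed, and a rule of thumb for flagging stragglers. The sketch below is a hedged illustration. The 2x ratio used to flag a probable leisure ride is an arbitrary assumption chosen for the example, not a threshold from our analysis.

```python
def average_speed_mph(distance_miles, duration_s):
    """Average speed in miles per hour for a trip of the given length and duration."""
    return distance_miles / (duration_s / 3600.0)


def looks_like_leisure(bike_min, transit_min, ratio=2.0):
    """Flag trips that took far longer by bike than transit would have.

    The default ratio is an arbitrary, illustrative cutoff.
    """
    return bike_min > ratio * transit_min


# A 2-mile ride that takes 15 minutes (900 seconds).
speed = average_speed_mph(2.0, 900)
```

A 2-mile ride in 15 minutes works out to 8 mph, which is close to the average CaBi speed mentioned above.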
This doesn’t fully answer the question about displaced transit trips, but it provides an interesting starting point for this type of analysis using OTP as a tool for transportation planners. We’re looking forward to finding more ways to explore O-D data using OTP. If you have ideas to share, please comment here! If you’re interested in doing this kind of analysis yourself, check out the git repository for this project and see how else we can use these tools!