Let’s talk about Metro’s recent decision to eschew Google Transit. This is a topic about which I have a small amount of inside knowledge, actually. At the end of August I wrote to WMATA’s CTO, Suzanne Peck. I talked about some of the projects I’d done with WMATA data and expressed my affection for the agency. I explained that I was frustrated by the limits of the data available on wmata.com, and I requested a meeting. I received a response almost immediately (even though it was 11pm or so — a good sign in a CTO!). She thanked me for my interest, complimented me on my projects and directed me to Victor Grimes, her deputy, whom she instructed to meet with me.
I met with Mr. Grimes a week or two later. He was cordial but not too interested in what I had to say about opening up WMATA’s data: according to him, the situation was well in hand. WMATA was already on its way to integrating with Google Transit. There were just a few lingering problems.
To discuss those properly, we first need to talk briefly about Google Transit. The heart of Google’s system is their GTFS format — this stands for Google Transit Feed Specification, and it’s quickly become an industry standard. It’s not too complicated: basically a GTFS feed consists of a bunch of comma-delimited files with names like “routes.txt” and “stops.txt”, which all get zipped up into one file and placed on the transit agency’s website. You could open these up in Excel if you wanted. Google sucks this data down once a week or so, processes it and displays it within their own systems — but a transit agency doesn’t have to worry about all that. They just need to make sure they get their GTFS file right, and that Google knows where to find it. Many other systems have adopted GTFS, and a lot of them make the data available to the public, not just to Google.
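To make the "bunch of comma-delimited files in a zip" description concrete, here is a minimal sketch in Python. It builds a tiny feed in memory and reads it back using only the standard library. The file names and column headers follow the published GTFS spec, but the route and stop rows are invented for illustration and are not WMATA data; the helper names build_sample_feed and read_feed are likewise hypothetical.

```python
import csv
import io
import zipfile


def build_sample_feed() -> bytes:
    """Build a tiny GTFS-style zip in memory (made-up rows, not real WMATA data)."""
    files = {
        "routes.txt": (
            "route_id,route_short_name,route_long_name,route_type\n"
            "R1,42,Example Crosstown,3\n"
        ),
        "stops.txt": (
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Example St & 1st Ave,38.9000,-77.0300\n"
            "S2,Example St & 2nd Ave,38.9010,-77.0310\n"
        ),
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()


def read_feed(feed_bytes: bytes) -> dict:
    """Return {filename: list of row dicts} for every .txt file in the zip."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".txt"):
                with zf.open(name) as raw:
                    # utf-8-sig tolerates an optional byte-order mark
                    text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                    tables[name] = list(csv.DictReader(text))
    return tables


feed = read_feed(build_sample_feed())
for stop in feed["stops.txt"]:
    print(stop["stop_id"], stop["stop_name"])
```

A real feed has more files (trips, stop times, calendars), but they are all read the same way, which is a large part of why the format is easy for outside developers to work with.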
The frequency with which Google picks up the GTFS data was an issue, according to Mr. Grimes. Google wanted to do it about once a week; that wouldn’t cut it for Metro, he said. WMATA updates its data daily with bus detours, route changes and the like. It sounded like this had been worked out, though, and Google would simply arrange to pull the data every day.
The second objection involved the display of fare information. Google Transit couldn’t manage this, apparently. The format does make allowances for transferring fare information, but for whatever reason it wasn’t up to the task of handling WMATA fares. That was fine, though; users could just be directed to the RideGuide website if they wanted to know fare information.
The final hurdle was bureaucratic. The various jurisdictions that make up Metro had to sign off on the release of the (already public) data. Last I heard, Maryland was the holdup. But Mr. Grimes thought this would be resolved soon. The GTFS dataset and Google Transit functionality would be released around September 23, he said.
Well, that didn’t happen. I gave it a week, then wrote to ask what had happened. Here’s the last I’ve heard, received on October 10:
Anyway, the Google/WMATA transit integration project has come to a stand still for reasons I can’t explain, but I expect things to get back on track in the near future. We are having a meeting to discuss our participation in the Google transit initiative next week and I’ll let you know how that turns out.
Presumably that meeting didn’t go very well. We Love DC spots WMATA’s explanation, which comes in the form of a FAQ page put online yesterday, seemingly in an effort to control the fallout surrounding their decision to ditch Google. Here’s the meat of it:
- Google could not guarantee accurate and up-to-date Metro trip information and could not provide Metro fare information. Metro’s own Web site provides near real-time travel and fare information to viewers.
- Many transit providers in the region are not part of Google Transit so Google’s Web site viewers could not get a complete and accurate picture of their transit travel options in the Washington Metropolitan area when they use Google Transit. Metro has partnered with all the local transit providers in the Washington region to provide up-to-date and accurate information for Metro’s Trip Planner. The Trip Planner factors in other area bus and rail providers, such as Fairfax Connector, DASH, ART and Ride On, and their schedules and fare information when giving customers travel options.
- Metro and Google have not yet come to acceptable terms regarding licensing agreements, which put Metro at a greater legal liability. Google is a for-profit company while Metro is a taxpayer subsidized public agency. Google wanted Metro’s transit data at no cost and wanted the transit agency to open itself up to greater legal liability. Given financial constraints, Metro officials are exploring whether there is a way for the transit agency to generate revenue in such a partnership.
I think we can safely dismiss the second point — hearing about the ART bus might be nice, but it’s hardly a sufficient reason for scrapping this project. Based on my conversation with Mr. Grimes, I also think point one is probably a red herring. Those limitations were known in August, and they weren’t considered show-stoppers then. I suppose Google may have ultimately decided that they couldn’t support a daily GTFS update, but this seems very unlikely to me — the Metro system is big, this technical problem is small, and enough people would find this integration useful that I’m sure they’d find a way to support it. I suppose point one could also be interpreted to mean that WMATA was insisting on some sort of service level/data accuracy guarantee that Google couldn’t or wouldn’t provide, but insisting on such a point seems a bit unreasonable.
So that leaves us with point three, which basically boils down to: Google needs to cough up some dough. I’m actually a bit more sympathetic to this idea than you might expect. “Google Transit Feed Specification” sure sounds like a proprietary format. It’s easy to imagine a bureaucrat with sign-off powers seeing that and saying, “Wait a minute. We’re doing all this work for a private enterprise and they’re not even paying us for it?” Google is an awfully helpful company, but presumably it’s offering transit information because it makes them money (or at least supports their brand). WMATA’s in perpetual need of cash; there’s no reason it should be giving Google freebies.
But this misses the larger point. GTFS may be unfortunately eponymous, but it’s an open format. Exposing schedule data in useful ways should be part of WMATA’s mandate — the current system of bulky PDFs used for bus schedules is downright inexcusable. I don’t particularly care whether WMATA lets private firms like Google use its data for commercial ends, but it should certainly grant noncommercial rights to the public.
THIS JUST IN: More from Mr. Grimes:
Yes, there is still an issue between WMATA and Google regarding the licensing agreement. However, we are in continuing discussions with Google and looking at other options for making the data available in GTFS format not only to Google but to others who have requested the information. You are correct in that much consideration has been given to this effort and it is not over. I will let you know as soon as a decision has been made regarding the direction WMATA will take to provide the transit data in GTFS format.
Also, via DCist the Examiner confirms that this is really about money. Well, good. WMATA should put the GTFS dataset online under a Creative Commons noncommercial license, and Google should cough up the $68k of online ad revenue that WMATA’s afraid of losing. Lord knows they’ve got the money.
FURTHER: One of Ezra’s commenters points out that Google Transit isn’t actually all that great. I’m not that familiar with it, but I’m not too surprised — it’s a hard problem. But even if Google can’t do it, someone else can.
Also: it’s worth mentioning that the real payoff here is for buses. Figuring out how to ride the train is dead simple; the bus, not so much. Right now it’s easy for people to read the PDF bus schedules, but hard for them to figure out what the schedule means (or which schedule is the correct one); these difficulties are reversed for computers. If WMATA releases the GTFS dataset, riding the bus could become much easier for a lot of people.
Didn’t realize until just now that “Tom” over at Ezra’s was you.
I really just have a hard time seeing why Google should need to pay anything here. If they do, maybe they should also ask WMATA to pay a couple bucks for every new customer they refer…
goddamn Tommy you are going to make the future a righteous place.
Jack: that’s an understandable perspective, but I think it suffers from a confusion between WMATA and a private enterprise. Metro doesn’t benefit from a marginal user the way that a private firm might — in some ways, additional riders represent headaches for them, in that fares are to a large extent subsidized by local governments. Adding a rider doesn’t necessarily add money to the bottom line, in other words.
Now of course WMATA understands why it exists, and it generally wants to encourage ridership. But the relationship between Google and WMATA is not as inherently symbiotic as you might imagine. If additional riders meant additional money then I agree, this would be a no-brainer. As it stands, though, WMATA’s actions could arguably be called small-minded or provincial, but they’re pretty understandable — and in my opinion, justifiable.
Wait. We are talking about static schedule data here? Like as in a published schedule? They are having trouble releasing that?
I could understand being wary about releasing up to the minute bus data, like http://www.nextmuni.com has, but not a simple printed schedule.
Well… yes and no. It’s static in that it’s not real-time, but it’s frequently amended. So it’s not as simple as, say, doing a one-time conversion of the PDF bus timetables. It needs to be able to be updated.
But you’re right, it’s not rocket science, and it really ought to be exposed in a more accessible manner.
Metro doesn’t benefit from a marginal user the way that a private firm might
Not to be contrary, but how so exactly?
The marginal cost of an additional rider is in most cases practically zero. Most of WMATA’s costs are capital and fixed (per vehicle) operating costs – as long as the new users don’t necessitate adding more vehicles at peak times, it’s basically pure gravy. And the kind of riders Google Transit or the like would be helping with are mostly off-peak – commuters don’t need to look up routes much.
Plus, higher rider numbers (and lower average cost numbers) look a lot better when it’s time to go ask the govt. for more money.
Fair enough. I think everyone agrees that Metro’s general interests are aligned with increasing ridership. My point was just that new riders don’t necessarily pay for themselves — at least not immediately. Metro’s top priority is worrying about how to keep things running, not worrying about how to get more customers.
“Right now it’s easy for people to read the PDF bus schedules, but hard for them to figure out what the schedule means”
Well, there’s also the issue that the actual arrival times of the actual buses bear little resemblance to what’s written on the schedule, so even if people do figure out the schedule, it doesn’t matter.
Kevin: true, although I think this is often overstated. This is a bit external to this discussion, though — sticking to a schedule will be tough for any urban bus system.
FWIW, WMATA has taken steps to address this by looking at ways to reduce bus “bunching”, and of course with the NextBus pilot program (which has been temporarily suspended due to the architecture not being sustainable, but which will — hopefully — be reintroduced over the next few years, according to what WMATA told me).
My point was just that new riders don’t necessarily pay for themselves — at least not immediately.
I’m still not seeing how the fare from an extra rider or ten goes anywhere but straight to the bottom line, especially on an off-peak bus or train.
I’m nothing more than an amateur transit fan, so I really can’t claim to be an expert. If I’m wrong here, I’m genuinely curious what I’m missing.
What’s up with the slashes in the middle of all the names?
The slashes are to avoid tying this entry to those names in Google. The entry has attracted enough inbound links that it’s likely that the folks I’ve been speaking to at WMATA are aware of it, but my initial intention was to avoid attracting their attention to this entry so that they don’t censor their future email interactions with me. Not that I think they have anything to fear, mind you: I’ve been impressed by WMATA’s openness and professionalism throughout this process.
Jack: I have to admit that I’m no expert in the larger economics of the Metro system, either — I think I can speak to their online operations, but the accounting is over my head. However, if you check out this post I think it may at least make *some* more sense to you. On average, Metro fares pay 77% of the cost of each rider’s trip. So rides are subsidized and adding a new rider costs Metro a little more money.
But of course this is an abstraction. As you point out, many of the costs are fixed. If you add a rider at rush hour to a crowded train that’s already running, their fare may be gravy to Metro — free money! On the other hand, if that rider is the one who makes the train so crowded that Metro decides another one has to be run, then the cost of their ride is going to put Metro hundreds of dollars in the hole.
But of course it makes no sense to try to figure out who that tipping-point rider is; it’s impossible. You have to just average things out; when you do, it turns out that new riders, on average, cost Metro money rather than making them money. That’s fine, and doesn’t mean that ridership shouldn’t be encouraged, but it does give WMATA a different perspective on how to best serve its mission.
Ah. That clears it up.
I think you’re misinterpreting the 77% figure:
Say Metro runs 10 buses a day. Each bus costs $10 to operate. The fare is $1, and each bus gets an average of 7.7 passengers.
That’s a 77% cost recovery rate: daily costs are $100, daily revenue is $77, and the bottom line is a daily subsidy of $23.
Now suppose we add a passenger to each bus. It doesn’t cost us more. On the contrary: now we’ve got 87 passengers a day, our bottom line went up by $10 (reduced subsidy), and the fare recovery and average cost figures improved as well. Gravy.
It’s true that tricky things happen if you add riders right at peak times and need more trains or buses – but this is a corner case, and I think something like Google Transit is going to mostly help off-peak riders (people riding in the daytime or the evening when there’s lots of excess capacity).
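The ten-bus example above is easy to check with a few lines of Python. This is only a sketch of that toy model: it assumes cost is fixed per bus, as the example does, and the function name daily_summary and the extra "peak" scenario (the same ten new riders forcing an eleventh bus onto the road) are illustrative assumptions, not Metro figures.

```python
def daily_summary(buses, cost_per_bus, fare, riders):
    """Summarize one day of service; cost is assumed to be fixed per bus."""
    cost = buses * cost_per_bus
    revenue = riders * fare
    return {
        "cost": cost,
        "revenue": revenue,
        "subsidy": cost - revenue,    # what the taxpayer covers
        "recovery": revenue / cost,   # fare recovery ratio
    }


# The numbers from the example: 10 buses at $10 each, $1 fare, 77 riders a day
before = daily_summary(buses=10, cost_per_bus=10, fare=1, riders=77)

# One extra rider per bus, no extra buses needed (the off-peak case)
after = daily_summary(buses=10, cost_per_bus=10, fare=1, riders=87)

# Hypothetical peak case: the same 10 extra riders force an 11th bus
peak = daily_summary(buses=11, cost_per_bus=10, fare=1, riders=87)

for label, day in (("before", before), ("after", after), ("peak", peak)):
    print(f"{label}: subsidy ${day['subsidy']}, recovery {day['recovery']:.0%}")
```

In this toy model the off-peak riders cut the daily subsidy from $23 to $13, while the peak scenario puts it back at $23. Which case dominates depends on when the new riders actually show up.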
First, I wouldn’t underestimate point #2 from the WMATA’s explanation. Providing only a partial snapshot of the overall transit picture is one of my biggest complaints about Google transit. It significantly decreases its usefulness as a planning tool.
Second, technical merits aside, why should everyone use Google’s format? Why not put together a standards committee under the auspices of one of the international standards organizations so that you’re not held hostage by the whims of a private company?
Third, I think Google is failing to realize the uniqueness of this relationship dynamic. Normally when Google steps into the room everyone does what they say because, hey, they’re Google, they’re cool, they make good products, but more importantly because they’re the 800 pound gorilla in the room. Government agencies, however, regardless of their size, are not used to being told what to do; they’re used to telling others what to do.
The WMATA is obviously not privatized or lives in a pre- or post-internet universe? I reckon that restaurants should prohibit guides writing about them – at the very least they should prohibit guides to print their address or opening hours. I mean – why on earth does Sony allow Amazon to sell their products when their own web site has so much more information?
Google’s core competency is finding or at least searching for information. The WMATA is more about providing public transport? Sounds like a perfect match for everybody: the metro – google – and us all?
People living in DC should protest the WMATA’s decision?
Peter: sure, more integration is desirable. But these are small systems compared to WMATA. Let’s not let the perfect be the enemy of the good.
The same applies to the proposal to use an alternative to GTFS. GTFS has gained market share because it was the first format proposed and adopted. That’s not always a great reason to use a format, but in this case Google seems to have done a pretty good job. GTFS is well-documented, open, and the associated tools are released under the Apache license. If it has shortcomings it may be appropriate to work on an alternate format, but right now you’d have to present a case for why the advantages of doing so would outweigh the costs in time and fragmentation of a currently unified standard. I think that’s a hard case to make (at the moment, anyway).
WMATA should put the GTFS dataset online under a Creative Commons noncommercial license,
I agree wholeheartedly, except that the Creative Commons license would be redundant and unenforceable. Information about what time the buses run where is uncopyrightable. It’s just a bunch of facts, and facts are never subject to copyright. That data is born in the public domain, and it stays there. Anyone can use it, for any purpose. And that’s exactly how it should be.
Facts aren’t copyrightable, but databases are. It’s a bit of a strange distinction, and I won’t claim to be able to answer questions on the finer points of this arrangement, but that’s how things work in this lovely intellectual property system of ours. Other systems (BART, for example) publish their data but make it subject to license terms. Generally these are just things that you’re already legally obligated to do (like not using their logo), but it’s helpful to have their policies all in one place.
If nothing else, a CC license would make WMATA’s intentions clear. Whether or not the noncommercial clause is enforceable, I couldn’t say — I think the CC license hasn’t undergone that many test cases overall, much less ones where public data is specifically involved.
this whole ‘google must pay’ thing is a bit odd to me. they’re offering a public service for… the public, and they’re doing it for free. it is up to us, the public, to take our data and put it into gtfs – that is, force our publicly-accountable institutions, namely WMATA, to do it. we’re paying for that mess – they need to do it, plain and simple.
for the record, i’ve used google transit in at least two towns, and it’s fantastic, and would never use anything else. google transit far surpasses any and every system i’ve used from any other public agency at any time in my life, in any country i’ve lived. it’s not perfect, but let’s be honest, they don’t have much competition when it comes to online routing.
:)
I agree with you on the signalling value of a CC license. Even if no permission is needed, CC licenses have come to be associated with a spirit of generosity and collaboration.
And now back to the copyright question. The rule is that while facts aren’t copyrightable, their “selection, coordination, and arrangement” may be. The choice of database structure (“arrangement”) and the choice of which facts to put into it (“selection”) can be copyrighted, but ONLY if those choices are original. The Supreme Court held, in the famous Feist case, that putting a phone book in alphabetical order isn’t original, and neither is choosing to include everyone with phone service from a particular company. Both of those choices are unoriginal.
The same is true, here. Perhaps WMATA’s internal databases contain lots more information, surprising fields, and editorial comments. But the files they’d be exposing to the world as part of the Google Transit program are in a standardized format (which therefore isn’t original to WMATA) and contain a standardized set of data (times and destinations of trains and buses operated by WMATA). Thus, while some databases can potentially support copyright claims in some settings, the files at issue here couldn’t.
Interesting. Thanks for the additional context, James. Count me convinced.
About the slashies Tom: you’ve got coverage at Techdirt, so it’s kinda hard to not let this hit the masses at this point, google or not. I agree that not having it do so hopefully prevents press from putting an idiotic spin on things.
I do find it interesting that the issue is asking for money here.
Tom, as much as I get the idea that it could sometimes cost more money to have more riders, you seem to acknowledge yourself that more riders is the end goal here. Wouldn’t that goal override the “sometimes” cases?
Isn’t it a bit short term to say the equivalent to “well, if we add x riders and not Y (Greater than X), it costs us money, but if we add Z riders (greater than y) and not Y, it costs us less money” and basically refute that?
If the mass transit system is intended to work (and through subsidies,etc) and ends up getting enough to require additional transport, isn’t that a good thing?
You know, more investment, more business, more jobs, more tax for the gov’t etc?
There just seems to be missing logic in merely applying a blanket statement that basically is “We can’t handle more riders right now, so we shouldn’t” (end of negotiations).
Thanks, Matt. I’ve removed the slashes, acknowledging what’s been obvious for a few days now.
As for Metro’s motivation w/r/t getting more riders — perhaps I should’ve tried to state this in terms of the speed of the decision, rather than its ultimate outcome. I agree that ultimately WMATA should find a way to integrate with Google, because that’s what’s in the public’s best interest. But in the short term there’s nothing wrong with the people running this project trying to make sure that it’s successful within the terms of the project, rather than in terms of Metro’s overall mission. That may seem a bit provincial, but it’s an inevitable (and I would argue necessary) feature of how projects get done within bureaucracies. If Google Transit still hasn’t come to DC in 6 months, I’ll be much more willing to raise a fuss.
To chime in on the Metro economics motivation issue (even though I do see you think it’s a matter of time)
There’s a general benefit to using Google Transit for tourist income.
I’m much more likely to visit on a weekend trip, or add a city to a long travel tour if I know I can get around. Yes, I know it won’t be perfect, but having the easy ability to figure out if the place I’m thinking of staying is reasonable to get to w/o knowing city streets? Fantastic.
Cheers,
Sarah
[2 cents from a Mainer who found your entry while trying to figure out if she could get Portland's Metro on Google Transit without the Transit Authority, because there isn't enough drive for them to do it themselves yet.]