Last night I finally found a moment to begin playing around with the WMATA GTFS data. Well, that’s not quite right: I tried to use the Django ORM to load the dataset when it was first released, but a memory leak killed the process when I left it to run overnight — something that doesn’t bode particularly well for our use of the ORM at work.
Last night I shoved the data it into a barebones MySQL database and added a few convenience columns to make it easier to work with the embedded time and date information. The scripts I used to do this take a little while to run (a few hours, perhaps, on the index-creation step — this could doubtless be sped up by moving the math into the SQL itself). And my use of MySQL is somewhat dumb, as the GIS capabilities of Postgres make it a more obvious choice. But if you’re just looking to get started with this stuff, this will at least save you from having to write the CREATE TABLE statements yourself. It’s all up on GitHub — have at it.
The next step for me is to generate a second-by-second snapshot of where every bus in the city is throughout the day, then feed that into Processing to make a movie like this one. Maybe I’ll even add a soundtrack this time — anybody feel like doing some field recordings of Metrobuses?
Thank you so much for sharing the code, I haven’t tested it yet but I am grabbing it and giving it a go. I’ll test if the SQL is compliant with sqlite also.
Cheers
Thanks for taking an interest, Matteo. If you wind up improving on the code, don’t hesitate to push your changes back to me. I should warn you: I’m a github n00b, so if the repository isn’t open just leave a comment and I’ll try to figure things out.
Hi Tom. I have been poking around with MSSQL first as it provides me with the foreign key constraint checks that sqlite is missing (or am not bothered to compile with). The WMATA dataset (as downloaded 2 days ago) is fine against the following primary key constraints:
ALTER TABLE [agency] ADD CONSTRAINT [pk_agency_id] PRIMARY KEY([agency_id]);
ALTER TABLE [stops] ADD CONSTRAINT [pk_stop_id] PRIMARY KEY([stop_id]);
ALTER TABLE [routes] ADD CONSTRAINT [pk_route_id] PRIMARY KEY([route_id]);
ALTER TABLE [trips] ADD CONSTRAINT [pk_trip_id] PRIMARY KEY([trip_id]);
ALTER TABLE [stop_times] ADD CONSTRAINT [pk_stop_time_id] PRIMARY KEY([trip_id],[stop_sequence]);
ALTER TABLE [calendar] ADD CONSTRAINT [pk_calendar_id] PRIMARY KEY([service_id]);
ALTER TABLE [calendar_dates] ADD CONSTRAINT [pk_calendar_date_id] PRIMARY KEY([service_id]);
ALTER TABLE [fare_attributes] ADD CONSTRAINT [pk_fare_attribute_id] PRIMARY KEY([fare_id]);
ALTER TABLE [fare_rules] ADD CONSTRAINT [pk_fare_rule_id] PRIMARY KEY([fare_id],[route_id]);
ALTER TABLE [frequencies] ADD CONSTRAINT [pk_frequency_id] PRIMARY KEY([trip_id]);
ALTER TABLE [transfers] ADD CONSTRAINT [pk_transfer_id] PRIMARY KEY([from_stop_id],[to_stop_id]);
And foreign key constraints:
ALTER TABLE [routes] WITH CHECK ADD CONSTRAINT [fk_routes_agency_id] FOREIGN KEY([agency_id]) REFERENCES [agency] ([agency_id]);
ALTER TABLE [trips] WITH CHECK ADD CONSTRAINT [fk_trips_route_id] FOREIGN KEY([route_id]) REFERENCES [routes] ([route_id]);
ALTER TABLE [trips] WITH CHECK ADD CONSTRAINT [fk_service_id] FOREIGN KEY([service_id]) REFERENCES [calendar] ([service_id]);
ALTER TABLE [stop_times] WITH CHECK ADD CONSTRAINT [fk_stop_times_trip_id] FOREIGN KEY([trip_id]) REFERENCES [trips] ([trip_id]);
ALTER TABLE [stop_times] WITH CHECK ADD CONSTRAINT [fk_stop_times_stop_id] FOREIGN KEY([stop_id]) REFERENCES [stops] ([stop_id]);
ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_fare_rules_fare_id] FOREIGN KEY([fare_id]) REFERENCES [fare_attributes] ([fare_id]);
ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_fare_rules_route_id] FOREIGN KEY([route_id]) REFERENCES [routes] ([route_id]);
ALTER TABLE [frequencies] WITH CHECK ADD CONSTRAINT [fk_frequencies_trip_id] FOREIGN KEY([trip_id]) REFERENCES [trips] ([trip_id]);
ALTER TABLE [transfers] WITH CHECK ADD CONSTRAINT [fk_transfers_from_stop_id] FOREIGN KEY([from_stop_id]) REFERENCES [stops] ([stop_id]);
ALTER TABLE [transfers] WITH CHECK ADD CONSTRAINT [fk_transfers_to_stop_id] FOREIGN KEY([to_stop_id]) REFERENCES [stops] ([stop_id]);
With respect to the dataset, the only issue i found is with the [trips].[block_id] column, as it *should* be an int (being an id), but in the dataset is a string instead. That’s fine as there’s no actual value in it so I just ignore it during the import.
One remark with the GTFS is that [agency].[agency_id] should be an int (being an id) rather than a string. WMATA seem to comply with this ([agency_id] is 1) but others (including the GTFS dataset example) dont.
That again is not a problem as the [agency] dataset is in principle relatively small, and the [agency_id] could be ignored in the import by defining an autoincrement index on the column.
Another remark with the GTFS is that zones are implemented *poorly*. There is no zones table, and the [zone_id] is referenced from [stops], e.g. each stop is assigned a zone, which is fine, but then there is nowhere to look for furhter information on that zone. There can be no foreign key constraint on [stops].[zone_id] as [zone_id] is not a valid index for [stops]. There should be a [zones] table, and the following constraints (derived from GTFS):
– ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_origin_id] FOREIGN KEY([origin_id]) REFERENCES [stops] ([zone_id]);
– ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_destination_id] FOREIGN KEY([destination_id]) REFERENCES [stops] ([zone_id]);
– ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_contains_id] FOREIGN KEY([contains_id]) REFERENCES [stops] ([zone_id]);
Should be:
ALTER TABLE [stops] WITH CHECK ADD CONSTRAINT [fk_stops_] FOREIGN KEY([zone_id]) REFERENCES [zones] ([zone_id]);
ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_fare_rules_origin_id] FOREIGN KEY([origin_id]) REFERENCES [zones] ([zone_id]);
ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_fare_rules_destination_id] FOREIGN KEY([destination_id]) REFERENCES [zones] ([zone_id]);
ALTER TABLE [fare_rules] WITH CHECK ADD CONSTRAINT [fk_fare_rules_contains_id] FOREIGN KEY([contains_id]) REFERENCES [zones] ([zone_id]);
I can post/email you with the SQL I used to create the tables and constraints and with a backup of a MSSQL 2000 db with the WMATA dataset
if you want. I haven’t developed any routing algorithm so far so I dont know yet on which other tables/columns is worth to build an index on.
Please let me know if I got anything wrong in interpreting the GTFS. I am going now to *port* the database in sqlite, will update you
with the code for that also if you want.
Ciao,
Matteo
A further note on GTFS:
[fare_rules].[route_id] should be required (e.g. not NULLable) as it is being used together with [fare_rules].[fare_id] as a primary key.
This is great, Matteo! Thanks so much for sharing your work. Please do send me whatever you’d like — I’ll do my best to integrate it into the repository. Even better would be adding you to the repo as a maintainer, if you’re interested in collaborating in that manner. It would be great to have SQL creation scripts for a variety of databases, and to revise the MySQL schema to integrate the sorts of constraints (and corrections) that you list above.
good post thanks
[...] GTFS to MySQL Speaking of GTFS data, someone has gone to the trouble of creating MySQL tables from a set of GTFS data files. I ran into some errors, but you may have more time to spare than I (let me know how it goes!). [...]