Archive for July, 2009

TCP/IP sanctions

It seems like the Iranian cyberactivist movement has marked the entrance of the term DDoS into our culture’s shared vocabulary. There’s a big DDoS attack going on right now, in fact: the Post has coverage (they’re among the targets), and I heard a piece about it this morning on NPR. Governments are among the victims. People are starting to notice that the agencies responsible for dealing with this stuff don’t have the power to do much more than write whitepapers. Governments are going to start doing things about it.

I think this will likely take two forms:

  • Legislated standards for ISPs. In particular, I imagine we’ll see ports for IRC and SMTP, if not all non-http ports, move to a default-closed state. I’m not sure how you’ll opt into opening them, but SMS verification of an online request seems to be increasing in popularity (my bank does this; Google does, too, when you start to use their App Engine service). There’ll also likely be incentives or mandates put in place for things like server-side antivirus scanning of email attachments. The details matter quite a lot — the possibility of a corporate power-grab that constrains citizens’ ability to interact securely is real — but I think such attacks can be fought off, and the result will probably be a good thing on the whole.
  • These ISP standards could conceivably get written into trade agreements, at which point a more politically interesting possibility will arise: the imposition of economic or network sanctions against nations judged to have an out-of-control internet. Is there a damaging DoS attack coming from South Korea? Is that country’s government judged not to have adequate standards in place to fight or prevent it? Cut ‘em off the network, or throttle their traffic. There’s a constituency for this: the WIPO people would be only too happy to have this capability.


    This opens up other interesting possibilities. For one thing, the mutually destructive nature of sanctions would become immediately clear to internet users in a way that isn’t always obvious in the comparatively sluggish realm of trade. And for another, online false flag operations would become a more serious concern. A hacker group could conceivably blackmail a small, cloutless nation under the threat of eliciting an international network crackdown.

It’ll be interesting (and no doubt horrifying) to see deep packet inspection debated on the House floor. I think we’re headed that way, though.

Oh yeah: let me plug this article, which is the most entertaining narrative about a DoS attack that I’ve come across.

more NextBussery

GGW points me toward some pretty discouraging news about how NextBus has treated other transit app developers. They assert their copyright on arrival times; they demand licensing agreements, yet refuse to grant them on a small scale; and, perhaps most astoundingly, they say that free iPhone apps count as commercial use of the data because Apple sells the iPhone on the strength of the App Store’s offerings. Ugh.

On the bright side, the technical aspects of my work on a map-based NextBus iPhone app are proceeding well. At the moment there are no obvious obstacles that need to be overcome. That’s not to say those obstacles don’t exist — they almost certainly do — but a seemingly clear path ahead is about the best you can hope for in a technical project.

Oh, and I had another look at the GTFS/NextBus stop match-up. The news is still bad. I’d hoped that the average distance between stops was being pulled up by a few bad matches, but that doesn’t seem to be the case. In fact there are only about two dozen matched stops within 10m of one another. I’ll have another look eventually (maybe NextBus is using a different projection system?), but for now the NB dataset is sufficiently complete that I can rely on it for my work.

some NextBus stats

No word yet from WMATA, but I did end up writing a script to grab NextBus’s routeconfig data (download here). Then I tried to match each NextBus-defined stop with the closest one in the GTFS dataset. Some stats*:

  • NextBus’s dataset tracks 711 unique stop IDs. GTFS has 10,380.
  • Using this function to measure distance, the average space between matched stops is 164 feet. The smallest is 11 feet. The largest is 9/10ths of a kilometer.
  • 218 NextBus stops wound up sharing the same GTFS stop.

All in all, pretty bad — this level of data quality is clearly unusable. My GIS skills are weak; this may be my own stupid fault. I’ll consult with some experts and see what I might be doing wrong. But the basic distance-matching idea is pretty straightforward, so I’m not terrifically optimistic. It’s possible that data quality is just going to really, really stink — to be sure, this is not particularly encouraging. Here’s hoping we can get a proper lookup table out of WMATA or NextBus. Otherwise I don’t see a great alternative to manual intervention.
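For anyone who wants to poke at the same problem, here is a rough Python sketch of the closest-stop matching described above. It is not the script or the distance function used for these stats: it swaps in the standard haversine formula, and the names `haversine_m`, `match_stops` and `shared_targets` are made up for illustration. The only real coordinates are those of the 11th and H St NW stop (NextBus tag 5880, GTFS 7591) quoted elsewhere in this archive; the other two entries are hypothetical.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def match_stops(nextbus_stops, gtfs_stops):
    """Map each NextBus stop to (closest GTFS stop id, distance in meters)."""
    matches = {}
    for nb_id, (nb_lat, nb_lon) in nextbus_stops.items():
        # Brute force: compare against every GTFS stop and keep the nearest.
        best_id = min(
            gtfs_stops,
            key=lambda g: haversine_m(nb_lat, nb_lon, *gtfs_stops[g]),
        )
        matches[nb_id] = (best_id, haversine_m(nb_lat, nb_lon, *gtfs_stops[best_id]))
    return matches


def shared_targets(matches):
    """GTFS stop ids claimed by more than one NextBus stop."""
    counts = Counter(gtfs_id for gtfs_id, _ in matches.values())
    return {gtfs_id for gtfs_id, n in counts.items() if n > 1}


nextbus = {
    "5880": (38.90032, -77.02657),             # real: 11th St NW + H St NW
    "hypothetical-nb": (38.89990, -77.02700),  # made-up neighbor
}
gtfs = {
    "7591": (38.899812, -77.027053),           # real: NW 11TH ST & NW H ST
    "hypothetical-gtfs": (38.91000, -77.04000),  # made-up, far away
}

matches = match_stops(nextbus, gtfs)
for nb_id, (gtfs_id, meters) in matches.items():
    print(nb_id, "->", gtfs_id, f"{meters:.0f} m")
print("shared:", shared_targets(matches))
```

With 711 stops on one side and 10,380 on the other, brute force is a few million distance calculations, which is slow but tolerable for a one-off job. The `shared_targets` check is what surfaces the many-to-one collisions counted in the stats above.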

* These numbers ignore the routes that NextBus tracks but which GTFS does not; those are B99, F99, L99, NH1, P99, REX, S80 and S91 (they appear to be shuttles and the like). I haven’t yet identified the routes that are in GTFS but not tracked by NextBus.

after Twitter

Tim was nice enough to write a tweet endorsing my article about the potential downside of Twitter’s emerging political importance. But he noted that I didn’t say much about what the alternatives are — fair enough! As I said to him in response, I was only too happy to have word limits save me from having to propose a solution. Even though I think the situation is unfortunate, at this point I suspect that there isn’t much to be done about Twitter’s rising political relevance.

But, y’know, time heals all wounds. I am convinced that Twitter’s import as a cultural hub will decline. Twitter won’t go away entirely, mind you — it’s a genuine medium unto itself — but I think its true legacy is likely to be a frankly unbelievable extension, evolution and popularization of the capabilities represented by SMS. Multicast? Common use of symbolic delimiters like @ and #? Widespread institutional adoption? Two years ago, if you’d asked Verizon when SMS would be used this way, they’d have laughed in your face. Now the marketplace is going to demand this functionality — if not from Twitter, then from someone else.

But as I said, I think the conversations happening on Twitter will become less relevant, and the medium less vibrant. Actually, I’m beginning to think that this is an iron law of online mediums. This post (via Megan) helped focus my thinking a lot.

Here’s how it goes. First, a network achieves viability: enough people are using it to send non-“hello world” messages that the community can sustain itself. Next, users experiment, publishing and republishing content that they find compelling. The system amounts to a collaborative filter, and the quality and novelty of the results are surprisingly good. At this point people begin to notice and discuss the potential for the network to have greater relevance, and, inevitably, those who don’t understand that participation in the filtering activity is non-negotiable begin whining about taking the medium seriously when they see so much trivial content on it. Despite this carping, more users join the network and its value and potential importance begin to be more widely understood. At this point users change how they identify content worth publishing or republishing: rather than the first-order “how compelling is this?” they begin using the second-order “how compelling will other people find this?” Although they were excellent at determining what they thought was interesting and appropriate, they’re comparatively terrible at determining what other people will like. Quality declines (“I blogged: del.icio.us links for 2009-07-02”). Worse, as users continue to try to shirk their collaborative filtering responsibilities, experimental uses of the medium are discouraged or otherwise become less viable. The system ossifies, and soon enough everyone is sick of having to check Facebook. Time for a new no-pressure medium for goofing off with your early-adopter friends. Rinse, repeat.

I don’t want to oversell the preceding; I’m pretty sure that Clay Shirky accidentally scribbles more profound sociological observations about the internet during the course of searching for a working ballpoint pen at the bank. But this is my understanding of the situation, and by now I think we have enough data points to conclude that most, if not all, online social networks achieve viability, blossom and stagnate (it may still be entirely possible to run a viable business during the stagnation phase, I should point out).

It’ll happen to Twitter, too — it is happening. So let’s start talking now about what conditions we should demand of the next medium-of-the-moment before we start moving our political institutions onto it. My suggestion for a place to start: open, free, and likely to remain so.

UPDATE: This is somewhat related — it’s an example of what I mean by people withdrawing from the collaborative filtering process.

GTFS and NextBus don’t match up

NextBus launched! It’s exciting. I and a lot of other area developers are looking at how to take advantage of the realtime GPS data that NextBus collects to make the DC transit system easier to use. I haven’t gotten too far with it — I’m just poking around — but the early word is that NextBus isn’t going to make this easy.

First: they’re claiming copyright on the location of metrobuses. They serve their data as XML, and each <body> tag looks like this:

<body copyright="All data copyright NextBus 2009. Allowed use is for noncommercial purposes only.">

Not great. But okay, whatever — whether this data is copyrightable in a meaningful sense is the first question; whether WMATA willingly sold off their bus locations is the second. But realistically, NextBus can apply whatever license terms on their service that they’d like, and can plausibly enforce them against commercial users. So that’d bring us back to where we are, which is fine.

But just because NextBus says it’s okay to have the data doesn’t mean they’re going to make it easy. Your browser makes a lot of requests to NextBus in order to show a map, of course, but the most interesting ones are to http://wmata.nextbus.com/service/googleMapXMLFeed, a script that performs a number of different operations based on what querystring parameters it’s passed. For instance, http://wmata.nextbus.com/service/googleMapXMLFeed?command=routeConfig&a=wmata&r=64&key=1424073267381 will get you a list of stops and route geometry for the 64 bus.

But if you paste that URL into your browser, you’ll get this:

<Error shouldRetry="false">
  Feed can only be accessed by NextBus map page.
</Error>

Charming. But of course it works in the context of the map. So we need to figure out what the difference is. Here’s a complete conversation between my browser and NextBus. And here’s a specific request that worked for that URL:

http://wmata.nextbus.com/service/googleMapXMLFeed?command=routeConfig&a=wmata&r=64&key=1424073267381

GET /service/googleMapXMLFeed?command=routeConfig&a=wmata&r=64&key=1424073267381 HTTP/1.1
Host: wmata.nextbus.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.11) Gecko/2009060214 Firefox/3.0.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://wmata.nextbus.com/googleMap/customGoogleMap.jsp?a=wmata&cssFile=http://www.wmata.com/css/nextbus.css
Cookie: userProfile_rev1=wmata|A11|A11_A11_0|7067|8166&wmata|64|64_64_0|16797|7073&wmata|64|64_64_0|6454|16793&; __utma=222659997.501906468.1246378684.1246395595.1246496449.3; __utmz=222659997.1246496449.3.3.utmccn=(referral)|utmcsr=wmata.com|utmcct=/rider_tools/nextbus/arrivals.cfm|utmcmd=referral; userID_rev4=5534021; __utma=64696752.769952994.1246496428.1246496428.1246496428.1; __utmb=64696752; __utmc=64696752; __utmz=64696752.1246496428.1.1.utmccn=(organic)|utmcsr=google|utmctr=nextbus+wmata|utmcmd=organic; Coyote-2-407c7b2d=c0a80a66:0; JSESSIONID=53015434E8CF00D3FEC91D4B491FFA08; __utmb=222659997; __utmc=222659997

It actually can’t be that many things. There’s the set of session-maintaining cookies, the HTTP referrer header, and the user agent reported by the browser (or some combination). Spoiler alert: it’s the referrer. So! Borrowing the referring URL from the headers (it checks for more than just the domain), we can use curl -e "http://wmata.nextbus.com/googleMap/customGoogleMap.jsp?a=wmata&cssFile=http://www.wmata.com/css/nextbus.css" "http://wmata.nextbus.com/service/googleMapXMLFeed?command=routeConfig&a=wmata&r=64&key=1424073267381" and get back this file.
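If you’d rather script this than shell out to curl, the same trick can be sketched with Python’s standard library. The function names here are made up for illustration. The `key` value is simply copied from the captured request above; I’m assuming it can be reused as-is, and I haven’t confirmed what it represents or whether the server checks it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FEED_URL = "http://wmata.nextbus.com/service/googleMapXMLFeed"
# The full referring URL from the captured headers; the server reportedly
# checks more than just the domain, so it is sent verbatim.
MAP_PAGE = (
    "http://wmata.nextbus.com/googleMap/customGoogleMap.jsp"
    "?a=wmata&cssFile=http://www.wmata.com/css/nextbus.css"
)


def build_route_config_request(route, key="1424073267381"):
    """Build a routeConfig request that carries the map page as its Referer."""
    query = urlencode({"command": "routeConfig", "a": "wmata", "r": route, "key": key})
    return Request(f"{FEED_URL}?{query}", headers={"Referer": MAP_PAGE})


def fetch_route_config(route):
    """Fetch the raw XML bytes for one route (this performs a network call)."""
    with urlopen(build_route_config_request(route), timeout=10) as response:
        return response.read()


if __name__ == "__main__":
    request = build_route_config_request("64")
    print(request.full_url)
    print(request.get_header("Referer"))
```

Building the request separately from fetching it makes it easy to inspect the URL and headers without touching the network, which is handy when a service like this starts returning errors.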

It’s got three things in it: stops; routes, which are ordered lists of stops; and paths, which define the shape of the lines that’ll be drawn to represent the routes (because roads have twists and turns, connecting stops with straight lines is insufficient).

This is pretty much how the GTFS dataset is organized. So you might expect to be able to match up the stops from GTFS to NextBus. Well, here’s a stop from NextBus:

<stop tag="5880" title="11th St Nw + H St Nw" dirTag="null" lat="38.90032" lon="-77.02657" stopId="1001159"/>

Here’s the same stop from a fresh download of the WMATA GTFS dataset:

7591 / NW 11TH ST & NW H ST / 38.899812 / -77.027053

It’s a drag. The IDs don’t match. The human-readable stop names don’t match. Even the latitude and longitude don’t match. I mean, sure, they’re close. But making these line up is going to be a sloppy, relatively expensive calculation. Now, true, it probably only has to happen once. It’s not that hard of a problem. It’s just that it’s so unnecessary. Ah well. I’ve got an email in to Metro; we’ll see if they can provide a cleaner solution than the one I’m thinking about scripting up.
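One possible way to shore up a coordinate match is to compare stop names after normalizing them. The Python sketch below is only an idea, not a tested solution: `name_key` is a hypothetical helper, and it assumes intersection names are two street names joined by "+" or "&", which I’ve only checked against the one pair of records shown above.

```python
import re
import xml.etree.ElementTree as ET

# The two records quoted above, verbatim.
NEXTBUS_STOP = (
    '<stop tag="5880" title="11th St Nw + H St Nw" dirTag="null" '
    'lat="38.90032" lon="-77.02657" stopId="1001159"/>'
)
GTFS_NAME = "NW 11TH ST & NW H ST"


def name_key(name):
    """Reduce an intersection name to a key that ignores case and word order."""
    streets = re.split(r"[+&]", name.upper())
    # Each street becomes a set of words, so "11TH ST NW" equals "NW 11TH ST".
    return frozenset(frozenset(street.split()) for street in streets if street.strip())


stop = ET.fromstring(NEXTBUS_STOP)
latitude = float(stop.get("lat"))
longitude = float(stop.get("lon"))

print(stop.get("title"), "at", latitude, longitude)
print(name_key(stop.get("title")) == name_key(GTFS_NAME))  # prints True
```

A name match like this could act as a tiebreaker when several candidate stops sit within a few dozen meters of each other, though abbreviations and misspellings would need more handling than this.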

UPDATE: Some thoughts on NextBus’s javascript interface from 2006. A few things have changed since then, but from my initial investigation I’d say that not too much is different — just URLs, mostly.

An alternative to messing with the protected javascript interface is to scrape the HTML for the “next arrival time” pages. But a) this only gets you distance-expressed-as-time, which is lousy data compared to actual lat/lon coordinates and b) it’s still messy, as “apparently NextBus hired a live bear to write their markup” (the author of that post does have his scraping code up on GitHub, though).