Archive for June, 2010

Tim Lee on Professional/Amateur Writing

Tim is one of the very few people I know of who can produce posts like this one: posts which, when I’m finished reading them, seem so perfectly clear, cogent and direct that it’s hard to find a single word I’d care to quibble with.  Not that I agree with him all the time, of course.  But this time I do (actually, I’m probably willing to go somewhat further out on this limb than Tim).

This makes me wonder, though, about the compensation structure of top-tier professional writers.  I have a hard time believing that Charles Murray is feeling the financial pinch of the collapsing media industry, his complaints about Times op-ed rates notwithstanding.  It seems a lot more likely to me that his compensation has shifted away from writing-for-hire and toward various cushy sinecures.  Rich people tend to have friends who’ll help them stay rich, after all.  I have a feeling that the way things went down at the Chicago Tribune is fairly typical.  Or maybe I just took that last season of The Wire a bit too much to heart.

Anyway, rich get richer, proletariat squeezed, dog bites man, Fox News Edge at 11.

generating the Affective Norms for English Words (ANEW) dataset

So!  At work we’ve been spending a couple of days working on off-the-wall projects — it’s a change of pace, a chance to work with folks not on our usual teams, an opportunity to try out new ideas, and a venue for some friendly competition.

One of the projects that my team considered but ultimately discarded was some sentiment analysis on press statements made by legislators about the BP oil spill.  I figured we could pull some press releases, scan them for their level of aggression (or whatever) and compare the results to the level of oil industry support enjoyed by that legislator (thanks, Transparency Data!).  The result probably wouldn’t have set the world aflame, but if it turned out the way I expected it might’ve made for a fun and topical visualization.

As I said, we didn’t end up pursuing that idea.  But I did get far enough in researching sentiment analysis to realize that I’d like to use the ANEW dataset — a spatial model of various emotionally-charged words that would help me classify arbitrary texts.  Now, Emily says that she thinks sentiment analysis is “kind of bullshit”, and I’m not sure I disagree.  But I think it might still be interesting to run the numbers and see what comes up.

Unfortunately, the folks who created ANEW don’t want to give their data away.  Well, that’s not quite right: they’ll give it away, for free, if you’re a researcher.  A researcher who has a .edu email address.  And who isn’t a student.

This seems a little silly to me.  And it seems really silly when you consider that their widely-available 1999 paper introducing ANEW contains a complete data set.

You can probably see where this is going.

Here’s the data in CSV format. Here’s the code used to generate it. Here’s a paper that shows how to use ANEW.

Given the age of the paper and its widespread availability, I can’t imagine there’ll be any objections to transforming its contents slightly into a more useful format. If there are, I’d be happy to hear them in comments, and if any are made by the folks responsible for ANEW, I’ll be happy to remove the links to anything here that contains even a whiff of their copyright.

And of course this is pretty old data.  I’m sure that ANEW’s gotten better in the last decade (this page, for example, refers to ANEW as containing 2000 words; my copy has just over a thousand).  But it’s something to start with.  It’d be great if its creators decided to remove some of the hoops surrounding their list — there are lots of research efforts that exist outside of the .edu TLD.
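
For anyone who wants to poke at the CSV, here’s a rough sketch of the simplest thing you could do with it: look up each word of a text in the list and average the ratings. ANEW rates words on valence, arousal and dominance, but the column names below are my guess at a sensible layout rather than a promise about the file, and the three sample rows are made-up placeholders, not real ANEW values. Plain averaging is just one naive approach, and not necessarily what the linked paper does.

```python
import csv
import io
import re

# Stand-in rows so the sketch runs on its own. The column names and the
# numbers are invented placeholders, NOT values from the real ANEW list.
SAMPLE_CSV = """word,valence,arousal,dominance
happy,8.0,6.0,7.0
disaster,2.0,6.0,3.0
spill,3.0,5.0,4.0
"""


def load_norms(handle):
    """Map each word to its (valence, arousal, dominance) ratings."""
    norms = {}
    for row in csv.DictReader(handle):
        norms[row["word"].lower()] = (
            float(row["valence"]),
            float(row["arousal"]),
            float(row["dominance"]),
        )
    return norms


def score_text(text, norms):
    """Average the ratings of every token that appears in the norms.

    Returns None when no token matches, rather than inventing a score.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [norms[t] for t in tokens if t in norms]
    if not hits:
        return None
    n = len(hits)
    # zip(*hits) regroups the per-word tuples into one sequence per dimension.
    return tuple(sum(dim) / n for dim in zip(*hits))


norms = load_norms(io.StringIO(SAMPLE_CSV))
print(score_text("The spill was a disaster.", norms))
```

To run it against the real file, swap the `io.StringIO(SAMPLE_CSV)` for an `open(...)` call on wherever you saved the CSV, and adjust the column names to match its header. Words that aren’t in the list are simply ignored, which with only about a thousand entries will be most of them.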

checking in on Twitter and Iran

I could use some GMaps help

Way back when, I wrote a Google Maps application for DCist that overlaid the DC Metro system on the usual GMaps tiles. People found it useful — me especially, since I think it helped me land a job at EchoDitto.  Its only real innovation was some simple, hacked-up geometry that would horrify a cartographer, but which allowed me to make an attractive map that recalled the more stylized WMATA map.  It wasn’t rocket science, but I still occasionally get emails from developers asking me how I did it (which is slightly bizarre, given that the code is right there for them to see).

In 2007 the GMaps API got an update, and I converted the project into something called a mapplet. I had to rewrite a few things, but it was more or less the same.  The main difference was that mapplets were used through the maps.google.com interface: you could add a bunch at the same time, and you could still use Local Search and permalinking and comments about businesses and other Googly innovations from within the interface.  I didn’t have to implement any of that stuff!  Instead, users could simply have their polished Google Maps experience supplemented by my modest mapplet.  Handy.

Unfortunately, over the last few weeks I’ve started receiving reports that the mapplet’s behaving weirdly.  Load the mapplet, then do a search for something — the station markers will disappear, and sometimes some of the lines that are supposed to connect them will, too.  It looked to me like an event handler had started working differently, so I went to investigate.

Alas!  It turns out that v2 of the API has been deprecated.  They’re on to v3 (not so bad) and they’ve discontinued the mapplet platform entirely (bad)!

I can still make the lines appear on a Google Map.  But I don’t think I can do it on the maps.google.com interface.  This is a drag: I don’t think the thing’s half as useful as a standalone product as it is when it supplements search functionality.  And I really don’t want to reimplement the entire maps.google.com interface (even though, yes, they expose the API for their local search stuff).

So! Developers! Anyone out there dealt with this? I’m not eager to dump a huge amount of time back into this project — a project that’s increasingly unnecessary thanks to Google Transit and the addition of transit stations to the GMaps tileset, but which is still useful when you’re working at a modestly wide zoom level.  But it would be nice to get things working again.