generating the Affective Norms for English Words (ANEW) dataset

So!  At work we’ve been spending a couple of days working on off-the-wall projects — it’s a change of pace, a chance to work with folks not on our usual teams, an opportunity to try out new ideas, and a venue for some friendly competition.

One of the projects that my team considered but ultimately discarded was some sentiment analysis on press statements made by legislators about the BP oil spill.  I figured we could pull some press releases, scan them for their level of aggression (or whatever) and compare the results to the level of oil industry support enjoyed by that legislator (thanks, Transparency Data!).  The result probably wouldn’t have set the world aflame, but if it turned out the way I expected it might’ve made for a fun and topical visualization.

As I said, we didn’t end up pursuing that idea.  But I did get far enough in researching sentiment analysis to realize that I’d like to use the ANEW dataset — a spatial model of various emotionally-charged words that would help me classify arbitrary texts.  Now, Emily says that she thinks sentiment analysis is “kind of bullshit”, and I’m not sure I disagree.  But I think it might still be interesting to run the numbers and see what comes up.

Unfortunately, the folks who created ANEW don’t want to give their data away.  Well, that’s not quite right: they’ll give it away, for free, if you’re a researcher.  A researcher who has a .edu email address.  And who isn’t a student.

This seems a little silly to me.  And it seems really silly when you consider that their widely-available 1999 paper introducing ANEW contains a complete data set.

You can probably see where this is going.

Here’s the data in CSV format. Here’s the code used to generate it. Here’s a paper that shows how to use ANEW.
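
For anyone wondering what using the list looks like in practice, here is a minimal sketch in Python. It is not the method from the paper, just the simplest thing that could work: average valence, arousal, and dominance over whichever words in a text appear in the list. The header names (`word`, `valence`, `arousal`, `dominance`), the helper names, and the sample rows are assumptions made for illustration. The numbers are invented placeholders, not real ANEW ratings, so check everything against the actual CSV.

```python
import csv
import io
import re

# Placeholder rows for illustration only: these numbers are invented,
# not real ANEW ratings, and the header names are an assumption.
SAMPLE_CSV = """word,valence,arousal,dominance
happy,8.0,6.0,7.0
disaster,2.0,6.0,3.0
"""

DIMENSIONS = ("valence", "arousal", "dominance")


def load_norms(fileobj):
    """Read rows into {word: {dimension: rating}}."""
    norms = {}
    for row in csv.DictReader(fileobj):
        norms[row["word"].lower()] = {d: float(row[d]) for d in DIMENSIONS}
    return norms


def score_text(text, norms):
    """Average each dimension over the words in text that appear in norms.

    Returns None when no word in the text is in the list.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [norms[w] for w in words if w in norms]
    if not hits:
        return None
    return {d: sum(h[d] for h in hits) / len(hits) for d in DIMENSIONS}


norms = load_norms(io.StringIO(SAMPLE_CSV))
print(score_text("A happy disaster", norms))
```

To try it on the real data, pass an open file handle for the extracted CSV to `load_norms` instead of the `io.StringIO` sample, and adjust the column names to match the file's header.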

Given the age of the paper and its widespread availability, I can’t imagine there’ll be any objections to transforming its contents slightly into a more useful format. If there are, I’d be happy to hear them in comments, and if any are made by the folks responsible for ANEW, I’ll be happy to remove the links to everything here that contains even a whiff of their copyright.

And of course this is pretty old data.  I’m sure that ANEW’s gotten better in the last decade (this page, for example, refers to ANEW as containing 2000 words; my copy has just over a thousand).  But it’s something to start with.  It’d be great if its creators decided to remove some of the hoops surrounding their list — there are lots of research efforts that exist outside of the .edu TLD.

10 Responses to “generating the Affective Norms for English Words (ANEW) dataset”

  1. mike d

    This “paywall” has the same problem the Times Select one did – just about anyone who is a college graduate can get a .edu email address along with paying their alumni dues.

  2. Tom

    Well, it’s not quite as bad as that: it looks to me like they manually vet the submissions to make sure they’re from professors or postdocs. I think that makes it worse, though.

  3. Mike

    It is too bad that you are not going to pursue this project. It seems like it might be an interesting feature to build on top of Managing News where it can collect statements and articles from all sorts of sources, determine the average sentiment across them all, and then show how each article/statement compares to the norm.

    What is up with those column names though? Combining arousal, dominance, and valence sounds like some kinky atomic physics.

  4. Tom

    They’re suggestive names to be sure. But check out the papers — they explain what the dimensions mean. It’s not quite as titillating as it might seem.

  5. Karen

    Wait, so did you get the data you wanted, or not? As a linguist, I have never heard of ANEW, and I am also relatively certain it’s “kind of bullshit.”

  6. Tom

    Well, I got an earlier version of the data. But I think it’s silly to restrict access to it. If you’re going to charge for it, that’s one thing — but I have less patience for demanding credentials.

  7. Adam

    I’ve obtained a Google Doc which is a manual for administering the test to generate the valencies in ANEW. It seems to contain the entire list as well, only with their SD measures along the 3 dimensions. I will extract the list from that and see if it compares to yours. I hold a doctorate but work in industry so wasn’t able to grab a copy directly. They seem to be keen on not having this list used for any practical purpose.

  8. Adam

    Never mind…that’s the same doc as the article. Anyhow, someone’s put the doc up as a Google Doc so it’s available for download now but thanks for preparing a csv file for it. :D

  9. Nico

    Hi! The CSV file is not working on my computer. Do you have a doc or Excel file you could send me? Thanks!

  10. Tom

    The CSV is compressed as a tarball, Nico. Any decent archive program should be able to unzip it.
