Word Scoring in R

Ingo Feinerer has created the “tm” package expressly to enable text mining in R. It looks like a good start, and I’m always glad to see how adaptable R can be to newer methods (details are in the October 2008 issue of “R News”).

As an aside, I am not completely convinced by this technique as it has been applied within political science. I have recently seen it used to score party manifestos based on the frequencies of particular words. Relying on manifestos to reveal a party’s actual position is risky: citizens rarely read them, and manifestos very rarely anticipate actual policy outcomes.
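
To make the idea of frequency-based scoring concrete, here is a minimal sketch of one way it could work. It is written in Python purely for illustration and does not use “tm”. It assumes we have a few reference texts whose positions on some scale are already known, scores each word by which reference texts use it most, and then scores a new text as the frequency-weighted average of its word scores. The function names (word_freqs, word_scores, score_text) and the toy texts are hypothetical, not taken from any published method or package.

```python
import re
from collections import Counter


def word_freqs(text):
    """Return the relative frequency of each lowercase word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}


def word_scores(references):
    """Score each word from (text, position) reference pairs.

    A word's score is the average of the reference positions, weighted
    by the word's share of use across the reference texts.
    """
    freqs = [(word_freqs(text), pos) for text, pos in references]
    vocab = set().union(*(f for f, _ in freqs))
    scores = {}
    for w in vocab:
        total = sum(f.get(w, 0.0) for f, _ in freqs)
        scores[w] = sum(f.get(w, 0.0) / total * pos for f, pos in freqs)
    return scores


def score_text(text, scores):
    """Frequency-weighted mean score of the words in text that have a score.

    Returns None when the text shares no words with the references.
    """
    freqs = word_freqs(text)
    scored = {w: f for w, f in freqs.items() if w in scores}
    total = sum(scored.values())
    if total == 0:
        return None
    return sum(f / total * scores[w] for w, f in scored.items())


# Toy reference texts with made-up positions (+1.0 and -1.0).
references = [
    ("tax cuts markets freedom", 1.0),
    ("welfare unions equality", -1.0),
]
scores = word_scores(references)
result = score_text("tax cuts and welfare", scores)
print(result)
```

Note that the output depends entirely on the choice of reference texts and the positions assigned to them, which is exactly where my doubts about the manifesto application come in.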

Despite my reservations about the way it has been used so far, I do believe it could lead to some very interesting research. As more and more of the world’s libraries are digitized (and thus made machine-readable), our ability to efficiently parse huge volumes of text will become increasingly important.
