Ingo Feinerer has created the “tm” package for the express purpose of enabling “text mining” in R. It looks like a good start down this track, and I’m always glad to see just how adaptable R can be to newer methods (specifics in the Oct 2008 “R News”).
As an aside, I am not completely convinced by how this technique has been applied within political science. I have recently seen it used to score party manifestos based on the frequencies of particular words. Relying on manifestos to reveal anything concrete about a party’s position is risky: citizens rarely read them, and the documents themselves very rarely anticipate actual policy outcomes.
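To make the idea concrete, here is a minimal sketch of frequency-based scoring. It is written in Python rather than R and does not use tm. The category word lists and the two sample texts are invented purely for illustration and are not taken from any published coding scheme.

```python
import re
from collections import Counter

# Hypothetical category dictionaries; the word lists are invented for illustration.
CATEGORY_WORDS = {
    "economy": {"tax", "taxes", "jobs", "growth"},
    "welfare": {"health", "pensions", "schools"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_manifesto(text, category_words=CATEGORY_WORDS):
    """Return each category's share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # Avoid dividing by zero on empty input.
        return {category: 0.0 for category in category_words}
    return {
        category: sum(counts[word] for word in words) / total
        for category, words in category_words.items()
    }


# Two made-up snippets standing in for manifesto text.
manifestos = {
    "Party A": "We will cut taxes and create jobs. Growth comes first.",
    "Party B": "We will fund health care, schools and pensions for all.",
}

scores = {party: score_manifesto(text) for party, text in manifestos.items()}
for party, result in scores.items():
    print(party, result)
```

A score here is just the fraction of a document's words that fall in a category's list, so the result depends entirely on which words someone chose to count. That is part of why I treat such scores with caution.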
Despite my reservations about the way it has been used to this point, I do believe it could lead to some very interesting future research. As more and more of the world’s libraries are digitized (and thus made machine-readable), our ability to efficiently parse huge volumes of text will become increasingly important.