hadoop+python+text mining

qiaoresearcher Thu, 24 Apr 2014 10:59:28 -0700

I have Hadoop and Python installed, along with NLTK. I now have a large
input file with three columns:
column 1 | column 2 | column 3
positive | id1      | some tweet message
negative | id2      | other tweet message
positive | id3      | tweet message
negative | id4      | tweet message
positive | id5      | tweet message
....     | ...      | ....


I want to use text mining to build TF-IDF vectors from the tweet
messages (with stop-word removal, stemming, etc.) and then train a
classifier to label each tweet message as positive or negative. I know how
to do this using just Python and NLTK, but how do I do the same thing on
Hadoop?
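
To make the question concrete, here is a rough sketch of how the TF-IDF step
might be split into a mapper and a reducer for Hadoop Streaming. It is only
an illustration under several assumptions: the three columns are taken to be
tab-separated, the stop-word list is a tiny placeholder (NLTK's list could be
used instead), stemming is left out, the total number of tweets is assumed to
be known in advance, and the function names and the "map"/"reduce" arguments
are made up for this example. The classifier step is not covered.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop Streaming mapper/reducer pair for TF-IDF."""
import math
import re
import sys
from itertools import groupby

# Placeholder stop-word list (assumption); NLTK's list could replace it.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it"}
TOKEN_RE = re.compile(r"[a-z0-9']+")


def mapper(lines):
    """Emit 'term<TAB>doc_id<TAB>tf' for each distinct term in each tweet."""
    for line in lines:
        # Assumes tab-separated columns: label, id, message.
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed rows
        _label, doc_id, text = parts
        counts = {}
        for token in TOKEN_RE.findall(text.lower()):
            if token not in STOP_WORDS:
                counts[token] = counts.get(token, 0) + 1
        for term, tf in sorted(counts.items()):
            yield "%s\t%s\t%d" % (term, doc_id, tf)


def reducer(lines, total_docs):
    """Read mapper output sorted by term; emit 'doc_id<TAB>term<TAB>tfidf'."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for term, group in groupby(rows, key=lambda row: row[0]):
        postings = [(doc_id, int(tf)) for _, doc_id, tf in group]
        # Document frequency is the number of tweets containing the term.
        idf = math.log(float(total_docs) / len(postings))
        for doc_id, tf in postings:
            yield "%s\t%s\t%.4f" % (doc_id, term, tf * idf)


if __name__ == "__main__":
    # Hypothetical wiring: "script map" or "script reduce <total_docs>".
    if sys.argv[1:2] == ["map"]:
        for record in mapper(sys.stdin):
            print(record)
    elif sys.argv[1:2] == ["reduce"]:
        for record in reducer(sys.stdin, int(sys.argv[2])):
            print(record)
```

With Hadoop Streaming, the same script would be passed through the -mapper
and -reducer options, and the framework would handle sorting the mapper
output by term before it reaches the reducer. Locally, the same flow can be
checked by piping the mapper output through sort into the reducer.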

thanks!
