I have Hadoop and Python installed, along with NLTK. Now I have a large input file with three columns:

column 1 | column 2 | column 3
positive | id1 | some tweet message
negative | id2 | other tweet message
positive | id3 | tweet message
negative | id4 | tweet message
positive | id5 | tweet message
.... | ... | ....
I want to use text mining to build TF-IDF vectors from the tweet messages (with stop-word removal, stemming, etc.) and then use a classifier to label each tweet message as positive or negative. I know how to do this with plain Python and NLTK, but how do I do the same thing on Hadoop? Thanks!
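One common way to run Python on Hadoop is Hadoop Streaming. It feeds input lines to a mapper script on stdin, sorts the mapper's tab-separated `key\tvalue` output by key, and feeds the sorted lines to a reducer script. Below is a minimal sketch of the TF-IDF step as one streaming job. It rests on several assumptions. The columns are separated by whitespace, and the label and id contain none. Each tweet fits on one line. The total document count `N_DOCS` is known beforehand, for example from a separate line-count pass. The script name `tfidf.py`, the `N_DOCS` variable, the tiny `STOP_WORDS` set and the `run_local` helper are all hypothetical. Stemming is left as a comment hook where NLTK's `PorterStemmer` could go, if NLTK is installed on every worker node.

```python
import math
import os
import re
import sys
from collections import Counter
from itertools import groupby

# Hypothetical stop-word list for illustration only; a real job would ship
# NLTK's stop-word list (or NLTK itself) to every worker node.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "to", "of", "some", "other"}


def preprocess(text):
    """Lowercase, tokenize and drop stop words.
    Stemming hook (not applied here): map nltk.stem.PorterStemmer().stem
    over the tokens if NLTK is available on the workers."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def mapper(lines):
    """Map: one tweet per line -> (term, 'doc_id:tf') pairs.
    Assumes whitespace-separated columns: label, id, message."""
    for line in lines:
        if not line.strip():
            continue
        _label, doc_id, message = line.rstrip("\n").split(None, 2)
        counts = Counter(preprocess(message))
        total = sum(counts.values())
        for term, count in counts.items():
            # Term frequency normalized by the number of kept tokens.
            yield term, f"{doc_id}:{count / total}"


def reducer(pairs, n_docs):
    """Reduce: pairs arrive sorted by term (Hadoop's shuffle sorts by key),
    so all postings for a term are contiguous and df = number of postings."""
    for term, group in groupby(pairs, key=lambda kv: kv[0]):
        postings = [value.rsplit(":", 1) for _, value in group]
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings:
            yield doc_id, term, float(tf) * idf


def run_local(lines):
    """Simulate one streaming job in-process: map, sort (shuffle), reduce."""
    lines = [line for line in lines if line.strip()]
    shuffled = sorted(mapper(lines))
    return list(reducer(shuffled, n_docs=len(lines)))


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for term, value in mapper(sys.stdin):
            print(f"{term}\t{value}")
    else:
        # N_DOCS is assumed to be passed in, e.g. with -cmdenv N_DOCS=<count>.
        n_docs = int(os.environ["N_DOCS"])
        pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin if line.strip())
        for doc_id, term, weight in reducer(pairs, n_docs):
            print(f"{doc_id}\t{term}\t{weight:.6f}")
```

On a cluster you would ship the script with the generic `-files` option and pass `-mapper "python tfidf.py map"`, `-reducer "python tfidf.py reduce"` and `-cmdenv N_DOCS=<count>` to the streaming jar. This job emits one `doc_id\tterm\tweight` line per posting. A second job keyed on `doc_id` would then assemble each tweet's sparse vector and join it back to its label before the vectors go to a classifier. The design uses term-keyed reduction because document frequency, the only corpus-wide statistic TF-IDF needs, is just the size of each term's group after the shuffle.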