At a high level, I think you have these choices (and more):
1) Hadoop Streaming: you can leverage some of your Python code, but not
all of it, because you have to restructure it into map/reduce steps.
2) Use Mahout.
3) Use a distribution of R that works with Hadoop.
...
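For option 1, the usual pattern is a mapper script that Hadoop Streaming feeds line by line on stdin. Here is a minimal sketch of such a mapper, assuming tab-separated input (label, id, tweet text); the toy tokenizer and stop list stand in for the nltk tokenization/stemming you would plug in for real:

```python
#!/usr/bin/env python
# Hadoop Streaming mapper sketch: emit (term, doc_id, 1) triples
# for later term-frequency / TF-IDF aggregation in a reducer.
# Assumptions: input columns are tab-separated; in practice you would
# replace tokenize() with nltk tokenization, stop-word removal, and stemming.
import re
import sys

STOPWORDS = {"the", "a", "an", "is", "to", "and"}  # toy stop list

def tokenize(text):
    """Lowercase, split on non-letters, drop stop words."""
    return [t for t in re.split(r"[^a-z]+", text.lower())
            if t and t not in STOPWORDS]

def map_line(line):
    """Turn one input record into a list of 'term \t doc_id \t 1' lines."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return []  # skip malformed records
    label, doc_id, tweet = parts
    return ["%s\t%s\t1" % (term, doc_id) for term in tokenize(tweet)]

if __name__ == "__main__":
    for line in sys.stdin:
        for out in map_line(line):
            print(out)
```

You would pair this with a reducer that sums the counts per (term, doc_id) key, and a second pass (or a job over the aggregated output) to compute document frequencies for the IDF part. Streaming invokes it with something like `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...` (script names here are illustrative).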


On Thu, Apr 24, 2014 at 1:58 PM, qiaoresearcher <qiaoresearc...@gmail.com> wrote:

> I have Hadoop and Python installed with nltk. Now I have a large input
> file which has three columns:
> column 1  | column 2 | column 3
> positive         id1          some tweet message
> negative       id2          other tweet message
> positive         id3          tweet message
> negative       id4          tweet message
> positive         id5          tweet message
> ....                    ...                ....
>
> I want to use text mining to construct TF-IDF vectors from the tweet
> messages (also using stop words, stemming, etc.) and then use some
> classifier to classify each tweet message as positive or negative. I know
> how to do this using just Python and nltk. But how do I do the same thing
> on Hadoop?
>
> thanks!
>
>
>
