for classifying Twitter messages.

Lucene supports n-grams, stop words, the Porter and Snowball stemmers,
language-specific analyzers, and so on.
Mahout uses Lucene for vectorization (part of Mahout's seq2sparse process).
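
If the input is one paragraph per line rather than one file per document, one
option is to split it into a directory per class first, which is the layout the
20 Newsgroups example starts from. The sketch below is only an illustration
under stated assumptions: the class LabeledLinesToDirs and its methods are
hypothetical (not part of Mahout or Lucene), and it assumes the text and the
label on each line are separated by a tab.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of Mahout): splits a "text<TAB>label" file
// into one small file per paragraph under a directory named after its label.
class LabeledLinesToDirs {

    // Assumption: the label is everything after the last tab on the line.
    // Returns {text, label}, or null for lines without a usable text and label.
    static String[] parseLine(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            return null;
        }
        String text = line.substring(0, tab).trim();
        String label = line.substring(tab + 1).trim();
        if (text.isEmpty() || label.isEmpty()) {
            return null;
        }
        return new String[] {text, label};
    }

    // Makes a label safe to use as a directory name, e.g. "class a" -> "class_a".
    static String labelToDirName(String label) {
        return label.replaceAll("[^A-Za-z0-9_-]+", "_");
    }

    // Writes <outDir>/<label>/<lineNumber>.txt for every parseable line
    // and returns how many files were written.
    static int split(Path input, Path outDir) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        int written = 0;
        for (int i = 0; i < lines.size(); i++) {
            String[] parsed = parseLine(lines.get(i));
            if (parsed == null) {
                continue; // skip blank or malformed lines
            }
            Path classDir = outDir.resolve(labelToDirName(parsed[1]));
            Files.createDirectories(classDir);
            Path doc = classDir.resolve((i + 1) + ".txt");
            Files.write(doc, parsed[0].getBytes(StandardCharsets.UTF_8));
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LabeledLinesToDirs <input.tsv> <outputDir>");
            return;
        }
        int n = split(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("wrote " + n + " documents");
    }
}
```

The resulting directory could then go through the same seqdirectory and
seq2sparse steps as the 20 Newsgroups example, with the analyzer and n-gram
settings chosen at the seq2sparse step.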

On Thursday, January 16, 2014 10:57 PM, qiaoresearcher wrote:
Mahout has an example of using naive Bayes to classify the 20 Newsgroups data,
but how do I classify individual paragraphs (e.g. Twitter messages, movie
reviews) stored in text files?

The text files have content like:
text paragraph 1                     class a
text paragraph 2                     class b
text paragraph 3                     class a
text paragraph 4                     class b
.............                                      ...

Does it support n-grams, stemming, stop words, etc.?

Thanks for any suggestions.
