Re: Using clustering output for classification

2014-05-06 Thread Ted Dunning
I think Peng is right. It might help to amplify a bit. The idea is that in addition to the other predictor variables that you have, there is also one predictor variable per cluster. Whichever cluster is closest to the training example is turned on. On Wikipedia, the term used is "one hot" encod
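A minimal sketch of the encoding described above, assuming Euclidean distance and centroids already produced by a separate clustering run. The function names, centroids, and example point are hypothetical and are not taken from Mahout.

```python
import math


def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def add_cluster_features(point, centroids):
    """Append one indicator variable per cluster to the original predictors.

    Only the indicator for the nearest cluster is set to 1.0.
    """
    indicators = [0.0] * len(centroids)
    indicators[nearest_cluster(point, centroids)] = 1.0
    return list(point) + indicators


# Hypothetical centroids, as if produced by an earlier clustering run.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# A training example with two ordinary predictor variables.
example = (4.2, 4.9)

print(add_cluster_features(example, centroids))
# -> [4.2, 4.9, 0.0, 1.0, 0.0]
```

The augmented vector keeps the original predictors and adds three cluster indicators, so whichever classifier is trained next sees both.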

Re: Using clustering output for classification

2014-05-06 Thread Angel Luis Scull
I will check it, thanks.

On 06/05/14 09:32, Ted Dunning wrote:
> I think Peng is right. It might help to amplify a bit. The idea is that in addition to the other predictor variables that you have, there is also one predictor variable per cluster. Whichever cluster is closest to the training examp

Re: Mahout Naive Bayes CSV Classification

2014-05-06 Thread Jossef Harush
Yes

On Mon, May 5, 2014 at 10:51 PM, Andrew Palumbo wrote:
> Jossef,
> Does your training set have any features with a zero value for all
> instances?
>
> > Date: Mon, 5 May 2014 08:33:37 +0300
> > Subject: RE: Mahout Naive Bayes CSV Classification
> > From: josse...@gmail.com
> > To: user@maho

RE: Mahout Naive Bayes CSV Classification

2014-05-06 Thread Andrew Palumbo
This would lead to that term not being counted by NaiveBayesModel.numFeatures(). NaiveBayesModel.numFeatures() returns the number of features (term counts, if this were a text classification problem) with a non-zero count across the entire input set.

> From: josse...@gmail.com
> Date: Tu
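To illustrate the counting behaviour described in this message, here is a small sketch in plain Python. It is not Mahout's implementation; the function name and the sample rows are hypothetical, and it only shows why a column that is zero for every instance makes the feature count smaller than the number of CSV columns.

```python
def count_nonzero_features(rows):
    """Count columns that have a non-zero value in at least one row."""
    num_columns = len(rows[0])
    return sum(
        1
        for j in range(num_columns)
        if any(row[j] != 0 for row in rows)
    )


# Hypothetical training set: four feature columns, the third is zero everywhere.
rows = [
    [1, 0, 0, 3],
    [0, 2, 0, 1],
    [4, 0, 0, 0],
]

print(len(rows[0]))                  # columns in the input: 4
print(count_nonzero_features(rows))  # columns with any non-zero value: 3
```

Under this assumption, a vector built from all four columns would not match a model that counted only three features.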