I think Peng is right. It might help to amplify a bit.
The idea is that, in addition to the other predictor variables that you
have, there is also one predictor variable per cluster. The variable for
whichever cluster is closest to the training example is turned on.
On Wikipedia, the term used is "one hot" encoding.
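To make the idea concrete, here is a minimal sketch of that encoding. The centroids, the helper name `add_cluster_features`, and the use of NumPy are all illustrative assumptions, not Mahout code: the point is just that one extra binary predictor per cluster is appended, with a 1 for the nearest cluster.

```python
import numpy as np

# Hypothetical centroids from a prior clustering step (e.g. k-means);
# three clusters in a 2-D feature space. These values are made up.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])

def add_cluster_features(x):
    """Append one one-hot predictor per cluster: the entry for the
    nearest centroid is 1.0, all others are 0.0."""
    nearest = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    one_hot = np.zeros(len(centroids))
    one_hot[nearest] = 1.0
    return np.concatenate([x, one_hot])

# A point near centroid 1 gets [x0, x1, 0, 1, 0].
print(add_cluster_features(np.array([4.0, 4.5])))
```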
I will check it thanks.
On 06/05/14 09:32, Ted Dunning wrote:
Yes
On Mon, May 5, 2014 at 10:51 PM, Andrew Palumbo wrote:
> Jossef,
> Does your training set have any features with a zero value for all
> instances?
>
> > Date: Mon, 5 May 2014 08:33:37 +0300
> > Subject: RE: Mahout Naive Bayes CSV Classification
> > From: josse...@gmail.com
> > To: user@maho
This would lead to that term not being counted by
NaiveBayesModel.numFeatures(), which returns the number of features (term
counts, if this were a text classification problem) that have a non-zero
count across the entire input set.
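A small sketch of that counting rule, assuming the behavior described above (the matrix and variable names are illustrative, not taken from Mahout's implementation): a feature whose count is zero for every instance does not contribute to the feature count.

```python
import numpy as np

# Toy term-count matrix: rows = instances, columns = features (terms).
# Column 2 is zero for every instance.
X = np.array([
    [2, 0, 0, 1],
    [0, 3, 0, 0],
    [1, 1, 0, 4],
])

# Mirrors the described numFeatures() behavior: only features with a
# non-zero count summed across the entire input set are counted.
num_features = int(np.count_nonzero(X.sum(axis=0)))
print(num_features)  # 3, not 4 -- the all-zero column is not counted
```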
> From: josse...@gmail.com
> Date: Tu