The NB implementation doesn't handle numeric values very well. If you convert your data to boolean features, you can construct a document out of them and run NB on it.
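
To make the conversion concrete, here is a rough sketch in Python (not Mahout code) of turning one record into a "document" of boolean-style tokens. The bin edges, the token naming, and the tab-separated label-then-tokens line are all assumptions made for illustration. Check them against whatever input layout your Mahout version's NB trainer expects.

```python
from bisect import bisect_right

# Hypothetical bin edges for the numeric attributes. Real edges would be
# chosen from the data (e.g. quantiles); these values are placeholders.
NUMERIC_BINS = {
    "featurea": [0.0, 10.0, 100.0],
    "featured": [0.5, 1.0],
}


def record_to_tokens(record, numeric_bins=NUMERIC_BINS):
    """Turn one record (attribute name -> value) into a list of word-like tokens.

    Numeric attributes are bucketed so that each bucket becomes its own
    boolean feature; nominal attributes become one token per name/value pair.
    """
    tokens = []
    for name, value in record.items():
        if name in numeric_bins:
            # Index of the bucket the value falls into (0 = below the first edge).
            bucket = bisect_right(numeric_bins[name], float(value))
            tokens.append(f"{name}_bin{bucket}")
        else:
            tokens.append(f"{name}_{value}")
    return tokens


def to_document(label, record):
    """Assumed layout: label, a tab, then space-separated tokens."""
    return f"{label}\t{' '.join(record_to_tokens(record))}"


if __name__ == "__main__":
    example = {"featurea": 42.0, "featureb": 3, "featured": 0.7}
    print(to_document("1", example))
```

Each token then plays the role of a word, so a document-centric NB trainer sees a record as a very short document.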

A better way would be to use the Weka formatter to convert your records to vectors and use the SGD classifier in Mahout. You will be pleasantly surprised by its accuracy and speed.

Robin

On Thu, Jun 2, 2011 at 8:18 PM, Lancaster, Robert (Orbitz) <[email protected]> wrote:

> I'm looking at the Mahout implementation NaiveBayes for a classification
> task, but the language around the Mahout implementation appears to be
> document-centric. Is it possible to use the Mahout implementation of NB for
> a classification task that doesn't involve documents?
>
> I have about 80 million records with a small number of features. The arff
> header looks like (the numeric features could easily be nominalized if need
> be):
>
> @RELATION relation
> @ATTRIBUTE featurea NUMERIC
> @ATTRIBUTE featureb {1,2,3,4,5,6,7}
> @ATTRIBUTE featurec {1,2,3,4,5,6,7}
> @ATTRIBUTE featured NUMERIC
> @ATTRIBUTE featuref NUMERIC
> @ATTRIBUTE featuref {0,1}
> @ATTRIBUTE target {0,1}
>
