I'm looking at Mahout's Naive Bayes implementation for a classification task,
but the documentation and terminology around it appear to be document-centric
(text classification). Is it possible to use Mahout's Naive Bayes for a
classification task that doesn't involve documents?
I have about 80 million records with a small number of features. The ARFF
header looks like this (the numeric features could easily be nominalized if need be):
@RELATION relation
@ATTRIBUTE featurea NUMERIC
@ATTRIBUTE featureb {1,2,3,4,5,6,7}
@ATTRIBUTE featurec {1,2,3,4,5,6,7}
@ATTRIBUTE featured NUMERIC
@ATTRIBUTE featuree NUMERIC
@ATTRIBUTE featuref {0,1}
@ATTRIBUTE target {0,1}
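One common way to fit tabular records like these into a document-oriented Naive Bayes is to treat each attribute=value pair (for example "featureb=3") as a "word", so that every record becomes a short "document". The sketch below illustrates that idea in plain Java. It is a hypothetical, self-contained categorical Naive Bayes with add-one smoothing, not Mahout's API; the class and method names (TokenNaiveBayes, train, classify) are made up for illustration, and it assumes the NUMERIC attributes have already been discretized into nominal values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT Mahout's API: categorical Naive Bayes in which each
// record's "attribute=value" pairs play the role of words in a document.
// Assumes numeric attributes were discretized to nominal values beforehand.
class TokenNaiveBayes {
    private final Map<String, Integer> classCounts = new HashMap<>();              // records per label
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>(); // label -> token -> count
    private final Map<String, Set<String>> attrValues = new HashMap<>();           // attribute -> distinct values seen
    private int totalRecords = 0;

    // Encodes one attribute value as a token such as "featureb=3".
    static String token(String attr, String value) {
        return attr + "=" + value;
    }

    // Updates the counts with one labeled record.
    void train(String[] attrs, String[] values, String label) {
        totalRecords++;
        classCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (int i = 0; i < attrs.length; i++) {
            counts.merge(token(attrs[i], values[i]), 1, Integer::sum);
            attrValues.computeIfAbsent(attrs[i], k -> new HashSet<>()).add(values[i]);
        }
    }

    // Returns the label with the highest log posterior, using add-one (Laplace) smoothing:
    // P(attr=value | label) = (count(label, attr=value) + 1) / (count(label) + #values of attr).
    String classify(String[] attrs, String[] values) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String label = e.getKey();
            int nClass = e.getValue();
            double score = Math.log((double) nClass / totalRecords); // log prior
            Map<String, Integer> counts = tokenCounts.get(label);
            for (int i = 0; i < attrs.length; i++) {
                int c = counts.getOrDefault(token(attrs[i], values[i]), 0);
                Set<String> seen = attrValues.get(attrs[i]);
                int k = (seen == null) ? 0 : seen.size();
                score += Math.log((c + 1.0) / (nClass + k)); // smoothed log likelihood
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data using two of the attributes from the ARFF header above.
        String[] attrs = {"featureb", "featuref"};
        TokenNaiveBayes nb = new TokenNaiveBayes();
        nb.train(attrs, new String[]{"1", "0"}, "0");
        nb.train(attrs, new String[]{"1", "1"}, "0");
        nb.train(attrs, new String[]{"7", "1"}, "1");
        nb.train(attrs, new String[]{"7", "0"}, "1");
        System.out.println(nb.classify(attrs, new String[]{"7", "1"})); // prints 1
    }
}
```

Nothing in the Naive Bayes math depends on the inputs being text; it only needs per-class counts of discrete feature values, which is why the attribute=value encoding works. At 80 million records the counting would need to be distributed, which is the part Mahout is meant to handle.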