If you know some of your data doesn't belong to any existing category,
put it into the training set and create a new category for it. It
won't help much if instances from this unknown category are all
outliers. In that case, lower the thresholds and tune the parameters
to get a lower error rate. -Xi
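The rejection idea above can be sketched in plain numpy: normalize per-class
log scores into posterior probabilities and drop documents whose top
probability falls below a threshold. The scores, class count, and the 0.8
threshold here are illustrative assumptions, not values from the thread.

```python
import numpy as np

# Hypothetical unnormalized log scores (log prior + log likelihood) for
# three classes A/B/C; rows are documents. In MLlib these would be
# derived from the model's pi and theta vectors.
log_scores = np.array([
    [-1.0, -4.0, -5.0],   # one class clearly dominates
    [-2.0, -2.1, -2.2],   # nearly uniform, low confidence
])

# Convert to posterior probabilities with a log-sum-exp normalization
shifted = log_scores - log_scores.max(axis=1, keepdims=True)
probs = np.exp(shifted)
probs /= probs.sum(axis=1, keepdims=True)

# Keep only documents whose best class is confident enough
threshold = 0.8                      # assumed cutoff, tune on held-out data
keep = probs.max(axis=1) >= threshold
print(keep)                          # -> [ True False]
```

Lowering `threshold` keeps more documents at the cost of admitting more
ambiguous ones, which is the trade-off Xi describes.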
If there exists a sample that doesn't belong to A/B/C, it means that
there exists another class, D or Unknown, besides A/B/C. You should
have some of these samples in the training set so that naive Bayes can
learn the priors. -Xiangrui
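A minimal multinomial naive Bayes sketch of this: label a few held-out
samples as an extra Unknown class, and the model learns a prior and
likelihood for it like any other class. The toy counts, class labels, and
smoothing here are assumptions for illustration, not MLlib code.

```python
import numpy as np

# Toy bag-of-words counts over a 2-term vocabulary.
# Classes: 0 = A, 1 = B, 2 = Unknown (hand-labeled outlier-ish docs).
X = np.array([[5., 0.], [4., 1.], [0., 5.], [1., 4.], [2., 2.], [3., 3.]])
y = np.array([0, 0, 1, 1, 2, 2])

classes = np.unique(y)
# Priors are learned from the training labels, Unknown included
log_prior = np.log(np.array([(y == c).mean() for c in classes]))

# Per-class term likelihoods with Laplace smoothing
counts = np.array([X[y == c].sum(axis=0) for c in classes]) + 1.0
log_theta = np.log(counts / counts.sum(axis=1, keepdims=True))

def predict(doc):
    # Standard NB decision rule: argmax of log prior + log likelihood
    return classes[np.argmax(log_prior + doc @ log_theta.T)]

print(predict(np.array([2., 2.])))   # balanced term mix -> 2 (Unknown)
```

Without the Unknown rows in the training set, the argmax could only ever
return A or B, which is Xiangrui's point.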
On Tue, Feb 10, 2015 at 10:44 PM, jatinpreet wrote:
Hi,
I am using MLlib's Naive Bayes classifier to classify textual data. I am
accessing the posterior probabilities through a hack for each class.
Once I have trained the model, I want to remove documents whose confidence
of classification is low. Say for a document, if the highest class
probabi