If you know some of your data doesn't belong to any existing category,
put it into the training set and create a new category for it. It
won't help much if instances from this unknown category are all
outliers. In that case, lower the thresholds and tune the parameters
to get a lower error rate. -Xi
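The rejection idea above can be sketched in plain numpy: normalize per-class
log scores into posterior probabilities and drop documents whose top
probability falls below a threshold. The scores, class count, and the 0.8
threshold here are illustrative assumptions, not values from the thread.

```python
import numpy as np

# Hypothetical unnormalized log scores (log prior + log likelihood) for
# three classes A/B/C; rows are documents. In MLlib these would be
# derived from the model's pi and theta vectors.
log_scores = np.array([
    [-1.0, -4.0, -5.0],   # one class clearly dominates
    [-2.0, -2.1, -2.2],   # nearly uniform, low confidence
])

# Convert to posterior probabilities with a log-sum-exp normalization
shifted = log_scores - log_scores.max(axis=1, keepdims=True)
probs = np.exp(shifted)
probs /= probs.sum(axis=1, keepdims=True)

# Keep only documents whose best class is confident enough
threshold = 0.8                      # assumed cutoff, tune on held-out data
keep = probs.max(axis=1) >= threshold
print(keep)                          # -> [ True False]
```

Lowering `threshold` keeps more documents at the cost of admitting more
ambiguous ones, which is the trade-off Xi describes.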
If there exists a sample that doesn't belong to A/B/C, it means that
there exists another class, D or Unknown, besides A/B/C. You should
have some of these samples in the training set so that naive Bayes can
learn the priors. -Xiangrui
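A minimal multinomial naive Bayes sketch of this: label a few held-out
samples as an extra Unknown class, and the model learns a prior and
likelihood for it like any other class. The toy counts, class labels, and
smoothing here are assumptions for illustration, not MLlib code.

```python
import numpy as np

# Toy bag-of-words counts over a 2-term vocabulary.
# Classes: 0 = A, 1 = B, 2 = Unknown (hand-labeled outlier-ish docs).
X = np.array([[5., 0.], [4., 1.], [0., 5.], [1., 4.], [2., 2.], [3., 3.]])
y = np.array([0, 0, 1, 1, 2, 2])

classes = np.unique(y)
# Priors are learned from the training labels, Unknown included
log_prior = np.log(np.array([(y == c).mean() for c in classes]))

# Per-class term likelihoods with Laplace smoothing
counts = np.array([X[y == c].sum(axis=0) for c in classes]) + 1.0
log_theta = np.log(counts / counts.sum(axis=1, keepdims=True))

def predict(doc):
    # Standard NB decision rule: argmax of log prior + log likelihood
    return classes[np.argmax(log_prior + doc @ log_theta.T)]

print(predict(np.array([2., 2.])))   # balanced term mix -> 2 (Unknown)
```

Without the Unknown rows in the training set, the argmax could only ever
return A or B, which is Xiangrui's point.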
On Tue, Feb 10, 2015 at 10:44 PM, jatinpreet wrote:
Hi,
I am using MLlib's Naive Bayes classifier to classify textual data. I am
accessing the posterior probabilities through a hack for each class.
Once I have trained the model, I want to remove documents whose confidence
of classification is low. Say for a document, if the highest class
probabi