Re: Probabilities in Bayesian classifier

Svetlomir Kasabov Wed, 15 Jun 2011 09:57:57 -0700

Hello Steven,

I've asked this question too:


http://mail-archives.apache.org/mod_mbox/mahout-user/201105.mbox/%3cbanlktinyohrcynt0xzrpoqqg3zkepvk...@mail.gmail.com%3E

Unfortunately, Mahout's Naive Bayes implementation can't calculate probabilities. You are probably astonished now; I couldn't believe it either when I read that (I find it rather strange, since the main concept behind Bayes is probability calculation). It's a pity that such a great framework as Mahout has restricted the Bayesian concept in that way. In addition, Naive Bayes is (as far as I know) text-oriented only: you can apply it only to documents. Mahout is still wonderful, though, because it lets us calculate probabilities using Logistic Regression.

That's why I switched to Mahout's Logistic Regression implementation: OnlineLogisticRegression.java#classifyScalar() returns a probability. Logistic Regression also has the advantage that it can handle continuous values directly, while with the Bayes classifier you have to categorize the data first.
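To illustrate what "categorizing first" could look like, here is a minimal sketch that turns a continuous value into a discrete token a text-oriented classifier could consume. This is not Mahout code: the class BinningSketch, the methods binIndex and toToken, the feature name "age" and the bin boundaries are all hypothetical and chosen only for the example.

```java
public final class BinningSketch {

    /**
     * Returns the index of the first bin whose upper bound is >= value.
     * upperBounds must be sorted ascending; values above the last bound
     * fall into an extra overflow bin.
     */
    static int binIndex(double value, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length; // overflow bin
    }

    /** Builds a word-like token such as "age_bin1" (hypothetical naming scheme). */
    static String toToken(String feature, double value, double[] upperBounds) {
        return feature + "_bin" + binIndex(value, upperBounds);
    }

    public static void main(String[] args) {
        double[] ageBounds = {18.0, 40.0, 65.0}; // made-up boundaries
        System.out.println(toToken("age", 37.0, ageBounds));
        System.out.println(toToken("age", 70.0, ageBounds));
    }
}
```

The choice of boundaries is up to you and affects the result, which is one reason handling continuous values directly can be more convenient.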

You can try the class TrainLogisticTest.java from mahout-examples in order to see how it works. See also the calculation of the probability in TrainLogistic.java:


double p = lr.classifyScalar(input);
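
As a rough illustration of why a logistic regression score can be read as a probability, here is a self-contained sketch of the binary case. It is not Mahout's implementation: the class LogisticSketch, the method probability and the weights are made up for the example. The linear score is passed through the logistic function, which always yields a value strictly between 0 and 1, so p and 1 - p add up to 1.

```java
public final class LogisticSketch {

    /** Logistic (sigmoid) function: maps any real score into (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Hypothetical helper: probability of the positive class for the
     * feature vector x under the weight vector beta (same length).
     */
    static double probability(double[] beta, double[] x) {
        double z = 0.0;
        for (int i = 0; i < beta.length; i++) {
            z += beta[i] * x[i]; // linear score: dot product of weights and features
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] beta = {-1.0, 2.0}; // made-up weights: intercept and one feature
        double[] x = {1.0, 0.5};     // leading 1.0 is the intercept term
        double p = probability(beta, x);
        System.out.println("P(class 1) = " + p + ", P(class 0) = " + (1.0 - p));
    }
}
```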






On 15.06.2011 16:51, Steven Raemaekers wrote:

Hello,

Currently I'm working on a classifier to classify documents written in
different programming languages into the correct category. I made a test and a
training set, and I get a confusion table as a result. This is nice, but the
program does not supply any probabilities/uncertainties that a particular file
belongs to a certain category; it only returns whether or not a single file
belongs to a category. Because it is a Bayesian algorithm, probabilities
must be involved somehow.

What I would like to have is for a single input file the chance/probability of 
that file belonging to each category, for instance like this:

C: 25%
C++: 50%
Java: 25%

The classifyDocument method in the class BayesAlgorithm does return numbers, 
but these are not really probabilities since they do not add up to 1.
Looking in the javadoc it says that these numbers are dot products between the 
vector of this document and the training set.

So my question is, is it possible to convert the numbers as stored in 
ClassifierResult and calculated in BayesAlgorithm.classifyDocument to some kind 
of probability?

Regards,

Steven

--
Software Improvement Group
www.sig.eu

