Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Olivier Grisel
Another possible strategy: add a new class named "random garbage" to your training set, filled with random text collected from Wikipedia, social network messages, or both. -- Olivier
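A minimal sketch of the "random garbage" class idea, assuming a scikit-learn TfidfVectorizer + MultinomialNB pipeline. The documents, domain names, and query below are hypothetical toy data; in practice the garbage class would hold real text sampled from Wikipedia or social network messages, as suggested above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data for two real domains.
domain_docs = [
    "new laptop with fast cpu and long battery life",
    "this laptop keyboard and screen are great",
    "take this medicine twice a day after meals",
    "the doctor prescribed a new medicine for the infection",
]
domain_labels = ["laptops", "laptops", "medicine", "medicine"]

# Hypothetical stand-ins for random text from Wikipedia / social networks.
garbage_docs = [
    "the weather was lovely at the beach yesterday",
    "my cat chased a bird around the garden",
]
garbage_labels = ["random garbage"] * len(garbage_docs)

# Train on the real domains plus the extra catch-all class.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(domain_docs + garbage_docs, domain_labels + garbage_labels)

# An off-topic query can now land in the catch-all class instead of a real domain.
query = "we walked the dog along the beach"
prediction = clf.predict([query])[0]
print(prediction)
```

How well this works depends on how representative the garbage text is of the off-topic queries you actually receive.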

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Mohamed-Rafik Bouguelia
Karimkhan, two possible naive methods that you can use directly with sklearn are: (1) use predict_proba and check whether the probability of the most probable class (p1) is below a threshold; alternatively, use the entropy of the probability distribution instead of p1. However, an insta
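A minimal sketch of both checks, assuming proba is the array returned by predict_proba and classes is the classifier's classes_ attribute. The class names and the p1 threshold of 0.6 are hypothetical; the first row reuses the four-domain probabilities reported in this thread.

```python
import numpy as np

def reject_by_p1(proba, classes, p1_threshold=0.6):
    """Return the most probable class per row, or None if its probability p1 is below the threshold."""
    proba = np.asarray(proba, dtype=float)
    best = proba.argmax(axis=1)
    p1 = proba.max(axis=1)
    return [classes[b] if p >= p1_threshold else None for b, p in zip(best, p1)]

def normalized_entropy(proba):
    """Entropy of each row divided by log(n_classes): 0 means certain, 1 means uniform."""
    proba = np.asarray(proba, dtype=float)
    # Clip to avoid log(0); probabilities near 0 contribute almost nothing to the entropy.
    p = np.clip(proba, 1e-12, 1.0)
    h = -(p * np.log(p)).sum(axis=1)
    return h / np.log(proba.shape[1])

classes = ["films", "laptops", "medicine", "sports"]  # hypothetical class names
proba = np.array([
    [0.333123570669, 0.333073654046, 0.166936800591, 0.166865974694],  # out-of-domain query from the thread
    [0.97, 0.01, 0.01, 0.01],  # hypothetical confident query
])
print(reject_by_p1(proba, classes))        # [None, 'films']
print(normalized_entropy(proba).round(2))  # approximately [0.96 0.12]
```

Although the thread's out-of-domain example is not exactly uniform (two classes near 1/3, two near 1/6), its normalized entropy is still about 0.96, so either criterion would flag it. Both thresholds should be tuned on held-out in-domain and out-of-domain queries.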

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Lars Buitinck
2014-09-04 15:45 GMT+02:00 Karimkhan Pathan: > Oh okay, well I tried with predict_proba. But if the query is out of domain, the classifier divides the probability uniformly across all learned domains, like in this case of 4 domains: (0.333123570669, 0.333073654046, 0.166936800591, 0.166865974694) Naive Bayes

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Karimkhan Pathan
Oh okay, well I tried with predict_proba. But if the query is out of domain, the classifier divides the probability uniformly across all learned domains, like in this case of 4 domains: (0.333123570669, 0.333073654046, 0.166936800591, 0.166865974694) On Thu, Sep 4, 2014 at 7:00 PM, Gael Varoquaux <gael.varoqu...@n

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Gael Varoquaux
On Thu, Sep 04, 2014 at 05:22:02PM +0530, Karimkhan Pathan wrote: > Well, could you please shed some light on my classification issue? I guess you probably know whether a helpful class or method exists in scikit-learn that can solve this issue. I don't know. I would naively try to do a pre

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Karimkhan Pathan
Hey Gaël, happy to see you on this thread. Just today I was listening to your scikit-learn IPython notebook tutorial. Could you please shed some light on my classification issue? I guess you probably know whether a helpful class or method exists in scikit-learn that can solve this iss

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Gael Varoquaux
On Thu, Sep 04, 2014 at 11:01:44AM +0200, Mohamed-Rafik Bouguelia wrote: > An example of this is the paper that can be found here: http://www.loria.fr/~mbouguel/papers/BougueliaICPR.pdf > Mohamed-Rafik Bouguelia, Yoland Belaid and Abdel Belaid. Efficient active novel class detection for dat

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Mohamed-Rafik Bouguelia
Hi Patrick, Just for information, there are some existing techniques to detect test instances whose class was not provided during training. Instead of letting the classifier put those instances in the closest match it can find (the most probable known class), we detect that they belong to a novel class wh
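The paper cited in this thread proposes its own novel class detection method, which is not reproduced here. As a much simpler stand-in for the general idea, the sketch below flags a query as "novel" when its TF-IDF cosine similarity to every known class centroid falls below a threshold. The training documents, class names, and the min_similarity value of 0.2 are hypothetical and would need tuning on held-out data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical toy training data for two known classes.
train_docs = [
    "new laptop with fast cpu and long battery life",
    "this laptop keyboard and screen are great",
    "take this medicine twice a day after meals",
    "the doctor prescribed a new medicine for the infection",
]
train_labels = np.array(["laptops", "laptops", "medicine", "medicine"])

vec = TfidfVectorizer()  # rows are L2-normalized by default (norm="l2")
X = vec.fit_transform(train_docs).toarray()  # dense is fine for a toy example

classes = np.unique(train_labels)
# One L2-normalized centroid per known class.
centroids = normalize(np.vstack([X[train_labels == c].mean(axis=0) for c in classes]))

def detect_novel(docs, min_similarity=0.2):
    """Return the closest known class, or "novel" if no centroid is similar enough."""
    Q = vec.transform(docs).toarray()
    sims = Q @ centroids.T  # cosine similarity, since both sides have unit length
    best = sims.argmax(axis=1)
    best_sim = sims.max(axis=1)
    return [str(classes[b]) if s >= min_similarity else "novel" for b, s in zip(best, best_sim)]

print(detect_novel(["laptop battery life", "purple elephants dance quietly"]))
```

A query made entirely of unknown words maps to an all-zero vector, has similarity 0 to every centroid, and is therefore always reported as novel.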

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Karimkhan Pathan
Hi Patrick, Yes, you might be correct. But when I input the test query 'what is where', which consists only of stop words, it is still classified even though I filter stop words: 'what is what' => films => 0. 'what is what' => laptops => 0. 'what is what' => medicine => 0.1667
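One plausible explanation for this output (our reading, not confirmed in the thread): once the stop-word filter removes every token, the query becomes an all-zero count vector. MultinomialNB's joint log-likelihood for that row is then just class_log_prior_, so predict_proba returns the class priors and predict picks the most frequent class. The sketch below demonstrates this with hypothetical, deliberately unbalanced toy data and shows one way to detect empty rows before classifying.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 3 "laptops" docs and 1 "medicine" doc, so the priors are 0.75 / 0.25.
docs = ["fast laptop battery", "laptop screen keyboard", "laptop cpu memory", "medicine dose tablet"]
labels = ["laptops", "laptops", "laptops", "medicine"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
clf = MultinomialNB(alpha=0.1).fit(X, labels)

# Every token of this query is an English stop word, so it becomes an all-zero row.
q = vec.transform(["what is where"])
print(q.nnz)                          # 0
print(clf.predict_proba(q))           # equal to the class priors
print(np.exp(clf.class_log_prior_))   # [0.75 0.25]

# Flag rows with no known tokens instead of trusting the classifier's answer.
is_empty = q.getnnz(axis=1) == 0
print(is_empty)                       # [ True]
```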

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Karimkhan Pathan
Dear Sebastian, Thanks for the reply. My actual alpha value was 0.1; I changed it to 0 and tested the code, but it behaves the same way. On Wed, Sep 3, 2014 at 8:01 PM, Sebastian Raschka wrote: > This is due to the Laplace smoothing. If I understand correctly, you want the classification to fail if

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Patrick Short
Hi Karimkhan, If I understand your question correctly, you are asking to classify test data into a class that is not specified in your training set. For instance, if you have three classes of news articles specified in your training data (e.g. politics, sports, and food) and you try to classify

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Sebastian Raschka
This is due to Laplace smoothing. If I understand correctly, you want the classification to fail if there is a new feature value (e.g., a word that is not in the vocabulary when you are doing text classification)? You can set the alpha parameter to 0 (see http://scikit-learn.org/stable/mo
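A related point: CountVectorizer and TfidfVectorizer silently drop words that are not in the fitted vocabulary_, so the classifier never sees unknown words; alpha only changes how vocabulary words are smoothed within each class. That is probably why setting alpha to 0 did not change the behaviour reported in the thread. A more direct check for the "new word" case is to measure the fraction of out-of-vocabulary tokens before classifying. The sketch below assumes a CountVectorizer with default settings and hypothetical training documents; the helper name oov_fraction is illustrative, and the rejection threshold you apply to its result is up to you.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training documents.
train_docs = ["laptop battery screen", "medicine dose tablet"]

vec = CountVectorizer()
vec.fit(train_docs)
# Same preprocessing and tokenization that the vectorizer applies internally.
analyze = vec.build_analyzer()

def oov_fraction(doc):
    """Fraction of a document's tokens that are missing from the fitted vocabulary."""
    tokens = analyze(doc)
    if not tokens:
        return 1.0  # nothing usable at all: treat as fully unknown
    unknown = sum(1 for t in tokens if t not in vec.vocabulary_)
    return unknown / len(tokens)

print(oov_fraction("laptop battery"))    # 0.0
print(oov_fraction("laptop purple"))     # 0.5
print(oov_fraction("purple elephants"))  # 1.0
# The vectorizer itself just drops the unknown words, producing an empty row.
print(vec.transform(["purple elephants"]).nnz)  # 0
```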

[Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Karimkhan Pathan
I have trained a MultinomialNB classifier on datasets from 20 domains, and it works fine for these 20 domains. The issue is that if I make a query containing text that does not belong to any of these 20 domains, it still gives a classification result. Is it possible that if query does not belon
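A minimal reproduction of the issue, assuming a TfidfVectorizer + MultinomialNB pipeline and hypothetical toy documents: predict returns the argmax over the classes seen during training, so even a query unrelated to every domain receives one of the known labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one document per domain.
docs = [
    "laptop battery screen keyboard",
    "medicine dose tablet doctor",
    "film actor director cinema",
]
labels = ["laptops", "medicine", "films"]
clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, labels)

# None of these words appear in the training vocabulary.
query = "the stock market fell sharply today"
pred = clf.predict([query])[0]
# predict() is an argmax over the known classes only, so a known label always comes back.
print(pred, pred in clf.classes_)
```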