Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Karimkhan Pathan
Hi Patrick, Yeah, you might be correct. But when I input the test query 'what is where', which consists entirely of stopwords, and with a stopword filter in place, it is still classified as: 'what is what' => films => 0. 'what is what' => laptops => 0. 'what is what' => medicine => 0.1667

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Karimkhan Pathan
Dear Sebastian, Thanks for the reply. My actual alpha value was 0.1; I changed it to 0 and tested the code, but it behaves similarly. On Wed, Sep 3, 2014 at 8:01 PM, Sebastian Raschka wrote: > This is due to the Laplace smoothing. If I understand correctly, you > want the classification to fail if

Re: [Scikit-learn-general] Dirichlet priors on multinomial Bayes?

2014-09-03 Thread Josh Wasserstein
It's been a bit over a week, so I thought I would try again. I can probably identify the type of prior by looking at the code. Where exactly is this prior applied? Josh On Wed, Aug 27, 2014 at 11:47 PM, Josh Wasserstein wrote: > What prior does scikit-learn use for MultinomialNB? The document
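A note on the question above, offered as a sketch and not as a description of the scikit-learn source: the additive smoothing that MultinomialNB documents for its alpha parameter has the same form as the posterior mean of the word probabilities under a symmetric Dirichlet(alpha) prior. The helper name dirichlet_posterior_mean and the toy counts are made up for illustration; where the library applies this in its code is not shown here.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of multinomial parameters under a symmetric Dirichlet(alpha) prior.

    This is the same expression as additive smoothing:
    (count_i + alpha) / (total_count + alpha * number_of_features).
    """
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


# Toy word counts for one class over a 4-word vocabulary (hypothetical data).
word_counts = [2, 1, 0, 0]

laplace = dirichlet_posterior_mean(word_counts, alpha=1.0)   # alpha = 1: Laplace smoothing
lidstone = dirichlet_posterior_mean(word_counts, alpha=0.1)  # alpha < 1: Lidstone smoothing

print(laplace)
print(lidstone)
```

Under this reading, a larger alpha pulls every estimate toward the uniform value 1/4, and a smaller alpha keeps the estimates closer to the raw frequencies.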

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Patrick Short
Hi Karimkhan, If I am understanding your question correctly, you are asking to classify test data into a class that is not specified in your training set. For instance if you have three classes of news articles specified in your training data (e.g. politics, sports, and food) and you try to classify

Re: [Scikit-learn-general] precomputed distance matrix for clustering

2014-09-03 Thread Amita Misra
n documents clustering using a precomputed similarity metric between a pair of documents. Code so far:

    Sim = np.zeros((n, n))  # create a numpy array
    i = 0
    j = 0
    for i in range(0, n):
        for j in range(i, n):
            if i == j:
                Sim[i][j] = 1
            else:
                Sim[i][j] = simfunction(list_doc[i],list_doc[j]
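The loop as quoted fills only the diagonal and the upper triangle before the message is cut off, so here is a minimal sketch of one way to get a full symmetric matrix and turn it into distances. It uses plain Python lists so it runs without NumPy. The jaccard function is only a hypothetical stand-in for simfunction, and the 1 - similarity conversion assumes scores in [0, 1]; whether an estimator expects similarities or distances depends on the estimator.

```python
def build_similarity_matrix(docs, simfunction):
    """Fill an n x n matrix symmetrically, calling simfunction once per pair."""
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # a document is fully similar to itself
        for j in range(i + 1, n):
            s = simfunction(docs[i], docs[j])
            sim[i][j] = s
            sim[j][i] = s  # mirror into the lower triangle
    return sim


def to_distance_matrix(sim):
    """Convert similarities in [0, 1] to distances (assumption: d = 1 - s)."""
    return [[1.0 - s for s in row] for row in sim]


def jaccard(a, b):
    """Hypothetical stand-in for simfunction: word-set overlap of two strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


list_doc = ["a b c", "a b d", "x y z"]
sim = build_similarity_matrix(list_doc, jaccard)
dist = to_distance_matrix(sim)
print(sim)
print(dist)
```

With NumPy the same loop can write into an np.zeros((n, n)) array; the mirroring step is the part that matters.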

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Sebastian Raschka
This is due to the Laplace smoothing. If I understand correctly, you want the classification to fail if there is a new feature value (e.g., a word that is not in the vocabulary when you are doing text classification)? You can set the alpha parameter to 0 (see http://scikit-learn.org/stable/mo
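To make the effect of alpha concrete, here is a small stand-alone sketch in plain Python (not scikit-learn code). It assumes that words outside the vocabulary are dropped before scoring, as a bag-of-words vectorizer would do; the function name log_likelihood and the toy tokens are made up for illustration.

```python
import math
from collections import Counter


def log_likelihood(query_tokens, class_tokens, vocabulary, alpha):
    """Sum of log P(word | class) over the query's in-vocabulary words."""
    counts = Counter(class_tokens)
    denom = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    score = 0.0
    for word in query_tokens:
        if word not in vocabulary:
            continue  # assumption: unknown words never reach the classifier
        p = (counts[word] + alpha) / denom
        score += math.log(p) if p > 0 else float("-inf")
    return score


vocabulary = {"film", "actor", "laptop", "battery"}
film_tokens = ["film", "actor", "film"]  # toy training text for one class

query = ["laptop", "film"]
smoothed = log_likelihood(query, film_tokens, vocabulary, alpha=1.0)
unsmoothed = log_likelihood(query, film_tokens, vocabulary, alpha=0.0)

# A query made only of words outside the vocabulary contributes no evidence.
no_evidence = log_likelihood(["what", "is"], film_tokens, vocabulary, alpha=0.0)

print(smoothed, unsmoothed, no_evidence)
```

With alpha = 0, a vocabulary word never seen in a class drives that class score to minus infinity. A query whose words are all outside the vocabulary scores 0 for every class whatever alpha is, so under this assumption only the class priors decide the outcome, which may be why changing alpha makes no visible difference for such queries.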

[Scikit-learn-general] probabilistic values from KNeighborsClassifier

2014-09-03 Thread Sheila the angel
I am using KNeighborsClassifier and trying to obtain probabilistic output. But for many of the test sets I am getting equal probability for all classes. >>> X_train, X_test, y_train, y_test = cross_validation.train_test_split(iris.data, iris.target, test_size=0.4, random_state=0) >>> clf = KNeighbors
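One possible explanation, sketched in plain Python and not taken from the library: with uniform weights, a k-nearest-neighbours probability is just the fraction of the k neighbours that carry each label, so it can only take values that are multiples of 1/k. The function name knn_predict_proba and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict_proba(X_train, y_train, x, k, classes):
    """Fraction of the k nearest training points that carry each class label."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return [votes[c] / k for c in classes]


# Toy one-dimensional training set with three classes (hypothetical data).
X_train = [(0.0,), (1.0,), (2.0,), (10.0,)]
y_train = [0, 1, 2, 0]
classes = [0, 1, 2]

tie = knn_predict_proba(X_train, y_train, (1.0,), k=3, classes=classes)
single = knn_predict_proba(X_train, y_train, (1.0,), k=1, classes=classes)

print(tie)
print(single)
```

With k equal to the number of classes, a test point whose neighbours each come from a different class gets exactly equal probabilities. Distance-weighted votes or a larger k would give finer-grained values.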

[Scikit-learn-general] scikit learn classification issue

2014-09-03 Thread Karimkhan Pathan
I have trained my classifier on 20 domain datasets using MultinomialNB, and it works fine for these 20 domains. The issue is that if I make a query containing text which does not belong to any of these 20 domains, it still gives a classification result. Is it possible that if query does not belon
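A classifier of this kind always returns its best-scoring class, so rejecting out-of-domain queries has to be added on top. A minimal sketch of one such rule, assuming you already have per-class posterior probabilities and a count of query tokens that were in the training vocabulary; the name classify_or_reject, the labels, and the 0.5 threshold are all hypothetical and would need tuning on held-out data.

```python
def classify_or_reject(posteriors, known_token_count, threshold=0.5):
    """Return the best label, or None when the query gives no usable evidence.

    posteriors: dict mapping class label -> posterior probability
    known_token_count: number of query tokens found in the training vocabulary
    """
    if known_token_count == 0:
        return None  # nothing but unknown words or stopwords: refuse to guess
    label, prob = max(posteriors.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None


confident = classify_or_reject(
    {"films": 0.90, "laptops": 0.06, "medicine": 0.04}, known_token_count=3
)
unsure = classify_or_reject(
    {"films": 0.40, "laptops": 0.35, "medicine": 0.25}, known_token_count=3
)
empty = classify_or_reject(
    {"films": 0.40, "laptops": 0.35, "medicine": 0.25}, known_token_count=0
)

print(confident, unsure, empty)
```

A threshold on the top probability is a crude rule, since Naive Bayes posteriors tend to be overconfident; the check on known tokens at least catches queries such as an all-stopword sentence.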

[Scikit-learn-general] precomputed distance matrix for clustering

2014-09-03 Thread Amita Misra
Hello, I have n documents and want to use a precomputed similarity metric between pairs of documents for clustering. I created a 2-dimensional NumPy array, say X, containing the similarity score for every pair of documents. Also, type(X) and X.shape give the output as (n, n) Then I create a cluster object us