Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-24 Thread Joel Nothman
What is the outcome of this discussion for scikit-learn? - Would someone be interested in improving the documentation to highlight the merits and problems with each metric? - Are there metrics (e.g. balanc
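To make the concern behind this thread concrete, here is a minimal sketch of why plain accuracy can mislead on imbalanced data. The function names and the toy labels are illustrative assumptions, and averaging per-class recall is only one possible alternative, not a metric the thread settles on.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_per_class_recall(y_true, y_pred):
    """Average of the recall computed separately for each true class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data (assumption): 95 negatives, 5 positives,
# and a model that always predicts the majority class 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal = mean_per_class_recall(y_true, y_pred)
print(acc, bal)
```

The always-negative model scores high on accuracy while never finding a positive sample, which the per-class average exposes.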

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Mathieu Blondel
On Fri, Jul 25, 2014 at 1:46 AM, Alexandre Gramfort < alexandre.gramf...@telecom-paristech.fr> wrote: > > indeed but squared loss is cheap to use and can reach pretty good > classif performance in practice. > Indeed the squared loss works surprisingly well in practice for classification and it ha
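As a rough illustration of the point about squared loss, the sketch below fits an ordinary least-squares line to targets coded as -1/+1 and classifies by the sign of the output. The one-dimensional data and the helper names are assumptions made for the example; this is not how ElasticNet or SGDClassifier is implemented.

```python
def fit_least_squares(xs, ys):
    """Closed-form 1-D least squares: minimise sum((w * x + b - y) ** 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def predict_label(x, w, b):
    """Turn the regression output into a class by thresholding at zero."""
    return 1 if w * x + b >= 0 else -1


# Toy, linearly separable data (assumption); classes are coded as -1 and +1.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [-1, -1, -1, 1, 1, 1]

w, b = fit_least_squares(xs, ys)
preds = [predict_label(x, w, b) for x in xs]
print(preds)
```

The regressor never sees a classification loss, yet thresholding its output recovers the labels on this easy data set.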

Re: [Scikit-learn-general] SGDClassifier with class_weight=auto fails on linux, but not on osx

2014-07-24 Thread Rose Perrone
I was mistaken, it fails on osx running scikit-learn 0.15, but succeeds on scikit-learn 0.14.

[Scikit-learn-general] SGDClassifier with class_weight=auto fails on linux, but not on osx

2014-07-24 Thread Rose Perrone
When I train a scikit-learn SGDClassifier with these options: SGDClassifier(loss='log', class_weight=None, penalty='l2'), training completes with no error. When I train this classifier with class_weight='au
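For readers unfamiliar with automatic class weighting, here is a minimal sketch of the general idea: weight each class inversely to its frequency. The formula and the function name are assumptions for illustration; the thread does not show the exact heuristic scikit-learn applies for this option.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy labels (assumption): 2 "spam" samples against 8 "ham" samples.
labels = ["spam"] * 2 + ["ham"] * 8
weights = inverse_frequency_weights(labels)
print(weights)
```

The rare class receives the larger weight, so its errors count more during training.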

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Olivier Grisel
2014-07-24 16:43 GMT+02:00 Kartik Kumar Perisetla: > I actually used part of the text of one Wikipedia article which was used in > training. I was expecting it to detect the category for which it was used as > training instance. But it predicted as some other category and thus I > thought it did not g

Re: [Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Arnaud Joly
1) score_func is the deprecated way to pass a metric function. Now we use scoring, as it is a much better way to do it. 2) In grid search CV, this is not possible since you want to optimise the hyperparameters using a given objective (you need a scalar and not a numpy array). Extending the grid s

Re: [Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Pagliari, Roberto
Thank you. Just a few more points and I should be all set :) - So, basically, scoring=None means that the default scoring function of the classifier will be used. What is the difference between scoring and score_func? - When running GridSearchCV is it possible to get the aver

Re: [Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Arnaud Joly
Hi, 1) Indeed, the default scoring metric in classification is accuracy. 2) True, the best score will be given by the best average accuracy over the folds. 3) It should raise an error as it is not a possible scorer. Hope it helps, Arnaud Joly On 24 Jul 2014, at 22:09, Pagliari, Roberto wrot
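Point 2 above can be sketched in a few lines: each parameter setting gets one scalar score per fold, and the setting with the best mean wins. The per-fold accuracies and the helper names below are made-up assumptions; this mirrors the selection rule described in the reply, not GridSearchCV's internals.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def select_best(fold_scores_by_param):
    """Return (best_param, best_mean_score) using the mean score across folds."""
    means = {p: mean(s) for p, s in fold_scores_by_param.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Hypothetical per-fold accuracies for three values of a parameter C.
fold_scores = {
    0.1: [0.5, 0.75, 0.5],
    1.0: [0.75, 1.0, 0.5],
    10.0: [0.75, 0.5, 0.75],
}

best_param, best_score = select_best(fold_scores)
print(best_param, best_score)
```

Because the comparison is a plain `max` over numbers, each fold score has to be a single scalar, which is why an array-valued metric cannot drive the search.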

[Scikit-learn-general] About scoring functions with GridSearchCV

2014-07-24 Thread Pagliari, Roberto
I have a few comments/questions about scoring within the context of classification. Specifically, when GridSearchCV with kfold cross-validation is used. 1) From my understanding, the default scoring function is the ratio given by the number of correctly classified samples and the total nu

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Alexandre Gramfort
> But SGDClassifier optimizes classification-specific loss functions, > unlike ElasticNet which is a regressor. indeed but squared loss is cheap to use and can reach pretty good classif performance in practice. A

Re: [Scikit-learn-general] sparse matrices with LinearSVC

2014-07-24 Thread Pagliari, Roberto
That was version 0.6 I realized, so I guess it no longer applies From: Mathieu Blondel [mailto:math...@mblondel.org] Sent: Thursday, July 24, 2014 4:49 AM To: scikit-learn-general Subject: Re: [Scikit-learn-general] sparse matrices with LinearSVC On Thu, Jul 24, 2014 at 2:46 PM, Pagliari, Robe

Re: [Scikit-learn-general] sparse matrices with LinearSVC

2014-07-24 Thread Pagliari, Roberto
It is working now, I don't know why it was not working yesterday. -Original Message- From: Alexandre Gramfort [mailto:alexandre.gramf...@telecom-paristech.fr] Sent: Thursday, July 24, 2014 4:00 AM To: scikit-learn-general Subject: Re: [Scikit-learn-general] sparse matrices with LinearS

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Sheila the angel
From the last few answers it seems that SGDClassifier is more appropriate for classification using ElasticNet. Although this link www.datarobot.com/blog/regularized-linear-regression-with-scikit-learn/ says "Regularization path plots can be efficiently created using coordinate descent optimizat

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Vlad Niculae
But SGDClassifier optimizes classification-specific loss functions, unlike ElasticNet which is a regressor. Correct me if I'm wrong, but wrapping ElasticNet in an OvR fashion doesn't lead to the same thing, and SGDClassifier would generally be more appropriate for classification in my opinion. My 2

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Alexandre Gramfort
> But now it makes me think - > How is the OneVsRestClassifier approach different from SGDClassifier? > Is SGDClassifier an optimization algorithm which also uses > OneVsRestClassifier for classification? yes SGDClassifier uses OvR internally. A

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Sheila the angel
I think I found the answer. The class score can be obtained using clf.decision_function(X). But now it makes me think - How is the OneVsRestClassifier approach different from SGDClassifier? Is SGDClassifier an optimization algorithm which also uses OneVsRestClassifier for classification? On 24 Ju
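A minimal sketch of how one-vs-rest prediction relates to per-class decision scores: one binary linear model per class, with the predicted class being the one whose score is largest. The hand-picked weights and helper names are assumptions for illustration, not output from any fitted scikit-learn estimator.

```python
def decision_scores(x, models):
    """One score per class: dot(w, x) + b for that class's binary model."""
    return {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in models.items()
    }


def predict_ovr(x, models):
    """Pick the class whose one-vs-rest model gives the highest score."""
    scores = decision_scores(x, models)
    return max(scores, key=scores.get)


# Hypothetical linear models as (weights, bias), one per class.
models = {
    "a": ([1.0, 0.0], 0.0),
    "b": ([0.0, 1.0], 0.0),
    "c": ([-1.0, -1.0], 0.0),
}

print(decision_scores([2.0, 0.5], models))
print(predict_ovr([2.0, 0.5], models))
```

The scores are unnormalised margins rather than probabilities, so they rank classes but do not sum to one.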

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Kartik Kumar Perisetla
I actually used part of the text of one Wikipedia article which was used in training. I was expecting it to detect the category for which it was used as training instance. But it predicted as some other category and thus I thought it did not give accurate prediction. Please correct my understanding if

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-24 Thread Hamed Zamani
Dear all, Thank you very much for all of your quick and informative answers. The papers which you introduced really help me. Cheers, Hamed On Thu, Jul 24, 2014 at 1:42 AM, Dayvid Victor wrote: > Wow, I didn't know that. I've seen so many publications (and also used in > publications) > using

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Alexandre Gramfort
> So how do I obtain the class probability along with classification? you help me finish : https://github.com/scikit-learn/scikit-learn/pull/1176 :) Alex

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Lars Buitinck
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla : > Also, Could someone please throw some light on how HashingVectorizer works? https://larsmans.github.io/ilps-hashing-trick/ https://en.wikipedia.org/wiki/Feature_hashing http://metaoptimize.com/qa/questions/6943/what-is-the-hashing-trick
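The links above describe the hashing trick; a minimal sketch of the idea follows. Tokens are mapped straight to column indices with a hash function, so no vocabulary has to be stored. The use of MD5, the tiny vector size and the function name are assumptions for illustration; scikit-learn's own hash function and options are not reproduced here.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Count tokens into a fixed-size vector indexed by a hash of each token."""
    vec = [0] * n_features
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # Interpret the first 8 bytes as an integer and fold it into range.
        index = int.from_bytes(digest[:8], "big") % n_features
        vec[index] += 1
    return vec


tokens = "the cat sat on the mat".split()
vec = hash_features(tokens)
print(vec)
```

Distinct tokens may collide in the same column; a larger `n_features` makes collisions rarer at the cost of a wider vector.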

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-24 Thread Sheila the angel
Thank you all. I tried the OneVsRestClassifier as

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    X /= X.std(0)
    clf = OneVsRestClassifier(ElasticNet(alpha=0.25, l1_ratio=0.5)).fit(X, y)
    y_pred = clf.predict(X)

This works, however clf.predict_proba(X) gives error AttributeError:

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Olivier Grisel
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla : > Hello, > > I am creating a content classifier using scikit-learn through > HashingVectorizer( using this as reference: > http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html). > > The training dataset I am u

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Eustache DIEMERT
> But when I test the prediction for a new sentence or text, it gives wrong prediction. How do you measure that ? Having a few badly classified instances does not necessarily mean the learning has failed. A good classification accuracy for text classification is typically > 80%, what is yours ?

Re: [Scikit-learn-general] sparse matrices with LinearSVC

2014-07-24 Thread Mathieu Blondel
On Thu, Jul 24, 2014 at 2:46 PM, Pagliari, Roberto wrote: > I also tried to import sparse.LinearSVC, but it says svm has no module > named sparse…. > > > I don't know where you get your documentation, but sparse.LinearSVC was removed about 3 years ago... :-) Mathieu

Re: [Scikit-learn-general] sparse matrices with LinearSVC

2014-07-24 Thread Alexandre Gramfort
> Is it possible to use scipy sparse matrices with LinearSVC? sklearn.svm.LinearSVC will accept sparse data. it's the same classes that should work with both dense and sparse data A

Re: [Scikit-learn-general] GSoC - Blog post updates

2014-07-24 Thread Nick Pentreath
This contribution is looking really exciting! Looking forward to seeing it in scikit-learn! On Thu, Jul 24, 2014 at 8:52 AM, Maheshakya Wijewardena wrote: > Hi, > I have made my new post on testing LSH-ANN implementation: > http://maheshakya.github.io/gsoc/2014/07/24/testing-