Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-22 Thread Dan Haiduc
Here's a comparison of all of them: EVALUATION: FROM PRECISION, RECALL AND F-MEASURE TO ROC, INFORMEDNESS, MARKEDNESS & CORRELATION. I warmly recommend MCC, though lots of people still use ROC. On Wed, Jul 23

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-22 Thread Joel Nothman
Precision, Recall and F-measure are often contrasted with Accuracy in terms of how they handle imbalance. I'm sure I could find a textbook citation, but for an online example, Chris Manning introduces P/R/F this way in the imbalanced spam classification problem on Coursera: https://class.coursera.org/nlp
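To make the contrast with accuracy concrete, here is a minimal sketch in plain Python. The spam/ham counts and the helper names (`accuracy`, `precision_recall_f1`) are made up for illustration and are not taken from the course or from scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class marked as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical imbalanced data: 95 ham (0) and 5 spam (1).
y_true = [0] * 95 + [1] * 5
# A useless classifier that never predicts spam.
y_always_ham = [0] * 100

print(accuracy(y_true, y_always_ham))             # 0.95
print(precision_recall_f1(y_true, y_always_ham))  # (0.0, 0.0, 0.0)
```

The classifier that never flags spam still scores 95% accuracy, while precision, recall and F1 on the spam class are all zero.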

Re: [Scikit-learn-general] Beta regression

2014-07-22 Thread Mathieu Blondel
statsmodels has a GLM module but apparently no beta regression. There is also a scikit-learn compatible wrapper around the GLM module here: https://github.com/jcrudy/glm-sklearn Mathieu On Mon, Jul 21, 2014 at 10:54 PM, Gavin Gray wrote: > Checking the documentation it looks like Scikit-learn

Re: [Scikit-learn-general] discrepancy of results with sklearn grid_search

2014-07-22 Thread Pagliari, Roberto
Thank you for the clarifications. Actually, I noticed I came across a very old non-official tutorial. Thank you, -Original Message- From: Lars Buitinck [mailto:larsm...@gmail.com] Sent: Tuesday, July 22, 2014 7:26 PM To: scikit-learn-general Subject: Re: [Scikit-learn-general] discre

Re: [Scikit-learn-general] Confidence score for each prediction from regressor

2014-07-22 Thread Ronnie Ghose
@peter, yup you can :D, also if you were looking at SVMs you can generate probabilities for those as well. On Tue, Jul 22, 2014 at 9:32 PM, Mathieu Blondel wrote: > > > > On Wed, Jul 23, 2014 at 4:47 AM, Peter Prettenhofer < > peter.prettenho...@gmail.com> wrote: > >> >> An alternative is to u

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-22 Thread Mathieu Blondel
AUC (area under the ROC curve) is commonly used for imbalanced binary classification problems. The AUC is the probability that your classifier will rank a randomly chosen positive sample higher than a randomly chosen negative sample (where the ranking is computed using the "decision_function" scores). In scikit-learn, it is imple
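The ranking interpretation of AUC can be checked directly by counting pairs. This is only a sketch: the function name `pairwise_auc` and the toy scores are assumptions for illustration, and the quadratic loop is meant for reading, not for large data.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy decision scores: one positive (0.35) is ranked below one negative (0.6).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = pairwise_auc(scores, labels)
print(auc)  # 8 of the 9 positive/negative pairs are ordered correctly
```

Because only the ordering of the scores matters, the value does not change if the scores are rescaled monotonically.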

Re: [Scikit-learn-general] Confidence score for each prediction from regressor

2014-07-22 Thread Mathieu Blondel
On Wed, Jul 23, 2014 at 4:47 AM, Peter Prettenhofer < peter.prettenho...@gmail.com> wrote: > > An alternative is to use a GradientBoostingRegressor with quantile loss to > generate prediction intervals (see [1]) -- only for the keen - I've once > used that unsuccessfully in a Kaggle comp. It's not

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Mathieu Blondel
from sklearn.multiclass import OneVsRestClassifier clf = OneVsRestClassifier(ElasticNet()) should work. This is tested here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tests/test_multiclass.py#L168 For setting the parameters by grid-search, you need to use the "estimator__
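A fuller version of the snippet above might look like the following sketch. It assumes a recent scikit-learn where `GridSearchCV` lives in `sklearn.model_selection` (in 2014 it was in `sklearn.grid_search`); the grid values are arbitrary, and the parameter names are built from the "estimator__" prefix plus ElasticNet's own `alpha` and `l1_ratio`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One ElasticNet regressor is fitted per class (one-vs-rest).
clf = OneVsRestClassifier(ElasticNet())

# Parameters of the wrapped regressor are reached through "estimator__".
param_grid = {
    "estimator__alpha": [0.001, 0.01, 0.1],   # arbitrary illustration values
    "estimator__l1_ratio": [0.2, 0.8],
}

search = GridSearchCV(clf, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
pred = search.best_estimator_.predict(X[:5])
print(pred)
```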

Re: [Scikit-learn-general] discrepancy of results with sklearn grid_search

2014-07-22 Thread Lars Buitinck
2014-07-23 0:58 GMT+02:00 Pagliari, Roberto : > Also, notice that I had to use gs.best_estimator_, and not > gs.best_estimator, and also that the module name for me is sklearn and not > scikits.learn. > Has there been a change in recent versions? No. Not in recent versions. We changed the package

[Scikit-learn-general] unsubscribe

2014-07-22 Thread Dillinger, Mike
On 7/20/14, 6:11 PM, "scikit-learn-general-requ...@lists.sourceforge.net" wrote: >Send Scikit-learn-general mailing list submissions to > scikit-learn-general@lists.sourceforge.net > >To subscribe or unsubscribe via the World Wide Web, visit > https://lists.sourceforge.net/lists/lis

[Scikit-learn-general] discrepancy of results with sklearn grid_search

2014-07-22 Thread Pagliari, Roberto
I tried the sklearn example about grid search ("set parameters by cross-validation"). The result of gs.best_estimator using the sklearn iris dataset is giving me: > LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True, > intercept_scaling=1, loss='l2', multi_class='ovr', penalt

Re: [Scikit-learn-general] Confidence score for each prediction from regressor

2014-07-22 Thread Peter Prettenhofer
I might be wrong but it seems like Mathieu is working on something similar for Ridge: https://github.com/scikit-learn/scikit-learn/pull/3417 2014-07-22 21:47 GMT+02:00 Peter Prettenhofer : > Hi Yogesh, > > one of the few regressors that supports this in sklearn is GaussianProcess > but tha

Re: [Scikit-learn-general] Confidence score for each prediction from regressor

2014-07-22 Thread Peter Prettenhofer
Hi Yogesh, one of the few regressors that supports this in sklearn is GaussianProcess but that won't scale to your problem. An alternative is to use a GradientBoostingRegressor with quantile loss to generate prediction intervals (see [1]) -- only for the keen - I've once used that unsuccessfully in
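The quantile-loss idea could be sketched roughly as follows. The synthetic data, the 5%/95% quantiles and the helper name `fit_quantile` are assumptions made for illustration; this is not the example referenced as [1], just one way to get a per-sample interval by fitting one model per quantile.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic 1-D regression problem (illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)


def fit_quantile(q):
    """Fit a gradient boosting model that predicts the q-th quantile of y."""
    model = GradientBoostingRegressor(
        loss="quantile", alpha=q, n_estimators=100, random_state=0
    )
    return model.fit(X, y)


lower = fit_quantile(0.05).predict(X)
median = fit_quantile(0.50).predict(X)
upper = fit_quantile(0.95).predict(X)

# The interval width can serve as a rough per-sample uncertainty score.
width = upper - lower
coverage = np.mean((y >= lower) & (y <= upper))
print(width[:3], coverage)
```

Each quantile needs its own fitted model, so the cost is a multiple of a single fit, which matters for a test set of the size mentioned in this thread.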

Re: [Scikit-learn-general] Confidence score for each prediction from regressor

2014-07-22 Thread Ronnie Ghose
Do you mean the ``score`` method? On Tue, Jul 22, 2014 at 1:52 PM, Yogesh Pandit wrote: > Hello, > > I am working with regressors (sklearn.ensemble). Shape of my test data > is (1121280, 452) > > I am wondering on how I can associate a confidence score for prediction > for each sample from

[Scikit-learn-general] Confidence score for each prediction from regressor

2014-07-22 Thread Yogesh Pandit
Hello, I am working with regressors (sklearn.ensemble). The shape of my test data is (1121280, 452). I am wondering how I can associate a confidence score with the prediction for each sample from my test data. Any suggestions would be helpful. Thank you, -Yogesh

[Scikit-learn-general] LinearSVC parameters

2014-07-22 Thread Pagliari, Roberto
I just had a couple of questions about LinearSVC: 1. How do I determine a reasonable value of 'tol'? It is set by default to 1e-4. How was this value chosen? 2. Can you provide a simple example about how to set class_weight for LinearSVC with two classes? Thank you,
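For question 2, a minimal sketch of passing `class_weight` as a dict to LinearSVC might look like this. The synthetic two-class data and the 1:9 weighting are assumptions chosen for illustration, not a recommendation; the dict maps each class label to a multiplier on C for that class.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic imbalanced problem: 90 samples of class 0, 10 of class 1.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(1.5, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Default: both classes weighted equally.
plain = LinearSVC(random_state=0).fit(X, y)

# Errors on class 1 are penalised 9 times more than errors on class 0.
weighted = LinearSVC(class_weight={0: 1, 1: 9}, random_state=0).fit(X, y)

n_pos_plain = int((plain.predict(X) == 1).sum())
n_pos_weighted = int((weighted.predict(X) == 1).sum())
print(n_pos_plain, n_pos_weighted)
```

Upweighting the rare class generally makes the model predict it more often, trading precision for recall on that class.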

Re: [Scikit-learn-general] resetting extension 'sklearn.svm.liblinear' language from 'f77' to 'c++'

2014-07-22 Thread Pagliari, Roberto
I installed with pip and now everything seems to be ok :) -Original Message- From: Gael Varoquaux [mailto:gael.varoqu...@normalesup.org] Sent: Tuesday, July 22, 2014 11:18 AM To: scikit-learn-general@lists.sourceforge.net Subject: Re: [Scikit-learn-general] resetting extension 'sklearn.s

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-22 Thread Dan Haiduc
Check out the Matthews Correlation Coefficient! It's derived from the chi^2 test. On Tue, Jul 22, 2014 at 6:26 PM, Hamed Zamani wrote: > Hi, > > I am working on a binary classification problem in which both t
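A short sketch of MCC computed from a 2x2 confusion matrix, together with the chi^2 link mentioned above: for a 2x2 table with n samples, |MCC| equals sqrt(chi^2 / n). The function names and the confusion counts are made up for illustration.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def chi2_2x2(tp, fp, fn, tn):
    """Pearson chi-squared statistic of the same 2x2 table (no correction)."""
    n = tp + fp + fn + tn
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return n * (tp * tn - fp * fn) ** 2 / denom if denom else 0.0


# Hypothetical imbalanced result: 10 positives among 1000 samples.
tp, fp, fn, tn = 5, 10, 5, 980
n = tp + fp + fn + tn

print((tp + tn) / n)                         # accuracy looks excellent: 0.985
print(mcc(tp, fp, fn, tn))                   # MCC is far more modest
print(math.sqrt(chi2_2x2(tp, fp, fn, tn) / n))  # same magnitude as MCC
```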

Re: [Scikit-learn-general] Help simple adaption of PCA

2014-07-22 Thread Adam Hughes
Thanks Alexandre, I'm not sure that's the case for me. Our data is wavelength in rows and time in columns. Say an experiment takes 500 timepoints. I'd like to run PCA along the time variable (axis=1) and be able to represent my data as a wavelength vs. k_components matrix, where n << k_timepoin

[Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-22 Thread Hamed Zamani
Hi, I am working on a binary classification problem in which both training and test data are highly imbalanced. In other words, the number of instances available in one class is far more than the other one. Would you please let me know which evaluation measure is the best one to compare different

Re: [Scikit-learn-general] resetting extension 'sklearn.svm.liblinear' language from 'f77' to 'c++'

2014-07-22 Thread Gael Varoquaux
On Tue, Jul 22, 2014 at 02:50:00PM +, Pagliari, Roberto wrote: > resetting extension 'sklearn.svm.liblinear' language from 'f77' to 'c++' > What does the message mean? I don't know!

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Michael Eickenberg
Conflicting messages: no, there is no explicit ElasticNetClassifier, but Manoj's proposition creates one. Concerning Manoj's point 2), you may also want to try weighting in a different way, by centering the target variable y, i.e. if y is in {-1, 1}, then do y <- y - y.mean(). This can help wit
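What the centering step does to imbalanced labels can be seen on a toy vector. The four labels below are an assumption chosen only for illustration.

```python
import numpy as np

# Three samples of the majority class (+1) and one of the minority class (-1).
y = np.array([1.0, 1.0, 1.0, -1.0])

# y <- y - y.mean()
y_centered = y - y.mean()

print(y.mean())      # 0.5
print(y_centered)    # majority targets shrink to 0.5, the minority becomes -1.5
```

After centering, the targets sum to zero and the rarer class carries the larger magnitude.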

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Vlad Niculae
Hi, The SGDClassifier supports elastic net regularization. You can make it solve the SVM loss function or the logistic loss function by changing the `loss=` parameter. Hope this helps, Vlad On Tue, Jul 22, 2014 at 4:17 PM, Sheila the angel wrote: > Hello All, > > Is it possible to perform class
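As a rough sketch of this suggestion: the iris data, the feature scaling and the values of `alpha` and `l1_ratio` below are assumptions for illustration. `loss="hinge"` gives the linear SVM objective; the string naming the logistic loss has changed between scikit-learn releases, so check the documentation of your version before switching.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# SGD is sensitive to feature scale, so standardise first.
X = StandardScaler().fit_transform(X)

clf = SGDClassifier(
    loss="hinge",           # linear SVM loss
    penalty="elasticnet",   # mix of L1 and L2 regularisation
    l1_ratio=0.15,          # share of the L1 term (illustrative value)
    alpha=1e-3,             # overall regularisation strength (illustrative)
    random_state=0,
)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
print(clf.coef_.shape, train_accuracy)
```

Unlike the ElasticNet regressor, this estimator predicts class labels directly and handles the three iris classes with a built-in one-vs-all scheme.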

[Scikit-learn-general] resetting extension 'sklearn.svm.liblinear' language from 'f77' to 'c++'

2014-07-22 Thread Pagliari, Roberto
When building scikit-learn under Ubuntu, I noticed this message: resetting extension 'sklearn.svm.liblinear' language from 'f77' to 'c++'. This happened when I rebuilt it after installing the Fortran packages gfortran and fort77. What does the message mean? Thank you,

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Andy
Hi. You cannot use the ElasticNet regressor for classification. You can, however, use the SGDClassifier, which also supports elastic net regularization. Cheers, Andy On 07/22/2014 03:17 PM, Sheila the angel wrote: Hello All, Is it possible to perform classification using linear models such

Re: [Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Manoj Kumar
Hello, I am new too, but I think you can do an OvA (one-vs-all) for this type of problem: 1. Loop across all labels. 2. For each label, convert y into data containing 1 and -1, i.e. all the labels other than the current class should be -1 (hence the name). 3. And then predict, using clf.predict(X). For each s
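The steps above can be sketched as follows. The message is cut off before the final step, so picking the label whose regressor gives the highest prediction is an assumption here, as are the iris data and the `alpha` value.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import ElasticNet

X, y = load_iris(return_X_y=True)
labels = np.unique(y)

# One column of regression scores per label.
scores = np.empty((X.shape[0], labels.size))

# Step 1: loop across all labels.
for j, label in enumerate(labels):
    # Step 2: current class becomes +1, every other class becomes -1.
    y_binary = np.where(y == label, 1.0, -1.0)
    # Step 3: fit a regressor on the binary target and predict.
    reg = ElasticNet(alpha=0.01)  # illustrative regularisation strength
    scores[:, j] = reg.fit(X, y_binary).predict(X)

# Assumed final step: choose the label with the highest score per sample.
y_pred = labels[np.argmax(scores, axis=1)]
print(np.mean(y_pred == y))
```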

[Scikit-learn-general] ElasticNet for classification

2014-07-22 Thread Sheila the angel
Hello All, Is it possible to perform classification using linear models such as ElasticNet? I tried the following - from sklearn.linear_model import ElasticNet iris = datasets.load_iris() X= iris.data y= iris.target clf= ElasticNet() clf.fit(X,y).predict(X[0]) Which gives output value

Re: [Scikit-learn-general] Beta regression

2014-07-22 Thread Andy
There is no beta regression, and very few generalized linear models in general :-/ On 07/21/2014 03:54 PM, Gavin Gray wrote: Checking the documentation it looks like Scikit-learn does not have an implementation of a generalized linear model where the target variable is within the unit interva

Re: [Scikit-learn-general] Help simple adaption of PCA

2014-07-22 Thread Alexandre Gramfort
hi, proper PCA is run on centered data (axis=0), otherwise it's a truncated SVD. It seems you want a PCA on X.T (X transposed). HTH Alex On Tue, Jul 22, 2014 at 3:14 AM, Adam Hughes wrote: > Hi, > > I'm really enjoying scikit learn and looking to add a lite version of PCA to > some programs I'm
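The difference between PCA and a plain truncated SVD, and the effect of transposing, can be sketched with NumPy alone. The random matrix (rows as wavelengths, columns as timepoints), the constant offset and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical data: 50 wavelengths (rows) x 8 timepoints (columns),
# with a constant offset so that centering visibly matters.
X = rng.normal(size=(50, 8)) + 5.0


def pca_scores(A, k):
    """Project the rows of A onto its top-k principal components."""
    A_centered = A - A.mean(axis=0)           # center each column (axis=0)
    U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
    return U[:, :k] * s[:k]


def truncated_svd_scores(A, k):
    """Same projection without centering: a truncated SVD, not a PCA."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]


# PCA on X.T treats each timepoint as a sample and each wavelength as a feature.
scores_T = pca_scores(X.T, k=2)
svd_T = truncated_svd_scores(X.T, k=2)

print(scores_T.shape)   # (8, 2): one row per timepoint
```

Which orientation is appropriate depends on which axis holds the samples; PCA on X itself would instead yield one row per wavelength.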