[Scikit-learn-general] GSoC - Blog post updates

2014-07-23 Thread Maheshakya Wijewardena
Hi, I have made a new post on testing the LSH-ANN implementation: http://maheshakya.github.io/gsoc/2014/07/24/testing-lsh-forest.html Best regards, Maheshakya -- Undergraduate, Department of Computer Science and Engineering, Faculty of Engineering. University of Moratuwa, Sri Lanka

Re: [Scikit-learn-general] sparse matrices with LinearSVC

2014-07-23 Thread Caleb
It is strange. I use sparse matrices with LinearSVC all the time. Can you provide a code example? --- Caleb > On 24 Jul, 2014, at 1:46 pm, "Pagliari, Roberto" > wrote: > > Is it possible to use scipy sparse matrices with LinearSVC? > > I tried and it does not work. > > I also tried to i
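For reference, a minimal sketch of what such an example could look like (toy data assumed; recent scikit-learn versions accept scipy CSR matrices directly, with no separate sparse module):

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import LinearSVC

# toy sparse design matrix; any scipy.sparse CSR/CSC matrix should work here
X = csr_matrix(np.array([[0., 1.], [1., 0.], [1., 1.], [0., 0.]]))
y = np.array([1, 0, 1, 0])

clf = LinearSVC()
clf.fit(X, y)            # fit accepts the sparse matrix as-is
print(clf.predict(X))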

[Scikit-learn-general] sparse matrices with LinearSVC

2014-07-23 Thread Pagliari, Roberto
Is it possible to use scipy sparse matrices with LinearSVC? I tried and it does not work. I also tried to import sparse.LinearSVC, but it says svm has no module named sparse. Thank you,

Re: [Scikit-learn-general] 'GridSearchCV' object has no attribute 'best_estimator_'

2014-07-23 Thread Pagliari, Roberto
I re-installed everything from scratch on a fresh linux distro and it works now. Thank you, From: Joel Nothman [mailto:joel.noth...@gmail.com] Sent: Wednesday, July 23, 2014 11:04 PM To: scikit-learn-general Subject: Re: [Scikit-learn-general] 'GridSearchCV' object has no attribute 'best_estima

Re: [Scikit-learn-general] 'GridSearchCV' object has no attribute 'best_estimator_'

2014-07-23 Thread Joel Nothman
Please make sure you call fit() first, as in http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html On 24 July 2014 02:07, Pagliari, Roberto wrote: > I’m getting this error when trying to predict using the result of grid > search with LinearSVC. > > > > However, ac
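A minimal sketch of that order of operations (parameter grid and data are made up; sklearn.grid_search is the module name from this era, later moved to sklearn.model_selection):

from sklearn import datasets, svm
from sklearn.grid_search import GridSearchCV

iris = datasets.load_iris()
grid = GridSearchCV(svm.LinearSVC(), {'C': [0.1, 1, 10]})
grid.fit(iris.data, iris.target)    # without this call, best_estimator_ does not exist
print(grid.best_params_)
print(grid.predict(iris.data[:5]))  # predict delegates to best_estimator_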

[Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-23 Thread Kartik Kumar Perisetla
Hello, I am creating a content classifier with scikit-learn using HashingVectorizer (using this as a reference: http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html ). The training dataset I am using is Wikipedia. For example, for the "management" category I am tr
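A rough sketch of the out-of-core pattern from the linked example, with two made-up mini-batches standing in for the Wikipedia dump (feature size and labels are illustrative only):

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2 ** 18)  # stateless, so no fit/vocabulary needed
clf = SGDClassifier()
classes = [0, 1]  # e.g. 1 = "management", 0 = everything else

batches = [
    (["budget planning meeting", "football match results"], [1, 0]),
    (["quarterly management report", "music festival lineup"], [1, 0]),
]
for texts, labels in batches:
    X = vectorizer.transform(texts)            # hash each mini-batch independently
    clf.partial_fit(X, labels, classes=classes)

print(clf.predict(vectorizer.transform(["project management basics"])))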

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Dayvid Victor
Wow, I didn't know that. I've seen so many publications (and have also used it in publications myself) using this approximation and calling it AUC (including that survey I sent); but it is always good to know the correct terms. Thanks, On Wed, Jul 23, 2014 at 8:32 PM, Mario Michael Krell wrote: > Dayvid, as

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Mario Michael Krell
Dayvid, as I said, this metric should be called "balanced accuracy" (BA) to avoid misunderstandings with the real AUC from the ROC curve, as stated in the given reference. My autocorrect also got in the way: it should read 1 - FP_rate = TN_rate and BA = (TP_rate + TN_rate)/2. It is not "another" but the same evalua
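A minimal sketch of that balanced accuracy, computed from a binary confusion matrix (assumes labels {0, 1} with 1 as the positive class):

from sklearn.metrics import confusion_matrix

def balanced_accuracy(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tp_rate = tp / float(tp + fn)   # sensitivity / recall
    tn_rate = tn / float(tn + fp)   # specificity, i.e. 1 - FP_rate
    return (tp_rate + tn_rate) / 2.0

print(balanced_accuracy([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))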

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Pagliari, Roberto
I used pip, after installing all required libraries, including fortran. -Original Message- From: Lars Buitinck [mailto:larsm...@gmail.com] Sent: Wednesday, July 23, 2014 3:56 PM To: scikit-learn-general Subject: Re: [Scikit-learn-general] GridSearchVC with SVM 2014-07-23 21:31 GMT+02:00

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Lars Buitinck
2014-07-23 21:31 GMT+02:00 Pagliari, Roberto : > It says 0.15.0 > > Right now I am finding the optimal values manually, using cross_validation > (by picking the best average). That can't be right. This attribute was in place in at least 0.14.0. How did you install scikit-learn? -

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Pagliari, Roberto
It says 0.15.0 Right now I am finding the optimal values manually, using cross_validation (by picking the best average). -Original Message- From: Lars Buitinck [mailto:larsm...@gmail.com] Sent: Wednesday, July 23, 2014 2:29 PM To: scikit-learn-general Subject: Re: [Scikit-learn-genera

Re: [Scikit-learn-general] Research position in the Brazilian Research Institute for Science and Neurotechnology – BRAINN

2014-07-23 Thread Olivier Grisel
Hi Paulo, Please do not post job ads to the mailing list unless they directly involve contributing to the scikit-learn project itself, which is not explicitly stated in this position description. Also, please prefix job ads with a "[JOB]" marker in the subject. Best, -- Olivier ---

[Scikit-learn-general] Research position in the Brazilian Research Institute for Science and Neurotechnology – BRAINN

2014-07-23 Thread Paulo Henrique Junqueira Amorim
Research position in the Brazilian Research Institute for Science and Neurotechnology – BRAINN Postdoc researcher to work with software development for medical imaging The Brazilian Research Institute for Neuroscience and Neurotechnology (BRAINN) (www.brainn.org.br) focuses on the investigation o

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Lars Buitinck
2014-07-23 18:21 GMT+02:00 Pagliari, Roberto : > Is there a way to make prediction, once grid search is done? Right now I’m > getting the error > > 'GridSearchCV' object has no attribute 'best_estimator_' Works fine here. What does `python -c 'import sklearn; print(sklearn.__version__)'` say?

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Dayvid Victor
Mario, as I said, the correct formula would be: - AUC = (1 + TP_rate - FP_rate) / 2 But you are also right; that is another evaluation metric stated in those references I sent! On Wed, Jul 23, 2014 at 2:06 PM, Mario Michael Krell wrote: > 1-FN_rate = TN_rate > > Consequently, (1 + TP_rate - FN_

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Dayvid Victor
Hamed, I am sorry, the correct trapezoidal approximation is: - AUC = (1 + TP_rate - FP_rate) / 2 Also, keep in mind that, when dealing with binary imbalanced datasets, you can calculate it as: auc = (1.0 + t_mn - (1.0 - t_mj)) / 2, where t_mn is the minority class accuracy and t_mj the majority
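Spelled out as a small sketch (the names t_mn / t_mj follow the email; the values are made up):

def approx_auc(t_mn, t_mj):
    """t_mn: accuracy on the minority class, t_mj: accuracy on the majority class."""
    tp_rate = t_mn            # treating the minority class as the positive class
    fp_rate = 1.0 - t_mj
    return (1.0 + tp_rate - fp_rate) / 2.0

print(approx_auc(0.70, 0.90))  # 0.8, i.e. the mean of the two per-class accuracies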

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Mario Michael Krell
1-FN_rate = TN_rate Consequently, (1 + TP_rate - FN_rate)/ 2 should be named "Balanced Accuracy" to avoid misunderstandings. Nevertheless, it is a good choice. On 23.07.2014, at 18:57, Dayvid Victor wrote: > > Or you might use the trapezoid approximation: auc = (1 + TP_rate - FN_rate)/ 2 > -

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Dayvid Victor
Hi, Like Mathieu Blondel said, the AUC (Area Under the ROC Curve) is the most popular metric.

import sklearn as sk
import sklearn.metrics  # makes sk.metrics available

def auc_score(y_true, y_pred, pos_label=1):
    fp_rate, tp_rate, thresholds = sk.metrics.roc_curve(
        y_true, y_pred, pos_label=pos_label)
    return sk.metrics.auc(fp_rate, tp_rate)

Or you migh

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Lars Buitinck
2014-07-23 18:07 GMT+02:00 Michael Eickenberg : > To answer 1): yes, if you set cv=number, then it will do K-fold > cross-validation with that number of folds. You can do this explicitly by > using > > from sklearn.cross_validation import KFold > > cv = KFold(len(data), 6) > > and pass cv as an arg

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Pagliari, Roberto
Hi Michael, Thanks for the clarifications. Is there a way to make predictions once grid search is done? Right now I’m getting the error 'GridSearchCV' object has no attribute 'best_estimator_' And I’ve seen other people reporting the same error. If not possible, is there a minimal example of

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Emanuele Olivetti
Hi, In addition to what has already been suggested, especially Chi^2 and MCC, I would suggest this: http://dx.doi.org/10.1109/PRNI.2012.14 (full disclosure: it is one of my papers) which is, in short, a Bayesian equivalent of Chi^2 / MCC, which works for binary and multi-class and do

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Michael Eickenberg
To answer 1): yes, if you set cv=number, then it will do K-fold cross-validation with that number of folds. You can do this explicitly by using

from sklearn.cross_validation import KFold
cv = KFold(len(data), 6)

and pass cv as an argument to GridSearchCV. To answer question 2 I think we need s
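Wired together, a minimal sketch might look like this (module names are the 2014-era ones, sklearn.cross_validation and sklearn.grid_search; the parameter grid is made up):

from sklearn import datasets, svm
from sklearn.cross_validation import KFold
from sklearn.grid_search import GridSearchCV

iris = datasets.load_iris()
cv = KFold(len(iris.data), n_folds=6)                  # explicit 6-fold CV iterator
grid = GridSearchCV(svm.SVC(), {'C': [1, 10]}, cv=cv)  # pass it straight to GridSearchCV
grid.fit(iris.data, iris.target)
print(grid.best_params_)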

[Scikit-learn-general] 'GridSearchCV' object has no attribute 'best_estimator_'

2014-07-23 Thread Pagliari, Roberto
I'm getting this error when trying to predict using the result of grid search with LinearSVC. However, according to the documentation (http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html) the predict method should be available. Has it been implemented or shou

[Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Pagliari, Roberto
This is an example of how to perform grid search with SVM:

>>> from sklearn import svm, grid_search, datasets
>>> iris = datasets.load_iris()
>>> parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
>>> svr = svm.SVC()
>>> clf = grid_search.GridSearchCV(svr, parameters)
>>> clf.fit(iris.data, iris.target)

Re: [Scikit-learn-general] Fwd: RBF Kernel - Query

2014-07-23 Thread Michael Eickenberg
It isn't possible because a Gaussian kernel spans an infinite-dimensional feature space; the coef_ you are looking for would be functions in that space. However, assuming you are working with an SVM (you don't specify this), you can look at dual_coef_ and, if I remember correctly, support_ to see whi
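A quick sketch of those attributes on an RBF-kernel SVC (toy data assumed; there is no coef_ for non-linear kernels):

from sklearn import datasets, svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='rbf').fit(iris.data, iris.target)

print(clf.dual_coef_.shape)        # dual coefficients, one column per support vector
print(clf.support_[:10])           # indices of the support vectors in the training set
print(clf.support_vectors_.shape)  # the support vectors themselves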

[Scikit-learn-general] Fwd: RBF Kernel - Query

2014-07-23 Thread umang patel
Hello all, I asked the following question on "Issues" and I was advised to mail this address if I have further queries: "Is it possible to get the weights of features with an RBF kernel? It is written under coef_ that it is possible only with a linear kernel. Is it mathematically possible

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Mario Michael Krell
Unfortunately, neither MCC nor the F-measure is really suited to most cases of imbalanced data, although they are way better than accuracy. Especially with the F-measure, we saw bad behavior due to changing class ratios in our data. If you want to have an intuitive measure which does not use a shifting

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-23 Thread Joel Nothman
Yes, I found that too and wished it were published with higher editorial standards so it could be more readable. On 23 July 2014 16:48, Dan Haiduc wrote: > Here's a comparison of all of them: EVALUATION: FROM PRECISION, RECALL > AND F-MEASURE TO ROC, INFORMEDNESS, MARKEDNESS & CORRELATION >