Re: [Scikit-learn-general] [GSOC] Dummy with sparse target

2014-07-28 Thread Robert Layton
Yes, but shouldn't that result in the same expected accuracy? On 29 July 2014 12:27, Josh Vredevoogd wrote: > My reading is that most-frequent always predicts the most frequent class, whereas > stratified randomly predicts samples with probabilities given by the class > distribution. > > > On Mon, Jul 28, 2
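
In short, no: with class proportions p_i, most_frequent attains expected accuracy max_i p_i, while stratified attains sum_i p_i^2, and the two coincide only for uniform or degenerate class distributions. A minimal sketch on made-up data:

    import numpy as np
    from sklearn.dummy import DummyClassifier

    rng = np.random.RandomState(0)
    # Imbalanced made-up binary target: 80% class 0, 20% class 1.
    y = rng.choice([0, 1], size=100000, p=[0.8, 0.2])
    X = np.zeros((len(y), 1))  # features are ignored by dummy strategies

    for strategy in ("most_frequent", "stratified"):
        clf = DummyClassifier(strategy=strategy, random_state=0).fit(X, y)
        print(strategy, clf.score(X, y))
    # most_frequent scores ~0.80 (= max p_i); stratified scores
    # ~0.68 (= 0.8**2 + 0.2**2), so the expected accuracies differ.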

Re: [Scikit-learn-general] [GSOC] Dummy with sparse target

2014-07-28 Thread Josh Vredevoogd
My reading is that most-frequent always predicts the most frequent class, whereas stratified randomly predicts samples with probabilities given by the class distribution. On Mon, Jul 28, 2014 at 7:18 PM, Robert Layton wrote: > Looks good Hamzeh! > > This may be a dumb question, but is there an expected (in

Re: [Scikit-learn-general] [GSOC] Dummy with sparse target

2014-07-28 Thread Robert Layton
Looks good Hamzeh! This may be a dumb question, but is there an expected (in the statistical sense) difference between a most-frequent and a stratified dummy predictor? On 29 July 2014 11:53, Hamzeh Alsalhi wrote: > Hi! This week I wrote a post (with many benchmark plots) of the sparse > targe

[Scikit-learn-general] [GSOC] Dummy with sparse target

2014-07-28 Thread Hamzeh Alsalhi
Hi! This week I wrote a post (with many benchmark plots) about the sparse target dummy classifier: http://hamzehgsoc.blogspot.com/ Thanks, Hamzeh

Re: [Scikit-learn-general] [GSoC] - Logistic Regression CV (Manoj Kumar)

2014-07-28 Thread Joel Nothman
There is actually an open PR to import the sample_weight changes into the scikit-learn copy of liblinear: https://github.com/scikit-learn/scikit-learn/pull/2784. It could use some love, or someone to executively decide that it's not worth including. On 29 July 2014 10:36, Sean Violante wr

Re: [Scikit-learn-general] [GSoC] - Logistic Regression CV (Manoj Kumar)

2014-07-28 Thread Sean Violante
> it wasn't clear from the blog post but are you aware that liblinear has a modification that handles sample weights http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances [fyi what I would be interested in (and I am not sure this is implemented in that mod) is whe

[Scikit-learn-general] [GSoC] - Logistic Regression CV (Manoj Kumar)

2014-07-28 Thread Sean Violante
It wasn't clear from the blog post, but are you aware that liblinear has a modification that handles sample weights? http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances [FYI, what I would be interested in (and I am not sure this is implemented in that mod) is where one can aggr
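
The aggregation idea truncated above can be sketched directly: collapse duplicate (row, label) pairs into unique rows plus counts, and pass the counts as sample_weight to any estimator whose fit() accepts it (current scikit-learn's LogisticRegression does; the liblinear-backed path at the time did not, which is what the PR mentioned above addresses). Made-up data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up data containing many duplicate rows.
    X = np.array([[0., 0.]] * 3 + [[1., 1.]] * 2 + [[1., 0.]] * 4)
    y = np.array([0] * 3 + [1] * 2 + [0] * 4)

    # Collapse duplicate (row, label) pairs into unique rows plus counts.
    Xy = np.hstack([X, y[:, None]])
    uniq, counts = np.unique(Xy, axis=0, return_counts=True)
    X_u, y_u = uniq[:, :-1], uniq[:, -1].astype(int)

    full = LogisticRegression().fit(X, y)
    agg = LogisticRegression().fit(X_u, y_u, sample_weight=counts)
    print(full.coef_, agg.coef_)  # agree up to solver tolerance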

Re: [Scikit-learn-general] RBK Kernel - Query

2014-07-28 Thread Joel Nothman
You can find the answer by googling scikit-learn-general and "umang patel": https://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/msg10981.html As it does not pertain directly to scikit-learn, this is also a question for which you might get a more thorough answer in a forum like s

Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_

2014-07-28 Thread Joel Nothman
Make sure you read and understand http://scikit-learn.org/stable/modules/cross_validation.html. Basically, getting the score of the final model on the full training data will be a poor indication of how well the model will perform on other data. The average of k folds where we have held out test da
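
A minimal sketch of the distinction (using the current sklearn.model_selection import path; at the time this lived in sklearn.grid_search), with a made-up dataset and parameter grid:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=500, random_state=0)
    grid = GridSearchCV(LinearSVC(dual=False), {"C": [0.01, 0.1, 1, 10]}, cv=5)
    grid.fit(X, y)

    print(grid.best_score_)                  # mean score over the held-out folds
    print(grid.best_estimator_.score(X, y))  # score on the full training data:
                                             # optimistically biased, usually higher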

Re: [Scikit-learn-general] RBK Kernel - Query

2014-07-28 Thread umang patel
Hi Andy, I never got an answer. Could you please re-answer if possible? I will really appreciate it. Thank you. On Mon, Jul 28, 2014 at 2:53 PM, Andy wrote: > Please do not repost. > You got an answer on the issue if I recall correctly. > If you want to know more, pick up a textbook on mach

[Scikit-learn-general] [GSoC] - Logistic Regression CV

2014-07-28 Thread Manoj Kumar
Hi, An update on the new Logistic Regression CV model in scikit-learn: http://manojbits.wordpress.com/2014/07/28/scikit-learn-logistic-regression-cv-2/ -- Regards, Manoj Kumar, GSoC 2014, Scikit-learn Mech Undergrad http://manojbits.wordpress.com
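
For reference, a minimal sketch of the estimator the post describes, as it now ships in scikit-learn (made-up data):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(n_samples=300, random_state=0)
    # Picks C by cross-validation over a grid of candidate values.
    clf = LogisticRegressionCV(Cs=10, cv=5).fit(X, y)
    print(clf.C_)  # chosen C (one value per class; a single value for binary)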

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-28 Thread Yogesh Karpate
Dear Hamed, Can you share the code for the "balanced accuracy" you mentioned in your last mail? On Tue, Jul 29, 2014 at 12:07 AM, Hamed Zamani wrote: > Dear Mario, > > Yes of course. Sorry I forgot to mention GMeans. It is also one of the > measures which have been used frequently. > > -- Hamed > > >

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-28 Thread Hamed Zamani
Dear Mario, Yes of course. Sorry I forgot to mention GMeans. It is also one of the measures which have been used frequently. -- Hamed On Tue, Jul 29, 2014 at 2:24 AM, Mario Michael Krell wrote: > Dear Hamed, > > I think it would be a good idea to also consider gmean when extending > scikit.

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-28 Thread Mario Michael Krell
Dear Hamed, I think it would be a good idea to also consider gmean when extending scikit-learn. It is the geometric mean of TNR and TPR, instead of the arithmetic mean used for the balanced accuracy. Greets, Mario On 28.07.2014, at 19:00, scikit-learn-general-requ...@lists.sourceforge.net wrote: >
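
A minimal sketch of both measures, computed from the binary confusion matrix (made-up labels); this also covers Yogesh's request above for the balanced-accuracy code:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)  # sensitivity: recall on the positive class
    tnr = tn / (tn + fp)  # specificity: recall on the negative class

    balanced_accuracy = (tpr + tnr) / 2.0   # arithmetic mean
    gmean = np.sqrt(tpr * tnr)              # geometric mean
    print(balanced_accuracy, gmean)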

Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_

2014-07-28 Thread Pagliari, Roberto
Hi Joel, Just to make sure I understood:
- C is computed with cross-validation, by finding the highest average score over the k folds.
- Once C is found, the weights are computed over the whole training set.
If that’s the case, why is best_score_ averaged over the k folds? Sho

Re: [Scikit-learn-general] calculate the posterior probability

2014-07-28 Thread Mario Michael Krell
I have to disagree somewhat. In fact it is possible to get a probability, but it requires some "work", so it is not easy. In my group, we are using a sigmoid fit introduced by Platt to map SVM scores to probability values. We integrated it in our pySPACE framework, which also interfaces sc
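
scikit-learn has since grown a built-in version of this Platt-style sigmoid fit; a minimal sketch using CalibratedClassifierCV on made-up data:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, random_state=0)

    svm = LinearSVC(dual=False)  # has decision_function but no predict_proba
    calibrated = CalibratedClassifierCV(svm, method="sigmoid", cv=3).fit(X, y)
    print(calibrated.predict_proba(X[:5]))  # calibrated probability estimates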

[Scikit-learn-general] LinearSVC parameters

2014-07-28 Thread Pagliari, Roberto
Hello, In terms of the input parameters of LinearSVC, namely:
* Penalty function
* Loss function
* Tolerance
* Dual
did you do any study of the performance and accuracy of the results with different combinations, or do you have any particular insights? Thank you
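
A minimal sketch of one way to run such a study yourself: grid over the settings liblinear supports and compare cross-validated accuracy and fit time (made-up data; the modern loss names "hinge"/"squared_hinge" are used):

    import time
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

    settings = [("l2", "hinge", True), ("l2", "squared_hinge", True),
                ("l2", "squared_hinge", False), ("l1", "squared_hinge", False)]
    for penalty, loss, dual in settings:
        clf = LinearSVC(penalty=penalty, loss=loss, dual=dual, tol=1e-4)
        start = time.time()
        score = cross_val_score(clf, X, y, cv=5).mean()
        print(penalty, loss, dual, round(score, 3), round(time.time() - start, 2))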

[Scikit-learn-general] scikit learn's LDA produces different results compared to R or a step-by-step approach, a bug?

2014-07-28 Thread Sebastian Raschka
Hi, I used scikit-learn's LDA for dimensionality reduction and noticed that the projected linear discriminants look a little bit strange. For testing, I then used the Iris dataset and could reproduce the results that are posted on scikit-learn's example documentation here: http://scikit-learn.
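
For readers who want to reproduce the comparison, a minimal sketch of the projection in question (the modern class name is used; in 2014 this was sklearn.lda.LDA). Note that discriminant axes are only defined up to sign and scaling, so other implementations such as R's can legitimately return differently scaled components without changing the class separation:

    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    iris = load_iris()
    lda = LinearDiscriminantAnalysis(n_components=2)
    X_lda = lda.fit_transform(iris.data, iris.target)
    print(X_lda[:5])  # first two linear discriminants of the Iris samples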

Re: [Scikit-learn-general] LinearSVC loss and penalty functions

2014-07-28 Thread Andy
On 07/28/2014 06:04 PM, Pagliari, Roberto wrote:
> I'm getting an error when I try to use the following combinations:
> Penalty: l1, loss: l1, regardless of dual
> Penalty: l1, loss: l2, when dual=True
> Penalty: l2, loss: l1, when dual=False
> Is this the expected behavior?
Yes. This is the behavior
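
A minimal sketch that probes every (penalty, loss, dual) combination; the unsupported ones raise ValueError, matching the errors reported above. At the time, loss="l1" meant hinge and loss="l2" meant squared hinge; the modern names are used here:

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=100, random_state=0)

    for penalty in ("l1", "l2"):
        for loss in ("hinge", "squared_hinge"):
            for dual in (True, False):
                try:
                    LinearSVC(penalty=penalty, loss=loss, dual=dual).fit(X, y)
                    print(penalty, loss, dual, "supported")
                except ValueError:
                    print(penalty, loss, dual, "unsupported")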

Re: [Scikit-learn-general] RBK Kernel - Query

2014-07-28 Thread Andy
Please do not repost. You got an answer on the issue if I recall correctly. If you want to know more, pick up a textbook on machine learning, such as ESL (free pdf: http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf), Kevin Murphy's book, or Bishop's. On 07/27/2014 08:22

Re: [Scikit-learn-general] calculate the posterior probability

2014-07-28 Thread Josh Vredevoogd
In some cases, you can get more information from classifier.decision_function(). The output will not be a probability but can still be more useful than the binary result -- I'm thinking of meta-classifiers or classifier evaluation. Caveat: there are likely gotchas in going this direction if you don
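
A minimal sketch of that point: a ranking metric such as ROC AUC can consume decision_function output directly, with no probabilities needed (made-up data):

    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, random_state=0)
    clf = LinearSVC(dual=False).fit(X, y)

    scores = clf.decision_function(X)  # signed margins, not probabilities
    print(roc_auc_score(y, scores))    # AUC only needs a ranking, so this works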

Re: [Scikit-learn-general] calculate the posterior probability

2014-07-28 Thread Lars Buitinck
2014-07-28 18:39 GMT+02:00 Sheila the angel : > For classifiers which do not provide a probability estimate of the class > (they give the error "object has no attribute predict_proba"), is there any > easy way to calculate the posterior probability? No. If there were, we would have implemented predict_

Re: [Scikit-learn-general] Evaluation measure for imbalanced data

2014-07-28 Thread Hamed Zamani
Dear Joel, Sorry for the delay. I was on a trip and I couldn't check my email. To the best of my knowledge, and according to the kind responses in this email thread, we cannot claim that a specific measure is better than the others for imbalanced data. In other words, there are some evaluation me

[Scikit-learn-general] calculate the posterior probability

2014-07-28 Thread Sheila the angel
For classifiers which do not provide a probability estimate of the class (they give the error "object has no attribute predict_proba"), is there any easy way to calculate the posterior probability? Thank you,

[Scikit-learn-general] LinearSVC loss and penalty functions

2014-07-28 Thread Pagliari, Roberto
I'm getting an error when I try to use the following combinations:
Penalty: l1, loss: l1, regardless of dual
Penalty: l1, loss: l2, when dual=True
Penalty: l2, loss: l1, when dual=False
Is this the expected behavior? Thank you,

Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_

2014-07-28 Thread Joel Nothman
I do think you're right to attempt to improve it! Please submit a PR! On 29 July 2014 00:05, Pagliari, Roberto wrote: > You are right. > > > > I guess only C (in the case of linear SVM) is the best averaged over the > fold. And once C is found, the weights over the whole training set are > comp

Re: [Scikit-learn-general] gridSearchCV best_estimator_ best_score_

2014-07-28 Thread Pagliari, Roberto
You are right. I guess only C (in the case of linear SVM) is chosen as the best averaged over the folds. And once C is found, the weights over the whole training set are computed. If that's the case, my proposal may be misleading. Thank you, Roberto From: Andy [mailto:t3k...@gmail.com] Sent: Saturday, J

Re: [Scikit-learn-general] Questions on random forests

2014-07-28 Thread federico vaggi
That is possibly the best explanation of random forest hyperparameters I've ever read. You really should blog about this, since it would get way more exposure (I know you wrote a thesis about it, but that's not very accessible to people who aren't specialists in the field). On Mon, Jul 28, 2014

Re: [Scikit-learn-general] Questions on random forests

2014-07-28 Thread Gilles Louppe
Hi Kevin, Interesting question. Your point is true provided you have an infinite amount of training data. In that case, you can indeed show that an infinitely large forest of extremely randomized trees built with K=1 converges towards an optimal model (the Bayes model). This result, however, does no
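
In scikit-learn terms, the K=1 model Gilles describes corresponds to ExtraTreesClassifier(max_features=1); a minimal sketch on made-up (finite) data, where the consistency result need not apply:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)
    clf = ExtraTreesClassifier(n_estimators=500, max_features=1, random_state=0)
    print(cross_val_score(clf, X, y, cv=5).mean())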