Yes, but shouldn't that result in the same expected accuracy?
On 29 July 2014 12:27, Josh Vredevoogd wrote:
My reading is that most_frequent always predicts the mode (the single most
frequent class), whereas stratified predicts randomly with probabilities given
by the class distribution.
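A back-of-envelope check of the difference, as a sketch with made-up class priors (the numbers are illustrative, not from the thread):

```python
# Expected accuracy of the two DummyClassifier strategies under
# illustrative class priors.  most_frequent always predicts the majority
# class, so its expected accuracy is max(p); stratified predicts class i
# with probability p_i, so its expected accuracy is sum(p_i ** 2).
priors = [0.5, 0.3, 0.2]

acc_most_frequent = max(priors)               # 0.5
acc_stratified = sum(p * p for p in priors)   # 0.25 + 0.09 + 0.04 = 0.38

print(acc_most_frequent, acc_stratified)
```

In general sum(p_i ** 2) <= max(p), so the two strategies only coincide in expectation when all nonzero priors are equal.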
On Mon, Jul 28, 2014 at 7:18 PM, Robert Layton wrote:
Looks good Hamzeh!
This may be a dumb question, but is there an expected (in the statistical
sense) difference between a most-frequent and a stratified dummy predictor?
On 29 July 2014 11:53, Hamzeh Alsalhi wrote:
Hi! This week I wrote a post (with many benchmark plots) of the sparse
target dummy classifier http://hamzehgsoc.blogspot.com/
Thanks,
Hamzeh
There is actually an open PR to import the sample_weight changes into the
scikit-learn copy of liblinear:
https://github.com/scikit-learn/scikit-learn/pull/2784. It would benefit from
some attention, or from someone executively deciding that it's not worth including.
On 29 July 2014 10:36, Sean Violante wrote:
It wasn't clear from the blog post, but are you aware that liblinear has a
modification that handles sample weights?
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances
[FYI, what I would be interested in (and I am not sure this is implemented
in that mod) is where one can aggr
You can find the answer by googling scikit-learn-general and "umang patel":
https://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/msg10981.html
As it does not pertain directly to scikit-learn, this is also a question
that you might get a more thorough answer for in a forum like
s
Make sure you read and understand
http://scikit-learn.org/stable/modules/cross_validation.html. Basically,
getting the score of the final model on the full training data will be a
poor indication of how well the model will perform on other data. The
average over k folds, where we have held out test data each time, gives a
better estimate.
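A minimal sketch of that contrast (the dataset and estimator are arbitrary choices, not from the thread):

```python
# Compare the optimistic training score with a cross-validated estimate.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Score on the same data the model was fit on -- overly optimistic.
train_score = clf.fit(X, y).score(X, y)

# Average score over 5 folds of held-out data -- a fairer estimate.
cv_score = cross_val_score(clf, X, y, cv=5).mean()

print(train_score, cv_score)
```

The fully grown tree scores perfectly on its own training data, while the cross-validated average is typically lower and closer to what you would see on new data.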
Hi Andy,
I never got an answer. Could you please answer again if possible? I would
really appreciate it.
Thank you.
On Mon, Jul 28, 2014 at 2:53 PM, Andy wrote:
Hi, an update on the new Logistic Regression CV model in scikit-learn:
http://manojbits.wordpress.com/2014/07/28/scikit-learn-logistic-regression-cv-2/
--
Regards,
Manoj Kumar,
GSoC 2014, Scikit-learn
Mech Undergrad
http://manojbits.wordpress.com
--
Dear Hamed,
Can you share the code for the "balanced accuracy" you mentioned in your last mail?
On Tue, Jul 29, 2014 at 12:07 AM, Hamed Zamani wrote:
Dear Mario,
Yes, of course. Sorry, I forgot to mention GMeans. It is also one of the
frequently used measures.
-- Hamed
On Tue, Jul 29, 2014 at 2:24 AM, Mario Michael Krell wrote:
Dear Hamed,
I think it would be a good idea to also consider gmean when extending scikit.
It is the geometric mean of TNR and TPR instead of the arithmetic mean used for
the balanced accuracy.
Greets
Mario
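For concreteness, the two aggregates on a toy confusion matrix (the counts are made up for illustration, not from the thread):

```python
# Balanced accuracy (arithmetic mean of TPR and TNR) vs. G-mean
# (their geometric mean), on illustrative counts.
import math

tp, fn, fp, tn = 40, 10, 5, 45
tpr = tp / (tp + fn)   # sensitivity / recall: 0.8
tnr = tn / (tn + fp)   # specificity: 0.9

balanced_accuracy = (tpr + tnr) / 2   # 0.85
gmean = math.sqrt(tpr * tnr)          # ~0.849

print(balanced_accuracy, gmean)
```

The geometric mean penalizes imbalance between the two rates more strongly than the arithmetic mean, and drops to 0 if either rate is 0.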
On 28.07.2014, at 19:00, scikit-learn-general-requ...@lists.sourceforge.net wrote:
Hi Joel,
Just to make sure I understood.
- C is computed with cross validation, by finding the highest average
score over the k folds
- Once C is found, weights are computed over the whole training set.
If that’s the case, why is the best_score_ averaged over the k folds? Sho
I have to contradict somewhat. In fact, it is possible to get a probability,
but it requires some "work", so it is not easy.
In my group, we are using a sigmoid fit introduced by Platt to map SVM scores to
probability values. We integrated it in our pySPACE framework, which also
interfaces scikit-learn.
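A sketch of the idea using scikit-learn's CalibratedClassifierCV, which wraps a Platt-style sigmoid fit (the dataset and parameters here are illustrative, not the pySPACE implementation):

```python
# Map the scores of an SVM without predict_proba to probabilities via
# a Platt-style sigmoid fit, using CalibratedClassifierCV.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

base = LinearSVC(dual=True)  # exposes decision_function, not predict_proba
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X, y)

proba = calibrated.predict_proba(X[:5])  # each row sums to 1
print(proba)
```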
Hello,
In terms of input parameters of LinearSVC, namely,
* Penalty function
* Loss function
* Tolerance
* Dual
Have you made any study of the performance and accuracy of the results with
different combinations, or do you have any particular insights?
Thank you
Hi,
I used scikit-learn's LDA for dimensionality reduction and noticed that the
projected linear discriminants look a little bit strange. For testing, I then
used the Iris dataset and could reproduce the results that are posted on
scikit-learn's example documentation here:
http://scikit-learn.
On 07/28/2014 06:04 PM, Pagliari, Roberto wrote:
I'm getting an error when I try to use the following combinations:
Penalty: l1, loss: l1, regardless of dual
Penalty: l1, loss: l2 when dual=True
Penalty=l2, loss=l1, when dual = False
Is this the expected behavior?
Yes.
This is the behavior
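The supported pairs can be enumerated directly (a sketch; note that in current scikit-learn the loss names are "hinge" and "squared_hinge" rather than the old "l1" and "l2"):

```python
# Try every penalty/loss/dual combination and record which ones
# LinearSVC accepts.  'hinge' corresponds to the old loss 'l1',
# 'squared_hinge' to the old 'l2'.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=50, random_state=0)

results = {}
for penalty in ("l1", "l2"):
    for loss in ("hinge", "squared_hinge"):
        for dual in (False, True):
            try:
                LinearSVC(penalty=penalty, loss=loss, dual=dual).fit(X, y)
                results[penalty, loss, dual] = "ok"
            except ValueError:
                results[penalty, loss, dual] = "unsupported"

for combo in sorted(results):
    print(combo, results[combo])
```

This reproduces the behavior above: l1 penalty with hinge loss is never supported, l1 + squared_hinge only works in the primal (dual=False), and l2 + hinge only works in the dual.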
Please do not repost.
You got an answer on the issue if I recall correctly.
If you want to know more, pick up a textbook on machine learning, such
as ESL (free pdf:
http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf), Kevin
Murphy's book, or the Bishop.
In some cases, you can get more information from
classifier.decision_function(). The output will not be a probability but
can still be more useful than the binary result -- I'm thinking of
meta-classifiers or classifier evaluation. Caveat: there are likely gotchas
in going this direction if you don
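A minimal illustration of that use of decision_function (arbitrary synthetic data):

```python
# decision_function gives a graded score even when predict_proba is
# unavailable; predict simply thresholds that score at 0.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, random_state=0)
clf = LinearSVC(dual=True).fit(X, y)

scores = clf.decision_function(X[:5])  # signed distance to the hyperplane
labels = clf.predict(X[:5])            # the sign of that score

print(scores, labels)
```

The magnitude of the score is what a meta-classifier or ranking-based evaluation can exploit, even though it is not a calibrated probability.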
2014-07-28 18:39 GMT+02:00 Sheila the angel :
> For classifiers that do not provide a probability estimate of the class
> (giving the error "object has no attribute predict_proba"), is there any
> easy way to calculate the posterior probability?
No. If there were, we would have implemented predict_proba.
Dear Joel,
Sorry for the delay. I was on a trip and couldn't check my email.
To the best of my knowledge, and according to the kind responses in this
email thread, we cannot claim that a specific measure is better than the
others for imbalanced data. In other words, there are some evaluation
measures
For classifiers that do not provide a probability estimate of the class
(giving the error "object has no attribute predict_proba"), is there any
easy way to calculate the posterior probability?
Thank you,
I'm getting an error when I try to use the following combinations:
Penalty: l1, loss: l1, regardless of dual
Penalty: l1, loss: l2 when dual=True
Penalty=l2, loss=l1, when dual = False
Is this the expected behavior?
Thank you,
I do think you're right to attempt to improve it! Please submit a PR!
On 29 July 2014 00:05, Pagliari, Roberto wrote:
You are right.
I guess only C (in the case of linear SVM) is chosen as the best averaged over the folds.
And once C is found, the weights over the whole training set are computed.
If that's the case, my proposal may be misleading.
Thank you,
Roberto
That is possibly the best explanation of random forest hyperparameters
I've ever read. You really should blog about this, since it should get way
more exposure (I know you wrote a thesis about it, but that's not very
accessible to people who aren't specialists in the field).
On Mon, Jul 28, 2014
Hi Kevin,
Interesting question. Your point is true provided you have an infinite
amount of training data. In that case, you can indeed show that an
infinitely large forest of extremely randomized trees built for K=1
converges towards an optimal model (the Bayes model).
This result however does no
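The K=1 setting described here corresponds to max_features=1 in ExtraTreesClassifier; a sketch on synthetic data (illustrative only, and finite-sample, so no optimality claim applies):

```python
# Extremely randomized trees with K=1: each split tests a single,
# randomly drawn feature with a randomly drawn threshold.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=300, random_state=0)
forest = ExtraTreesClassifier(n_estimators=100, max_features=1,
                              random_state=0).fit(X, y)

print(forest.score(X, y))
```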