Re: [scikit-learn] imbalanced datasets return uncalibrated predictions - why?

2020-11-17 Thread Roman Yurchak
On 17/11/2020 09:57, Sole Galli via scikit-learn wrote: And I understand that it has to do with the cost function, because if we re-balance the dataset with, say, class_weight='balanced', then the probabilities seem to be calibrated as a result. As far as I know, logistic regression will have well

Re: [scikit-learn] imbalanced datasets return uncalibrated predictions - why?

2020-11-17 Thread Sean Violante
I am not sure if you are using "calibrated" in the correct sense. Calibrated means that the predictions align with the real-world probabilities, so if you have a rare class, it should have low probabilities. On Tue, Nov 17, 2020 at 9:58 AM Sole Galli via scikit-learn <scikit-learn@python.org> wro
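
The sense of "calibrated" described in this reply can be checked numerically. The following is a minimal sketch, not code from the thread: the synthetic dataset, the roughly 5% positive rate, and all variable names are assumptions chosen for illustration. It fits logistic regression with and without class_weight='balanced' and compares the mean predicted probability of the rare class with its observed frequency.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem (assumption: about 5% positives).
X, y = make_classification(
    n_samples=20000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Same model, fitted without and with class re-weighting.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)

# Predicted probability of the positive (rare) class on held-out data.
p_plain = plain.predict_proba(X_test)[:, 1]
p_weighted = weighted.predict_proba(X_test)[:, 1]

base_rate = y_test.mean()          # observed frequency of the rare class
mean_plain = p_plain.mean()        # average prediction, unweighted model
mean_weighted = p_weighted.mean()  # average prediction, re-weighted model

# Brier score: mean squared error of the probabilities (lower is better).
brier_plain = brier_score_loss(y_test, p_plain)
brier_weighted = brier_score_loss(y_test, p_weighted)

print(f"base rate:            {base_rate:.3f}")
print(f"mean p, plain:        {mean_plain:.3f}  (Brier {brier_plain:.4f})")
print(f"mean p, class_weight: {mean_weighted:.3f}  (Brier {brier_weighted:.4f})")
```

On data like this, the unweighted model's average prediction should sit close to the base rate, which is what low probabilities for a rare class look like when the model is calibrated. Re-weighting pushes the predicted probabilities upward, away from the observed frequency.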

[scikit-learn] imbalanced datasets return uncalibrated predictions - why?

2020-11-17 Thread Sole Galli via scikit-learn
Hello team, I am trying to understand why logistic regression returns uncalibrated probabilities, with values tending toward low probabilities for the positive (rare) cases, when trained on an imbalanced dataset. I've read a number of articles; all seem to agree that this is the case, many show