Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-20 Thread Gael Varoquaux
More important than Sturla's statement, which I may or may not agree with depending on the modeling assumption (and every p-value is based on a modeling assumption): the logistic regression in scikit-learn is a penalized logistic model. Thus the closed-form formulas for p-values are not valid.

G

On Sa
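
The point about penalization can be illustrated with a minimal sketch in plain Python. It is not scikit-learn's solver: `fit_slope`, the ridge strength `l2`, and the toy data are all made up for illustration. It fits a one-coefficient logistic model by Newton's method with and without an L2 penalty; the penalized estimate is shrunk toward zero, so it is no longer the maximum-likelihood estimate that the textbook standard-error and p-value formulas assume.

```python
import math


def fit_slope(x, y, l2=0.0, iters=50):
    """Newton's method for the one-coefficient model p = sigmoid(b * x).

    l2 is a hypothetical ridge strength: the objective is the negative
    log-likelihood plus 0.5 * l2 * b**2 (l2=0 gives plain maximum likelihood).
    """
    b = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-b * xi)) for xi in x]
        # Gradient and curvature of the penalized log-likelihood in b.
        grad = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)) - l2 * b
        hess = sum(pi * (1.0 - pi) * xi * xi for xi, pi in zip(x, p)) + l2
        b += grad / hess
    return b


# Toy data, invented for this sketch (not linearly separable).
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b_mle = fit_slope(x, y)           # unpenalized maximum likelihood
b_pen = fit_slope(x, y, l2=1.0)   # penalized estimate, shrunk toward zero
print("unpenalized:", b_mle)
print("penalized:  ", b_pen)
```

In scikit-learn's `LogisticRegression` the penalty strength is set through `C` (smaller `C` means a stronger penalty), so a very large `C` approaches the unpenalized fit, but the estimator still reports no standard errors.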

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-19 Thread josef.pktd
On Sun, Apr 19, 2015 at 9:26 AM, Alan G Isaac wrote:
> It seems unlikely that the choice of which features to provide
> should turn entirely on controversial philosophical positions.
> Hopefully a feature can be declared in or out of scope for the
> project on technical grounds.

But some design d

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-19 Thread Alan G Isaac
It seems unlikely that the choice of which features to provide should turn entirely on controversial philosophical positions. Hopefully a feature can be declared in or out of scope for the project on technical grounds.

Alan Isaac

On 4/18/2015 7:20 PM, josef.p...@gmail.com wrote:
> Sturla means: No

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Sebastian Raschka
It wouldn't hurt to have p-values returned, but personally, I don't miss them in scikit-learn. I think that's a classic "ML vs. statistics" discussion -- what I mean is the inference vs. prediction stuff. To me, scikit-learn is primarily a machine learning library.

> On Apr 19, 2015, at 12:53 A

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Sturla Molden
wrote:
> Good, I was reading your previous comments on the topic as being
> against all frequentist null hypothesis testing.

In the frequentist paradigm I prefer to use model selection instead of classical hypothesis testing with p-values. My focus is on building useful models which are able to

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Sturla Molden
wrote:
> Note. The editors of Basic and Applied Social Psychology are also
> banning confidence intervals.

I know. I am not sure I agree on that. I don't particularly like confidence intervals very much, but I don't hate them with a passion.

Pro: Even though confidence intervals have a bizarre

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread josef.pktd
On Sat, Apr 18, 2015 at 9:45 PM, Sturla Molden wrote:
> wrote:
>
>> (I just went through some articles to see how we can produce p-values
>> after feature selection with penalized least squares or maximum
>> penalized likelihood. :)
>
> If you have used penalized least squares or penalized likeli

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread josef.pktd
On Sat, Apr 18, 2015 at 9:25 PM, Sturla Molden wrote:
> wrote:
>
>>> Re. "We should therefore never compute p-values": I assume that you meant
>>> that within the narrow context of regression, and not, e.g., in the context
>>> of tests of distribution.
>>
>> Sturla means: No null hypothesis testi

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Sturla Molden
wrote:
> (I just went through some articles to see how we can produce p-values
> after feature selection with penalized least squares or maximum
> penalized likelihood. :)

If you have used penalized least squares or penalized likelihood, you have already pruned the model for parameters that only

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Sturla Molden
wrote:
>> Re. "We should therefore never compute p-values": I assume that you meant
>> that within the narrow context of regression, and not, e.g., in the context
>> of tests of distribution.
>
> Sturla means: No null hypothesis testing at all

Not really, I mean "no p-values for inferential sta

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread josef.pktd
On Sat, Apr 18, 2015 at 6:40 PM, Phillip Feldman wrote:
> This is a very nice explanation. Thanks!!
>
> Re. "We should therefore never compute p-values": I assume that you meant
> that within the narrow context of regression, and not, e.g., in the context
> of tests of distribution.

Sturla means

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Phillip Feldman
This is a very nice explanation. Thanks!!

Re. "We should therefore never compute p-values": I assume that you meant that within the narrow context of regression, and not, e.g., in the context of tests of distribution.

On Sat, Apr 18, 2015 at 3:31 PM, Sturla Molden wrote:
> Phillip Feldman
> w

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Sturla Molden
Phillip Feldman wrote:
> When using logistic regression, I'm often trying to establish whether a
> given feature has any effect.

Compare models with and without the feature: Cross-validation, BIC, AIC, PRESS, Bayes factor, etc. By the rules of inductive reasoning (cf. lex parsimoniae, Occam's
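
The suggestion to compare models with and without the feature can be sketched with two of the listed criteria, AIC and BIC. This is a minimal illustration in plain Python, assuming an unpenalized logistic model with one feature and an intercept; the helper names and the toy data are invented here and do not come from the thread or from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_lik(x, y, b0, b1):
    """Bernoulli log-likelihood of the model p = sigmoid(b0 + b1 * x)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll


def fit_with_feature(x, y, iters=50):
    """Newton's method for intercept b0 and slope b1 (no penalty)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system H * delta = g explicitly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data, invented for this sketch (not linearly separable).
x = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.1, 2.5, 2.9, 3.3, 3.6, 4.0]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
n = len(y)

# Model without the feature: the intercept-only MLE is logit(mean(y)).
ybar = sum(y) / n
ll_null = log_lik(x, y, math.log(ybar / (1.0 - ybar)), 0.0)

# Model with the feature.
b0, b1 = fit_with_feature(x, y)
ll_full = log_lik(x, y, b0, b1)

# AIC = 2k - 2*ll and BIC = k*ln(n) - 2*ll; lower is better.
aic_null, aic_full = 2 * 1 - 2 * ll_null, 2 * 2 - 2 * ll_full
bic_null, bic_full = math.log(n) - 2 * ll_null, 2 * math.log(n) - 2 * ll_full
print("AIC without / with feature:", aic_null, aic_full)
print("BIC without / with feature:", bic_null, bic_full)
```

A lower AIC or BIC for the model that includes the feature counts as evidence that the feature earns its extra parameter, which answers the "does it have any effect" question without a p-value.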

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Phillip Feldman
I was able to accomplish what I needed using `statsmodels.discrete.discrete_model.Logit`. Thanks!

On Sat, Apr 18, 2015 at 11:35 AM, Michael Kneier wrote:
> Hi Phillip,
>
> Have you checked out statsmodels? That might be a better fit for your needs.
>
> Sent from my iPhone
>
> > On Apr 18, 2015,
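
For readers who want the same result, a minimal sketch of getting coefficient p-values from `statsmodels.discrete.discrete_model.Logit` might look like the following. The synthetic data, the seed, and the variable names are assumptions made for illustration; the thread does not show the actual code that was used.

```python
import numpy as np
from statsmodels.discrete.discrete_model import Logit

# Synthetic data, invented for this sketch: the first feature drives the
# outcome, the second is pure noise.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_logits = 0.5 + 1.5 * X[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logits)))

# Logit does not add an intercept on its own, so prepend a column of ones.
X_design = np.column_stack([np.ones(n), X])

# Unpenalized maximum-likelihood fit; disp=0 silences the optimizer output.
result = Logit(y, X_design).fit(disp=0)

print("coefficients:", result.params)   # intercept, feature 1, feature 2
print("p-values:    ", result.pvalues)  # Wald tests of each coefficient == 0
```

Unlike scikit-learn's `LogisticRegression`, this fit is unpenalized, which is why the reported Wald p-values rest on the usual asymptotic maximum-likelihood theory.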

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Michael Kneier
Hi Phillip,

Have you checked out statsmodels? That might be a better fit for your needs.

Sent from my iPhone

> On Apr 18, 2015, at 8:31 PM, Phillip Feldman
> wrote:
>
> When using logistic regression, I'm often trying to establish whether a given
> feature has any effect. R and Matlab give m

[Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread Phillip Feldman
When using logistic regression, I'm often trying to establish whether a given feature has any effect. R and Matlab give me p-values, but Scikit-learn does not. I would love to be able to do all of my statistical processing in Python. Please consider adding this feature.

Phillip M. Feldman -