Re: [scikit-learn] imbalanced datasets return uncalibrated predictions - why?

2020-11-17 Thread Sean Violante
I am not sure if you are using "calibrated" in the correct sense. Calibrated means that the predictions align with the real-world probabilities, so if you have a rare class it should have low probabilities. On Tue, Nov 17, 2020 at 9:58 AM Sole Galli via scikit-learn <scikit-learn@python.org> wro

Re: [scikit-learn] Sparse predict_proba and Fenchel-Young losses

2018-10-26 Thread Sean Violante
1) You can call fit(X, Y) where Y is a n_samples array of label integers *or* Y is a n_samples x n_classes array containing *label proportions*. Matthieu - that's great. In glmnet it is implemented directly as counts (not proportions) - which may be more natural. I find it a shame this is not im

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread Sean Violante
lies >> t = time.time() >> model = GLM(y, X, family=families.Binomial(link= >> families.links.logit)) >> result = model.fit_regularized(method='elastic_net', alpha=1.0, >> L1_wt=0.0, cnvrg_tol=tol, maxiter=maxiter) >> print "sm.GL

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread Sean Violante
Stuart, have you tried glmnet (in R)? There is a Python version: https://web.stanford.edu/~hastie/glmnet_python/ On Thu, Oct 5, 2017 at 6:34 PM, Stuart Reynolds wrote: > Thanks Josef. Was very useful. > > result.remove_data() reduces a 5 parameter Logit result object from > megabytes to 5K

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread Sean Violante
Hi Stuart, the underlying logistic regression code in scikit-learn (at least for the non-liblinear implementation) allows sample weights, which would allow you to do what you want. [pass in sample weight Total_Service_Points_Won and target 1 and ( Total_Service_Points_Played-Total_Service_Points_Won
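The sample-weight approach described in this message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the poster's code: the arrays `X`, `played`, and `won` are made-up stand-ins for the aggregated data (one row per feature combination), and the near-unregularised setting `C=1e6` is an arbitrary choice. Each row is duplicated, once with target 1 weighted by the number of successes and once with target 0 weighted by the number of failures.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical aggregated data: one row per feature combination,
# with the number of points played and the number of points won.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
played = np.array([100, 100, 100, 100])
won = np.array([20, 40, 60, 80])

# Duplicate every row: target 1 weighted by wins, target 0 weighted by losses.
X_long = np.vstack([X, X])
y_long = np.concatenate([np.ones(len(X), dtype=int), np.zeros(len(X), dtype=int)])
w_long = np.concatenate([won, played - won])

# lbfgs is one of the non-liblinear solvers; C=1e6 makes the penalty negligible.
model = LogisticRegression(solver="lbfgs", C=1e6)
model.fit(X_long, y_long, sample_weight=w_long)

# Predicted probability of a win for each original feature combination.
proba = model.predict_proba(X)[:, 1]
```

Weighting by counts rather than fitting on proportions keeps the information that some feature combinations were observed far more often than others.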

Re: [scikit-learn] Replacing the Boston Housing Prices dataset

2017-07-06 Thread Sean Violante
G Reina, you make a bizarre argument. You argue that you should not even check racism as a possible factor in house prices? But then you yourself check whether it's relevant. Then you say "but I'd argue that it's more due to the location (near water, near businesses, near restaurants, near parks and

Re: [scikit-learn] R user trying to learn Python

2017-06-18 Thread Sean Violante
CW, you might want to read http://greenteapress.com/wp/think-python/ (available as a free PDF) for the basics of programming and Python, and Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (O'Reilly) for the data analysis libraries (pandas, numpy, ipython...). On Sun, Jun 18

Re: [scikit-learn] Ipython Jupyter Kernel Dies when I fit an SGDClassifier

2017-06-03 Thread Sean Violante
Have you used sparse arrays? On Fri, Jun 2, 2017 at 7:39 PM, Stuart Reynolds wrote: > Hmmm... is it possible to place your original data into a memmap? > (perhaps will clear out 8Gb, depending on SGDClassifier internals?) > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Sean Violante
Sorry, just saw you are not using the liblinear solver. Agree with Sebastian: you should subtract the mean, not the median. On 15 Dec 2016 11:02 pm, "Sean Violante" wrote: > The problem is the (stupid!) liblinear solver that also penalises the > intercept (in regularisation) . Use a dif

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Sean Violante
The problem is the (stupid!) liblinear solver, which also penalises the intercept (in regularisation). Use a different solver or change the intercept_scaling parameter. On 15 Dec 2016 10:44 pm, "Sebastian Raschka" wrote: > Subtracting the median wouldn’t result in normalizing the usual sense, >
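The effect described in this message can be seen in a small experiment. The sketch below uses synthetic data with a rare positive class and an arbitrary strong penalty (`C=0.01`); the data, the helper `mean_pred`, and the value `intercept_scaling=100.0` are all assumptions chosen for illustration. It compares the mean predicted probability on the training data with the observed base rate for three configurations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with a rare positive class (true intercept far from zero).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
logits = -4.0 + X[:, 0]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

def mean_pred(**kwargs):
    """Fit a strongly regularised model and return its mean predicted probability."""
    model = LogisticRegression(C=0.01, **kwargs).fit(X, y)
    return model.predict_proba(X)[:, 1].mean()

base_rate = y.mean()

# lbfgs leaves the intercept unpenalised, so the mean prediction tracks the base rate.
p_lbfgs = mean_pred(solver="lbfgs")

# liblinear penalises the intercept too, pulling predictions towards 0.5.
p_liblinear = mean_pred(solver="liblinear")

# A large intercept_scaling weakens the effective penalty on the intercept.
p_scaled = mean_pred(solver="liblinear", intercept_scaling=100.0)
```

Comparing `p_lbfgs`, `p_liblinear`, and `p_scaled` against `base_rate` shows how much of the bias comes from the penalised intercept rather than from the data.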

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Sean Violante
because you could have a situation where one feature combination occurs 10 times and another feature combination 1000 times On Mon, Oct 10, 2016 at 3:48 PM, Raphael C wrote: > On 10 October 2016 at 12:22, Sean Violante > wrote: > > no ( but please check !) > > > >

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Sean Violante
i] = 0? > > Raphael > > On 10 October 2016 at 12:08, Sean Violante > wrote: > > should be the sample weight function in fit > > > > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model. > LogisticRegression.html > > > > On Mon, Oct 10

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Sean Violante
should be the sample weight function in fit http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html On Mon, Oct 10, 2016 at 1:03 PM, Raphael C wrote: > I just noticed this about the glm package in R. > http://stats.stackexchange.com/a/26779/53128 > > " > Th

Re: [scikit-learn] scikit-learn Digest, Vol 6, Issue 40

2016-09-28 Thread Sean Violante
Afarin, can you please describe your full data set, as maybe you are making a mistake in how you are setting up the data. My understanding of what Afarin is saying is that for each person he has a row for successes and a row for failures (but cannot understand why only two rows - would expect mult