I am not sure you are using "calibrated" in the correct sense.
Calibrated means that the predictions align with the real-world
probabilities, so if you have a rare class it should get low predicted
probabilities.
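For instance (a minimal sketch with made-up data; calibration_curve is
scikit-learn's standard diagnostic):

    import numpy as np
    from sklearn.calibration import calibration_curve

    # simulate a rare class whose predicted probabilities are the true ones
    rng = np.random.default_rng(0)
    p = rng.beta(1, 19, size=10000)   # true probabilities, mean ~0.05
    y_true = rng.random(10000) < p

    frac_pos, mean_pred = calibration_curve(y_true, p, n_bins=10)
    # for a calibrated model the two columns track each other,
    # and most of the mass sits at low probabilities
    print(np.c_[frac_pos, mean_pred])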
On Tue, Nov 17, 2020 at 9:58 AM Sole Galli via scikit-learn <
scikit-learn@python.org> wrote:
1) You can call fit(X, Y) where Y is an n_samples array of label integers
*or* Y is an n_samples x n_classes array containing *label proportions*.
Matthieu - that's great. In glmnet it is implemented directly as counts
(not proportions) - which may be more natural.
I find it a shame this is not implemented.
>> t = time.time()
>> model = GLM(y, X, family=families.Binomial(link=families.links.logit))
>> result = model.fit_regularized(method='elastic_net', alpha=1.0,
>>                                L1_wt=0.0, cnvrg_tol=tol, maxiter=maxiter)
>> print "sm.GLM", time.time() - t
Stuart
have you tried glmnet (in R)? there is a python version:
https://web.stanford.edu/~hastie/glmnet_python/
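A minimal sketch of the Python port's R-like interface, going by the
glmnet_python README (the argument names are assumptions and may differ
across versions):

    import numpy as np
    from glmnet_python import glmnet, glmnetCoef

    # toy data; glmnet_python expects float64 arrays
    x = np.random.rand(100, 10)
    y = (np.random.rand(100) > 0.5).astype(np.float64)

    fit = glmnet(x=x, y=y, family='binomial', alpha=1.0)  # alpha=1 -> lasso
    coefs = glmnetCoef(fit, s=np.float64([0.01]))         # coefs at lambda=0.01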
On Thu, Oct 5, 2017 at 6:34 PM, Stuart Reynolds
wrote:
> Thanks Josef. Was very useful.
>
> result.remove_data() reduces a 5-parameter Logit result object from
> megabytes to 5 KB.
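A sketch of that trick with toy data (remove_data() is a standard
method on statsmodels results objects):

    import pickle
    import numpy as np
    import statsmodels.api as sm

    X = sm.add_constant(np.random.rand(1000, 4))  # 4 features + intercept
    y = (np.random.rand(1000) > 0.5).astype(int)

    result = sm.Logit(y, X).fit()
    result.remove_data()              # drop the stored training arrays
    print(len(pickle.dumps(result)))  # only a few KB now
    print(result.params)              # the fitted parameters survive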
Hi Stuart
the underlying logistic regression code in scikit-learn (at least for the
non-liblinear implementation) allows sample weights, which would let you
do what you want:
[pass in sample weight Total_Service_Points_Won with target 1, and
(Total_Service_Points_Played - Total_Service_Points_Won) with target 0] -
see the sketch below.
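A minimal sketch of that setup (toy numbers; the column names come from
Stuart's data):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # toy data: one feature per player, plus points won / played counts
    X = np.array([[0.1], [0.4], [0.9]])
    won = np.array([55.0, 40.0, 70.0])      # Total_Service_Points_Won
    played = np.array([100.0, 80.0, 90.0])  # Total_Service_Points_Played

    # each player contributes a target-1 row and a target-0 row,
    # weighted by the respective counts
    X2 = np.vstack([X, X])
    y2 = np.r_[np.ones(len(X)), np.zeros(len(X))]
    w2 = np.r_[won, played - won]

    clf = LogisticRegression(solver='lbfgs')
    clf.fit(X2, y2, sample_weight=w2)
    print(clf.predict_proba(X)[:, 1])  # estimated point-win probability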
G Reina
you make a bizarre argument. You argue that you should not even check
racism as a possible factor in house prices?
But then you yourself check whether it's relevant.
Then you say
"but I'd argue that it's more due to the location (near water, near
businesses, near restaurants, near parks and
CW
you might want to read http://greenteapress.com/wp/think-python/
(available as a free PDF)
(for the basics of programming and Python)
and
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
(O'Reilly)
(for the data analysis libraries: pandas, numpy, ipython...)
On Sun, Jun 18
Have you used sparse arrays?
On Fri, Jun 2, 2017 at 7:39 PM, Stuart Reynolds
wrote:
> Hmmm... is it possible to place your original data into a memmap?
> (perhaps that will free up 8 GB, depending on SGDClassifier internals?)
>
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html
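For reference, a sketch of the memmap idea (file name, dtype and shape
are made up):

    import numpy as np

    n_samples, n_features = 1_000_000, 100

    # disk-backed array: it is paged in from disk rather than held in RAM
    X = np.memmap('X.dat', dtype='float32', mode='w+',
                  shape=(n_samples, n_features))
    X[:1000] = np.random.rand(1000, n_features)  # fill in chunks in practice
    X.flush()

    # reopen read-only later; slices are loaded lazily
    X_ro = np.memmap('X.dat', dtype='float32', mode='r',
                     shape=(n_samples, n_features))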
Sorry, just saw you are not using the liblinear solver; agree with
Sebastian, you should subtract the mean, not the median.
On 15 Dec 2016 11:02 pm, "Sean Violante" wrote:
> The problem is the (stupid!) liblinear solver that also penalises the
> intercept (in regularisation). Use a different solver or change the
> intercept_scaling parameter.
The problem is the (stupid!) liblinear solver that also penalises the
intercept (in regularisation). Use a different solver or change the
intercept_scaling parameter.
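Concretely, both workarounds are one-liners in scikit-learn (a sketch;
the C and scaling values are arbitrary):

    from sklearn.linear_model import LogisticRegression

    # option 1: a solver that does not penalise the intercept
    clf = LogisticRegression(solver='lbfgs', C=1.0)

    # option 2: keep liblinear but inflate the synthetic intercept column,
    # which shrinks the effective penalty on the intercept
    clf = LogisticRegression(solver='liblinear', intercept_scaling=100.0)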
On 15 Dec 2016 10:44 pm, "Sebastian Raschka" wrote:
> Subtracting the median wouldn't result in normalizing in the usual sense,
>
because you could have a situation where one feature combination occurs
10 times and another feature combination 1000 times.
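A quick illustration of why the raw counts matter: the same observed
proportion is far more informative at the larger count (binomial
standard error):

    import numpy as np

    p = 0.5
    for n in (10, 1000):
        se = np.sqrt(p * (1 - p) / n)
        print(n, round(se, 4))  # 10 -> 0.1581, 1000 -> 0.0158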
On Mon, Oct 10, 2016 at 3:48 PM, Raphael C wrote:
> On 10 October 2016 at 12:22, Sean Violante
> wrote:
> > no (but please check!)
> >
> >
i] = 0?
>
> Raphael
>
> On 10 October 2016 at 12:08, Sean Violante
> wrote:
> > should be the sample weight function in fit
> >
> > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
> >
> > On Mon, Oct 10
should be the sample weight function in fit
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
On Mon, Oct 10, 2016 at 1:03 PM, Raphael C wrote:
> I just noticed this about the glm package in R.
> http://stats.stackexchange.com/a/26779/53128
>
> "
> Th
Afarin,
can you please describe your full data set? Maybe you are making a
mistake in how you are setting up the data.
My understanding of what Afarin is saying is that for each person he has a
row for successes and a row for failures (but I cannot understand why only
two rows - I would expect multiple