Re: [scikit-learn] Why ridge regression can solve multicollinearity?

2020-01-08 Thread Stuart Reynolds
Correlated features typically tend to be similarly predictive of the outcome. L1 and L2 both express a preference for low coefficients. If a coefficient can be reduced while another coefficient maintains similar loss, then these regularization methods prefer this soluti
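
A minimal sketch of the point above, assuming a synthetic dataset with two nearly identical features (the data, the noise levels, and alpha=1.0 are illustrative choices, not from the thread). Unpenalized least squares is free to split the weight between the two columns in an unstable way, while the L2 penalty favors the smaller, more evenly shared coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200

# Two almost perfectly correlated features (hypothetical data).
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# The outcome depends only on the shared signal.
y = 3.0 * x1 + 0.1 * rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS coefficients can be large and of opposite sign;
# ridge spreads the weight across the correlated pair.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

Both fits predict about equally well here; the difference is in how stable and how small the individual coefficients are.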

Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 11

2019-10-06 Thread Stuart Reynolds
Pandas has a read_excel function that can load data from an excel spreadsheet: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html On Sun, Oct 6, 2019 at 1:57 AM Mike Smith wrote: > Can I call an MSExcel cell range in a function such as model.predict(), > instead o

Re: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1

2019-05-29 Thread Stuart Reynolds
I looked into this a while ago. There were differences in which algorithms regularized the intercept and which ones did not. (I believe liblinear does, lbfgs does not). All of the algorithms disagreed with logistic regression in scipy. - Stuart On Wed, May 29, 2019 at 10:50 AM Andreas Mueller wr

[scikit-learn] AUCROC/MAP confidence intervals in scikit

2019-02-06 Thread Stuart Reynolds
https://papers.nips.cc/paper/2645-confidence-intervals-for-the-area-under-the-roc-curve.pdf Does scikit (or other Python libraries) provide functions to measure the confidence interval of AUROC scores? Same question also for mean average precision. It seems like this should be a standard results r

Re: [scikit-learn] Is there regression algo with 3-d input?

2018-12-05 Thread Stuart Reynolds
Would the output be different if you simply wrapped the whole process with reshaping 3D input to 2d? On Wed, Dec 5, 2018 at 7:14 PM lampahome wrote: > I want to regress time series prediction per week, so the unit of train > data X is the day ex: Mon, Tue, Wed...etc. > > Ex: train data X is like
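
A rough sketch of the reshaping idea, assuming a hypothetical array of shape (weeks, days, features) and a plain LinearRegression as a stand-in for whatever regressor is actually used. Each week's days and features are flattened into one row so that a standard 2D estimator can be fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical layout: 30 weeks, 7 days per week, 2 features per day.
n_weeks, n_days, n_features = 30, 7, 2
X3d = rng.normal(size=(n_weeks, n_days, n_features))

# Toy target: one value per week (here, the sum of all inputs).
y = X3d.sum(axis=(1, 2))

# Flatten (days, features) into a single feature axis: (30, 14).
X2d = X3d.reshape(n_weeks, -1)

model = LinearRegression().fit(X2d, y)
print(X2d.shape, model.score(X2d, y))
```

The same reshape has to be applied to any new 3D input before calling predict.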

Re: [scikit-learn] RFE with logistic regression

2018-07-24 Thread Stuart Reynolds
liblinear regularizes the intercept (which is a questionable thing to do and a poor choice of default in sklearn). The other solvers do not. On Tue, Jul 24, 2018 at 4:07 AM, Benoît Presles wrote: > Dear scikit-learn users, > > I am using the recursive feature elimination (RFE) tool from sklearn t

Re: [scikit-learn] Jeff Levesque: profit functionality

2018-06-11 Thread Stuart Reynolds
Scikit has a section on 'GLMs' 1.1. Generalized Linear Models http://scikit-learn.org/stable/modules/linear_model.html not covered there? (That page doesn't look like GLMs -- mostly it covers different fitting, loss and regularization methods, but not general functional distributions). If not, ch

Re: [scikit-learn] PyCM: Multiclass confusion matrix library in Python

2018-05-31 Thread Stuart Reynolds
Hi Sepand, Thanks for this -- looks useful. I had to write something similar (for the binary case) and wish scikit had something like this. I wonder if there's something similar for the binary class case where the prediction is a real value (activation) and from this we can also derive - CMs fo

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-18 Thread Stuart Reynolds
Good to know -- thank you. On Fri, Oct 6, 2017 at 5:25 AM, wrote: > > > On Thu, Oct 5, 2017 at 3:27 PM, wrote: >> >> >> >> On Thu, Oct 5, 2017 at 2:52 PM, Stuart Reynolds >> wrote: >>> >>> Turns out sm.Logit does allow setting the

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread Stuart Reynolds
ed glmnet ( in R) there is a python version > https://web.stanford.edu/~hastie/glmnet_python/ > > > > > On Thu, Oct 5, 2017 at 6:34 PM, Stuart Reynolds > wrote: >> >> Thanks Josef. Was very useful. >> >> result.remove_data() reduces a 5 parameter L

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread Stuart Reynolds
tol=tol, gtol=tol, pgtol=tol, #maxiter=maxiter, ##full_output=False, disp=DISP) print "sm.GLM.fit", method, time.time() - t On Thu, Oct 5, 2017 at 10:32 AM, Sean Violante wrote: > Stu

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread Stuart Reynolds
ppers in statsmodels for doing this or should I roll my own? - Stu On Wed, Oct 4, 2017 at 3:43 PM, wrote: > > > On Wed, Oct 4, 2017 at 4:26 PM, Stuart Reynolds > wrote: >> >> Hi Andy, >> Thanks -- I'll give statsmodels another go. >> I rememb

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-04 Thread Stuart Reynolds
n, and it's a concept that only applies to a subset of models. > We try to have a consistent interface for all our estimators, and > this doesn't really fit well within that interface. > > Hth, > Andy > > > On 10/04/2017 03:58 PM, Stuart Reynolds wrote: >>

[scikit-learn] Can fit a model with a target array of probabilities?

2017-10-04 Thread Stuart Reynolds
I'd like to fit a model that maps a matrix of continuous inputs to a target that's between 0 and 1 (a probability). In principle, I'd expect logistic regression should work out of the box with no modification (although it's often posed as being strictly for classification, its loss function allows

[scikit-learn] Confidence interval estimation for probability estimators

2017-10-03 Thread Stuart Reynolds
Let's say I have a base estimator that predicts the likelihood of a binary (Bernoulli) outcome: model.fit(X, y) where y contains [0 or 1] P = model.predict(X)/predict_proba(X) give values in the range [0 to 1] (model here might be a calibrated LogisticRegression model). Is there a way to est

[scikit-learn] Decision stubs?

2017-08-27 Thread Stuart Reynolds
Is it possible to efficiently get at the branch statistics that decision tree algorithms iterate over in scikit? For example if the root population has the class counts in the output vector: c0: 5000 c1: 500 Then I'd like to iterate over: # For a boolean (2 valued category) f1=True:

Re: [scikit-learn] question about class_weights in LogisticRegression

2017-08-01 Thread Stuart Reynolds
I hope not. And not according to the docs... https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/linear_model/logistic.py#L947 class_weight : dict or 'balanced', optional Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to h

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Stuart Reynolds
+1 LCS and its many many variants seem very practical and adaptable. I'm not sure why they haven't gotten traction. Overshadowed by GBM & random forests? On Fri, Jul 21, 2017 at 11:52 AM, Sebastian Raschka wrote: > Just to throw some additional ideas in here. Based on a conversation with a > co

Re: [scikit-learn] Max f1 score for soft classifier?

2017-07-17 Thread Stuart Reynolds
> curve. > > On 18 July 2017 at 02:41, Stuart Reynolds > wrote: > >> Does scikit have a function to find the maximum f1 score (and decision >> threshold) for a (soft) classifier? >> >> - Stuart >> > _

Re: [scikit-learn] Max f1 score for soft classifier?

2017-07-17 Thread Stuart Reynolds
And... with that in mind -- are there methods that explicitly try to optimize the f1 score? On Mon, Jul 17, 2017 at 9:41 AM, Stuart Reynolds wrote: > Does scikit have a function to find the maximum f1 score (and decision > threshold) for a (soft) classifier? > >

[scikit-learn] Max f1 score for soft classifier?

2017-07-17 Thread Stuart Reynolds
Does scikit have a function to find the maximum f1 score (and decision threshold) for a (soft) classifier? - Stuart ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
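
One possible way to approach the question above, sketched under the assumption that the classifier outputs a score per sample: compute the precision/recall pairs with sklearn.metrics.precision_recall_curve and take the threshold where F1 is largest. The helper name max_f1 and the toy data are hypothetical, not part of scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def max_f1(y_true, y_score):
    """Return (best F1, threshold) over the thresholds seen in y_score."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The final (precision=1, recall=0) point has no threshold; drop it.
    p, r = precision[:-1], recall[:-1]
    denom = p + r
    # Guard against 0/0 when precision and recall are both zero.
    f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(f1[best]), float(thresholds[best])

# Toy scores (hypothetical); predict positive when score >= threshold.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
best_f1, best_threshold = max_f1(y_true, y_score)
print(best_f1, best_threshold)
```

A threshold chosen this way is tuned to the data it was computed on, so it is probably best picked on a held-out set.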

Re: [scikit-learn] Ipython Jupyter Kernel Dies when I fit an SGDClassifier

2017-06-02 Thread Stuart Reynolds
Hmmm... is it possible to place your original data into a memmap? (perhaps will clear out 8Gb, depending on SGDClassifier internals?) https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas - Stuart On

Re: [scikit-learn] Logistic regression with elastic net regularization

2017-03-14 Thread Stuart Reynolds
> Best, > Sebastian > > > On Mar 13, 2017, at 12:57 PM, Stuart Reynolds > wrote: > > > > Is there an implementation of logistic regression with elastic net > regularization in scikit? > > (or pointers on implementing this - it seems non-convex and s

Re: [scikit-learn] Logistic regression with elastic net regularization

2017-03-13 Thread Stuart Reynolds
Perfect. Thanks -- will give it a go. On Mon, Mar 13, 2017 at 10:04 AM, Jacob Schreiber wrote: > Hi Stuart > > Take a look at this issue: https://github.com/scikit-learn/scikit-learn/ > issues/2968 > > On Mon, Mar 13, 2017 at 9:57 AM, Stuart Reynolds < > stu...@s

Re: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem

2017-03-13 Thread Stuart Reynolds
Both libraries are heavily parameterized. You should check what the defaults are for both. Some ideas: - What regularization is being used. L1/L2? - Does the regularization parameter have the same interpretation? 1/C = lambda? Some libraries use C. Some use lambda. - Also, some libraries regula

[scikit-learn] Logistic regression with elastic net regularization

2017-03-13 Thread Stuart Reynolds
Is there an implementation of logistic regression with elastic net regularization in scikit? (or pointers on implementing this - it seems non-convex and so you might expect poor behavior with some optimizers) - Stuart

[scikit-learn] Modelling event rates

2017-02-17 Thread Stuart Reynolds
Does scikit provide any event-rate/time-to-event models, or other models that are specifically time-dependent? (e.g. models that output the # events per unit of time) Examples might include: Poisson model, or Cox proportional hazard. There was some discussion about pulling from statsmodels, htt

Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

2017-02-03 Thread Stuart Reynolds
The statsmodels package may have more of this kind of thing. http://statsmodels.sourceforge.net/devel/glm.html http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModelResults.pvalues.html?highlight=pvalue I assume you're talking about pvalues for a mode

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-10 Thread Stuart Reynolds
lidation sample of ~6 datapoints.. I'm >> still very skeptical of that giving you proper results for a complex model. >> Will this larger dataset be of exactly the same data? Just taking another >> unrelated dataset and showing that a MLP can learn it doesn'

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Stuart Reynolds
If you don't have a large dataset, you can still do leave-one-out cross validation. On Mon, Jan 9, 2017 at 3:42 PM Thomas Evangelidis wrote: > > Jacob & Sebastian, > > I think the best way to find out if my modeling approach works is to find > a larger dataset, split it into two parts, the first
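
A small sketch of leave-one-out cross validation on a tiny dataset. The data is synthetic and Ridge is only a stand-in estimator (the thread concerns MLPRegressor); any scikit-learn regressor could be dropped in:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical small dataset: 20 samples, 3 features.
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

# Each sample is predicted by a model trained on the other 19.
loo = LeaveOneOut()
pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=loo)

loo_mse = float(np.mean((y - pred) ** 2))
print("splits:", loo.get_n_splits(X), "LOO MSE:", loo_mse)
```

With n samples this fits n models, so it is only practical when the dataset (or the model) is small.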

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Stuart Reynolds
stion is whether it's possible to improve the estimator by > additionally adjusting the mean or the threshold for 0-1 predictions. It > might depend on the criteria to choose the penalization. I don't know and > have no idea what scikit-learn does. > > Josef > >

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Stuart Reynolds
a bias for imbalanced data. Can you tell me more? Why > does it not appear with the relaxed regularization? Also, using the same > data with statsmodels LR, which has no regularization, this doesn't seem to > be a problem. Any suggestions for > > how I could fix this are welc

Re: [scikit-learn] Model checksums

2016-12-15 Thread Stuart Reynolds
Sent from my phone. Please forgive brevity and mis spelling > On Dec 13, 2016, at 19:29, Stuart Reynolds > wrote: > >> I'd like to cache some functions to avoid rebuilding models like so: >> >> @cached >> def train(model, dataparams): ... >>

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Stuart Reynolds
LR is biased with imbalanced datasets. Is your dataset unbalanced? (e.g. is there one class that has a much smaller prevalence in the data than the other)? On Thu, Dec 15, 2016 at 1:02 PM, Rachel Melamed wrote: > I just tried it and it did not appear to change the results at all? > I ran it as f

Re: [scikit-learn] Scikit Learn Random Classifier - TPR and FPR plotted on matplotlib

2016-12-14 Thread Stuart Reynolds
You're looking at a tiny subset of the possible cutoff thresholds for this classifier. Lower thresholds will give higher TPR at the expense of a higher FPR. Usually, AUC is computed as the integral of this graph over the whole range of FPRs (from zero to one). If you have your classifier output probabiliti
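
To illustrate sweeping the full range of thresholds, here is a short sketch using roc_curve and auc from sklearn.metrics on toy scores (the labels and scores are made up for illustration):

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and classifier scores (hypothetical).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per distinct threshold, from FPR=0 up to FPR=1.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the curve: the integral of TPR over FPR in [0, 1].
area = auc(fpr, tpr)
print(fpr, tpr, area)
```

Passing hard 0/1 predictions instead of scores yields only a single interior point on the curve, which is why the plot can look like a small subset of the possible thresholds.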

Re: [scikit-learn] Why do DTs have a different fit protocol than NB and SVMs?

2016-12-13 Thread Stuart Reynolds
I think he's asking whether returning the model is part of the API (i.e. is it a bug that SVM and NB don't return self?). On Tue, Dec 13, 2016 at 12:23 PM, Jacob Schreiber wrote: > The fit method returns the object itself, so regardless of which way you > do it, it will work. The reason the fit

[scikit-learn] Model checksums

2016-12-13 Thread Stuart Reynolds
I'd like to cache some functions to avoid rebuilding models like so: @cached def train(model, dataparams): ... model is an (untrained) scikit-learn object and dataparams is a dict. The @cached annotation forms a SHA checksum out of the parameters of the function it annotates and returns
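
A minimal sketch of one way such a checksum could be formed, assuming the untrained estimator is fully described by its class and get_params(), and that dataparams is a dict of simple values. The helper name model_checksum is hypothetical, and this is not the @cached decorator from the post:

```python
import hashlib
import json

from sklearn.linear_model import LogisticRegression

def model_checksum(model, dataparams):
    """SHA-256 over the estimator's class, its parameters and dataparams."""
    payload = {
        "class": type(model).__module__ + "." + type(model).__qualname__,
        "params": model.get_params(deep=True),
        "data": dataparams,
    }
    # sort_keys gives a stable ordering; repr() is a fallback for
    # values that JSON cannot encode directly (an assumption).
    blob = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

a = model_checksum(LogisticRegression(C=1.0), {"rows": 100})
b = model_checksum(LogisticRegression(C=1.0), {"rows": 100})
c = model_checksum(LogisticRegression(C=0.5), {"rows": 100})
print(a == b, a == c)
```

The repr() fallback is only stable for values whose repr does not embed memory addresses, so nested estimators or callables in the parameters would need more care.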

[scikit-learn] Missing data and decision trees

2016-10-13 Thread Stuart Reynolds
I'm looking for a decision tree and RF implementation that supports missing data (without imputation) -- ideally in Python, Java/Scala or C++. It seems that scikit's decision tree algorithm doesn't allow this -- which is disappointing because it's one of the few methods that should be able to sensi