Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread josef.pktd
On Mon, Oct 5, 2015 at 10:05 PM, Sturla Molden wrote:
> On 06/10/15 00:35, josef.p...@gmail.com wrote:
>
> > rate in the sense of proportion is between zero and 1.
>
> Rate usually refers to "events per unit of time or exposure", so we can
> either count events in intervals or record time-stamps

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread Sturla Molden
On 06/10/15 00:35, josef.p...@gmail.com wrote:
> rate in the sense of proportion is between zero and 1.
Rate usually refers to "events per unit of time or exposure", so we can either count events in intervals or record time-stamps as our dependent variable. If the stochastic counting process is

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread josef.pktd
On Mon, Oct 5, 2015 at 6:15 PM, Sturla Molden wrote: > On 04/10/15 05:07, George Bezerra wrote: > > > I am trying to follow this paper: > > > http://research.microsoft.com/en-us/um/people/mattri/papers/www2007/predictingclicks.pdf > > (check out section 6.2). They use logistic regression as a reg

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread Sturla Molden
On 04/10/15 05:07, George Bezerra wrote:
> I am trying to follow this paper:
> http://research.microsoft.com/en-us/um/people/mattri/papers/www2007/predictingclicks.pdf
> (check out section 6.2). They use logistic regression as a regression
> model to predict the click through rate (which is contin

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread Andreas Mueller
On 10/03/2015 11:11 PM, Michael Eickenberg wrote:
> Hi George,
>
> completely agreed that np.unique on continuous targets is messy - I
> have run into the same problem.
>
It's fixed here: https://github.com/scikit-learn/scikit-learn/pull/5084

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-04 Thread Mathieu Blondel
I've seen logistic regression used in a regression setting in a few papers as well. A nice thing is that the predictions are mapped to [0, 1]. The correct way to add this to scikit-learn would be to add a regression class `LogisticRegressor` and rename the existing class to `LogisticClassifier`. T

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread Michael Eickenberg
Hi George, completely agreed that np.unique on continuous targets is messy - I have run into the same problem. If I remember correctly, you can work around this by using sample_weight to inject the continuous target into the cross entropy loss: If p_i are the targets, then duplicate each sample,
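The snippet above is cut off, so the following is only a sketch of how the sample_weight workaround might look. It assumes the rest of the recipe is: each sample appears twice, once with label 1 and weight p_i, once with label 0 and weight 1 - p_i. The helper names `duplicate_with_weights` and `weighted_log_loss` are hypothetical and not part of scikit-learn. The check at the end shows that, under this assumption, the weighted binary cross entropy on the duplicated data equals the cross entropy against the continuous targets.

```python
import numpy as np


def duplicate_with_weights(X, p):
    """Turn continuous targets p in [0, 1] into a weighted binary problem.

    Hypothetical helper: every row of X is repeated once with label 1
    (weight p_i) and once with label 0 (weight 1 - p_i).
    """
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    n = len(p)
    X_dup = np.vstack([X, X])
    y_dup = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    w_dup = np.concatenate([p, 1.0 - p])
    return X_dup, y_dup, w_dup


def weighted_log_loss(q, y, w):
    """Sample-weighted binary cross entropy for predicted probabilities q."""
    return -np.sum(w * (y * np.log(q) + (1 - y) * np.log(1 - q)))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
p = rng.uniform(0.05, 0.95, size=5)  # continuous targets
q = rng.uniform(0.05, 0.95, size=5)  # arbitrary model predictions

X_dup, y_dup, w_dup = duplicate_with_weights(X, p)

# Cross entropy against the continuous targets directly ...
soft_loss = -np.sum(p * np.log(q) + (1 - p) * np.log(1 - q))
# ... and the weighted binary loss on the duplicated samples.
dup_loss = weighted_log_loss(np.concatenate([q, q]), y_dup, w_dup)
```

In a scikit-learn version whose `LogisticRegression.fit` accepts `sample_weight`, the three arrays could then be passed as `fit(X_dup, y_dup, sample_weight=w_dup)`, so that `np.unique` only ever sees the labels 0 and 1.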

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread josef.pktd
On Sat, Oct 3, 2015 at 11:54 PM, George Bezerra wrote:
> Thanks a lot Josef. I guess it is possible to do what I wanted, though
> maybe not in scikit. Does the statsmodels version allow l1 or l2
> regularization? I'm planning to use a lot of features and let the model
> decide what is good.
> > s

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread George Bezerra
Thanks a lot Josef. I guess it is possible to do what I wanted, though maybe not in scikit. Does the statsmodels version allow l1 or l2 regularization? I'm planning to use a lot of features and let the model decide what is good. Thanks again.
On Sat, Oct 3, 2015 at 11:20 PM, wrote:
> Just to co

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread josef.pktd
Just to come in here as an econometrician and statsmodels maintainer: statsmodels intentionally doesn't enforce binary data for Logit or similar models; any data between 0 and 1 is fine. Logistic Regression/Logit or similar Binomial/Bernoulli models can consistently estimate the expected value (p
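As a rough illustration of that point, here is a minimal NumPy-only sketch that fits a logit model to continuous targets in [0, 1] by maximizing the Bernoulli quasi-log-likelihood with Newton steps. It does not use statsmodels; the function name `fit_fractional_logit`, the simulated Beta-distributed targets, and the coefficients are all assumptions made up for the example.

```python
import numpy as np


def fit_fractional_logit(X, y, n_iter=25):
    """Newton's method on the Bernoulli quasi-log-likelihood.

    y may be any value in [0, 1]; only the conditional mean
    E[y | x] = sigmoid(x @ beta) is being modelled.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))         # predicted mean
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)      # Newton update
    return beta


# Simulated data (assumption): targets are Beta distributed, not binary,
# with conditional mean sigmoid(X @ true_beta).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
mean = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.beta(10.0 * mean, 10.0 * (1.0 - mean))

beta_hat = fit_fractional_logit(X, y)
```

With enough samples, `beta_hat` should land close to `true_beta` even though no target is exactly 0 or 1, which is the sense in which the expected value is estimated consistently.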

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread George Bezerra
*I meant section 5.
On Sat, Oct 3, 2015 at 11:07 PM, George Bezerra wrote:
> Thanks Sebastian.
>
> I am trying to follow this paper:
> http://research.microsoft.com/en-us/um/people/mattri/papers/www2007/predictingclicks.pdf
> (check out section 6.2). They use logistic regression as a regression

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread George Bezerra
Thanks Sebastian. I am trying to follow this paper: http://research.microsoft.com/en-us/um/people/mattri/papers/www2007/predictingclicks.pdf (check out section 6.2). They use logistic regression as a regression model to predict the click through rate (which is continuous). A linear regression mod

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread Sebastian Raschka
Hi, George, logistic regression is a binary classifier by nature (class labels 0 and 1). Scikit-learn supports multi-class classification via One-vs-One or One-vs-All though; and there is a generalization (softmax) that gives you meaningful probabilities for multiple classes (i.e., class probabi

[Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread George Bezerra
Hi there, I would like to train a logistic regression model on a continuous (i.e., not categorical) target variable. The target is a probability, which is why I am using a logistic regression for this problem. However, the sklearn function tries to find the class labels by running a unique() on th