Re: [Scikit-learn-general] predict_proba() in svm.NuSVC and svm.SVC

2012-09-25 Thread Gael Varoquaux
On Mon, Sep 24, 2012 at 06:34:36PM +0200, Sheila the angel wrote: > 1. I am trying to understand how exactly this probability is calculated. The > document says "probability model is created using cross validation" Using Platt scaling that is calibrated by cross-validation. > So I think to calcul

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-25 Thread Gael Varoquaux
Hi Ariel, On Tue, Sep 25, 2012 at 05:44:21PM -0700, Ariel Rokem wrote: > Initially, I suspected that this has to do with the non-negativity > constraint I applied, so I removed that. Indeed, if you are imposing positivity, you do not have a least square. > Then, I was wondering whether it might

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Mathieu Blondel
On Wed, Sep 26, 2012 at 3:52 AM, Doug Coleman wrote: > > > If you examine the code, fit() "warms up" the optimization with some > additional parameters, then calls _partial_fit(). partial_fit() just > calls _partial_fit() directly. So, it looks like fit() and > partial_fit() could take a `classes`

[Scikit-learn-general] Still trying to understand ElasticNet

2012-09-25 Thread Ariel Rokem
Hi everyone, I am still trying to understand ElasticNet. Here's my description (from a previous thread) of the kind of problem I am trying to solve: On Mon, Sep 17, 2012 at 9:56 AM, Ariel Rokem wrote: > I am using the sklearn.linear_model.ElasticNet class to fit some data. The > structure of th

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Olivier Grisel
I think we could have a `classes=None` constructor parameter in SGDClassifier and possibly many other classifiers. When provided, we would not use the traditional `self.classes_ = np.unique(y)` idiom already implemented in some classifiers of the project (but not all). +1 also for raising a ValueError

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Doug Coleman
I'm not necessarily looking for a quick fix here, and anything I would consider trying to contribute to scikit would be useful and correct. You're right that there's not a good chance it can learn to predict with sparse output classes, but if the problem were easy, then I wouldn't need scikit at a

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Gael Varoquaux
On Tue, Sep 25, 2012 at 10:31:10AM -0700, Doug Coleman wrote: > I'm making an ensemble of trees by hand for classification and trying > to merge their outputs with predict_proba. My labels are integers > -2..2. The problem is that -2 and 2 are rare labels. Now assume that I > train these trees with

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Doug Coleman
I'd love to submit a patch. Looking at SGDClassifier docs, the __init__ doesn't take a classes parameter, but instead there's a partial_fit() that takes `classes` exactly like I'd expect. However, the docs for partial_fit() are exactly the same as for fit(). If you examine the code, fit() "warms
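As a rough illustration of the `classes` argument to partial_fit() discussed in this message, here is a minimal sketch. The data, the variable names and the choice of `random_state` are made up for the example; only `SGDClassifier.partial_fit(X, y, classes=...)`, `classes_` and `decision_function` are taken from scikit-learn itself.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical setup: five possible labels, but this batch only has -1, 0, 1.
all_classes = np.array([-2, -1, 0, 1, 2])
rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = np.tile([-1, 0, 1], 10)

clf = SGDClassifier(random_state=0)
# Declaring the full label set up front keeps one output column per label,
# even for labels that never occur in this batch.
clf.partial_fit(X, y, classes=all_classes)

scores = clf.decision_function(X)  # one column per entry of all_classes
```

With the label set declared this way, every model trained on a different subset reports its scores in the same column order, so no per-model bookkeeping is needed.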

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Lars Buitinck
2012/9/25 Doug Coleman : > label. So to merge predictions from the trees, now I have to do > bookkeeping to remember which trees had which labels in them, and it's > a mess. You did discover the classes_ attribute, didn't you? That keeps track of the classes found in y by fit and solves part of the b
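One way the classes_ attribute can take over that bookkeeping is sketched below. The helper `align_proba` and the example arrays are hypothetical, not part of scikit-learn; the sketch assumes each model's predict_proba columns follow the order of its classes_ and that the full label list is sorted.

```python
import numpy as np

def align_proba(proba, seen_classes, all_classes):
    """Expand predict_proba output to one column per label in all_classes.

    Labels the model never saw get probability zero.
    all_classes must be sorted, since np.searchsorted relies on it.
    """
    all_classes = np.asarray(all_classes)
    seen_classes = np.asarray(seen_classes)
    full = np.zeros((proba.shape[0], all_classes.shape[0]))
    cols = np.searchsorted(all_classes, seen_classes)
    full[:, cols] = proba
    return full

all_classes = np.array([-2, -1, 0, 1, 2])

# Made-up outputs standing in for tree.predict_proba(X) and tree.classes_.
proba_a = np.array([[0.2, 0.5, 0.3]])
classes_a = np.array([-1, 0, 1])
proba_b = np.array([[0.1, 0.1, 0.6, 0.2]])
classes_b = np.array([-2, 0, 1, 2])

# Average the aligned outputs to merge the two models.
merged = (align_proba(proba_a, classes_a, all_classes)
          + align_proba(proba_b, classes_b, all_classes)) / 2
```

Assigning zero to unseen labels is a design choice for the sketch; a smoothed prior may be preferable when rare labels matter.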

[Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Doug Coleman
Hi, I'm making an ensemble of trees by hand for classification and trying to merge their outputs with predict_proba. My labels are integers -2..2. The problem is that -2 and 2 are rare labels. Now assume that I train these trees with different but related data sets, some of which don't even contai

Re: [Scikit-learn-general] TF-Idf

2012-09-25 Thread Olivier Grisel
2012/9/24 Ark : > Olivier Grisel writes: > >> You can use the Pipeline class to build a compound classifier that >> binds a text feature extractor with a classifier to get a text >> document classifier in the end. >> > Done! > >> >> 7s is very long. How long is your text document in bytes ? > The
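For readers unfamiliar with the Pipeline class mentioned in this exchange, a minimal sketch of a text document classifier follows. The documents, labels and the choice of SGDClassifier as the final step are assumptions for illustration; the thread does not say which classifier was used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Toy corpus, made up for the example.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "new laptop with a faster processor",
    "the software update fixes a bug",
]
labels = ["sport", "sport", "tech", "tech"]

# Bind the feature extractor and the classifier into one estimator.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SGDClassifier(random_state=0)),
])
text_clf.fit(docs, labels)

# Raw text goes in; the pipeline handles vectorization before prediction.
predicted = text_clf.predict(["the goalkeeper saved the match"])
```

Because the vectorizer lives inside the pipeline, the same object can be fitted, persisted and applied to raw strings without a separate transform step.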

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-25 Thread Olivier Grisel
2012/9/24 Joseph Turian : > Chris Lin iirc has advocated partitioning the examples then concatenating the > individual classifiers. > > You could do that and then do a second pass of learning: find the 1% of > examples that are the hardest for the ensemble and learn over them. > > Regardless, it

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-25 Thread Joseph Turian
Chris Lin iirc has advocated partitioning the examples then concatenating the individual classifiers. You could do that and then do a second pass of learning: find the 1% of examples that are the hardest for the ensemble and learn over them. Regardless, it will be ad hoc unless you use an out of