Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Gilles Louppe
> I'm basically looking for something that takes pre-trained classifiers and allows you
> to combine the predicted probabilities in custom ways, like favoring
> some classifiers over others, etc.
>
> Not that RandomForests™ are not useful--they could be the building
> block classifiers in such a system.
>
> @Oliver

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Andreas Mueller
On 09/25/2012 11:19 PM, Olivier Grisel wrote:
> I think we could have a `classes=None` constructor parameter in
> SGDClassifier and possibly many other classifiers. When provided we
> would not use the traditional `self.classes_ = np.unique(y)` idiom
> already implemented in some classifiers of the pr

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Doug Coleman
@Gilles, Thanks for the link. Those classes basically implement a paper that has a specific idea of RandomForests™ (no kidding, it's trademarked), with bootstrapping, OOB estimation, and n trees trained on the same data. I'm basically looking for something that takes pre-trained classifiers and allows you to comb
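For what it's worth, "favoring some classifiers over others" could be as simple as a weighted average of the probability arrays. This is only a sketch: `weighted_proba` is a made-up helper, not anything in scikit-learn, and it assumes every array already has the same columns in the same order.

```python
import numpy as np

def weighted_proba(probas, weights):
    """Weighted average of per-classifier probability arrays.

    probas:  list of (n_samples, n_classes) arrays with identical column order
    weights: one non-negative weight per classifier (hypothetical knob)
    """
    stacked = np.stack(probas)  # shape: (n_classifiers, n_samples, n_classes)
    # np.average normalizes by the sum of the weights along axis 0.
    return np.average(stacked, axis=0, weights=weights)

p1 = np.array([[0.5, 0.5]])   # output of a classifier we trust more
p2 = np.array([[1.0, 0.0]])   # output of a classifier we trust less
combined = weighted_proba([p1, p2], weights=[3, 1])
```

Because the weights are normalized, each row of `combined` still sums to 1 as long as the input rows do.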

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Gilles Louppe
@Doug: Sorry, I was typing my previous response from my phone. The snippet of code that I was talking about can be found at:
https://github.com/glouppe/scikit-learn/blob/master/sklearn/ensemble/forest.py#L93

Cheers,
Gilles

On Wednesday, 26 September 2012, Gilles Louppe wrote:
> Hi,
>
> The ense

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Gilles Louppe
Hi,

The ensemble classes handle the problem you describe already. Have a look at the implementation of predict_proba of BaseForestClassifier in ensemble.py if you want to do that yourself by hand.

Hope this helps.

Gilles

On Wednesday, 26 September 2012, Mathieu Blondel wrote:
>
>
> On Wed, Sep

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Mathieu Blondel
On Wed, Sep 26, 2012 at 3:52 AM, Doug Coleman wrote:
>
>
> If you examine the code, fit() "warms up" the optimization with some
> additional parameters, then calls _partial_fit(). partial_fit() just
> calls _partial_fit() directly. So, it looks like fit() and
> partial_fit() could take a `classes`

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Olivier Grisel
I think we could have a `classes=None` constructor parameter in SGDClassifier and possibly many other classifiers. When provided we would not use the traditional `self.classes_ = np.unique(y)` idiom already implemented in some classifiers of the project (but not all). +1 also for raising a ValueError
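To make the `np.unique(y)` point concrete, here is a small sketch of why the idiom loses rare labels, plus one possible reading of the `classes=None` fallback. `resolve_classes` is a hypothetical helper written for this illustration, not existing scikit-learn code, and the label arrays are made up.

```python
import numpy as np

def resolve_classes(y, classes=None):
    """Hypothetical helper: use the caller's label set if given,
    otherwise fall back to the labels actually present in y."""
    return np.unique(y if classes is None else classes)

y_full = np.array([-2, -1, 0, 0, 1, 1, 2])
y_subset = np.array([-1, 0, 0, 1, 1])   # a training split that misses -2 and 2

inferred = resolve_classes(y_subset)                      # only what the split contains
declared = resolve_classes(y_subset, classes=range(-2, 3))  # the full label set

# Labels the full problem has but this split never shows.
missing = np.setdiff1d(np.unique(y_full), inferred)
```

With the inferred set, an estimator fitted on `y_subset` would only know about three classes; with the declared set it would keep a slot for all five.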

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Doug Coleman
I'm not necessarily looking for a quick fix here, and anything I would consider trying to contribute to scikit would be useful and correct. You're right that there's not a good chance it can learn to predict with sparse output classes, but if the problem were easy, then I wouldn't need scikit at a

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Gael Varoquaux
On Tue, Sep 25, 2012 at 10:31:10AM -0700, Doug Coleman wrote:
> I'm making an ensemble of trees by hand for classification and trying
> to merge their outputs with predict_proba. My labels are integers
> -2..2. The problem is that -2 and 2 are rare labels. Now assume that I
> train these trees with

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Doug Coleman
I'd love to submit a patch. Looking at SGDClassifier docs, the __init__ doesn't take a classes parameter, but instead there's a partial_fit() that takes `classes` exactly like I'd expect. However, the docs for partial_fit() are exactly the same as for fit(). If you examine the code, fit() "warms
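For reference, a minimal sketch of what passing `classes` to partial_fit() looks like. The data here is random and purely illustrative, and the default loss is used, so the example inspects decision_function rather than predict_proba.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(30, 4)                 # made-up features
y = np.repeat([-1, 0, 1], 10)        # -2 and 2 are absent from this batch

all_classes = np.arange(-2, 3)       # the full label set, -2..2
clf = SGDClassifier(random_state=0)
# Declaring the classes up front keeps a slot for labels not seen in y.
clf.partial_fit(X, y, classes=all_classes)

scores = clf.decision_function(X[:3])
```

After this call, `clf.classes_` lists all five labels and `scores` has one column per label, even though two of them never appeared in the training batch.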

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Lars Buitinck
2012/9/25 Doug Coleman:
> label. So to merge predictions from the trees, now I have to do
> bookkeeping to remember which trees had which labels in them, and it's
> a mess.

You did discover the classes_ attribute, did you? That keeps track of the classes found in y by fit and solves part of the b
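As a rough illustration of how classes_ can take over that bookkeeping, the sketch below pads a predict_proba-style array with zero columns for labels a tree never saw. `align_proba` is a hypothetical helper, and it assumes `all_classes` is sorted and contains every label in `classes`.

```python
import numpy as np

def align_proba(proba, classes, all_classes):
    """Expand a probability array to one column per label in all_classes.

    proba:       (n_samples, len(classes)) array, columns ordered like `classes`
    classes:     labels the estimator saw during fit (its classes_ attribute)
    all_classes: sorted array of every label the ensemble should report
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    all_classes = np.asarray(all_classes)
    out = np.zeros((proba.shape[0], all_classes.size))
    # Column index of each seen label within the full, sorted label set.
    cols = np.searchsorted(all_classes, classes)
    out[:, cols] = proba
    return out

all_classes = np.arange(-2, 3)        # labels -2..2
seen = np.array([-1, 0, 1])           # this tree never saw -2 or 2
proba = np.array([[0.2, 0.5, 0.3]])   # made-up predict_proba output
aligned = align_proba(proba, seen, all_classes)
```

Once every tree's output is aligned this way, the arrays share a column layout and can be averaged or otherwise merged directly.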

[Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-25 Thread Doug Coleman
Hi, I'm making an ensemble of trees by hand for classification and trying to merge their outputs with predict_proba. My labels are integers -2..2. The problem is that -2 and 2 are rare labels. Now assume that I train these trees with different but related data sets, some of which don't even contai