Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-26 Thread Joseph Turian
My mistake, I meant Jimmy Lin: "MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!" http://arxiv.org/abs/1209.2191

On Tue, Sep 25, 2012 at 2:28 AM, Olivier Grisel wrote:
> 2012/9/24 Joseph Turian:
>> Chris Lin iirc has advocated partitioning the examp

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-25 Thread Olivier Grisel
2012/9/24 Joseph Turian:
> Chris Lin iirc has advocated partitioning the examples then concatenating the
> individual classifiers.
>
> You could do that and then do a second pass of learning: find the 1% of
> examples that are the hardest for the ensemble and learn over them.
>
> Regardless, it

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-25 Thread Joseph Turian
Chris Lin iirc has advocated partitioning the examples then concatenating the individual classifiers.

You could do that and then do a second pass of learning: find the 1% of examples that are the hardest for the ensemble and learn over them.

Regardless, it will be ad hoc unless you use an out of
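
For illustration only, here is a rough sketch of the partition-then-concatenate idea with a second pass over the hardest 1% of examples. It is not code from this thread or from scikit-learn: `fit_stump` is a toy stand-in learner, and `fit_partitioned`, `vote_share` and `hardest` are hypothetical helper names. "Hardest" is assumed here to mean the examples on which the fewest ensemble members vote for the true label. In practice the `fit` argument would presumably wrap a real learner such as a RandomForestClassifier.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Toy stand-in learner (hypothetical): best single-feature threshold rule."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            lo = Counter(left).most_common(1)[0][0]
            hi = Counter(right).most_common(1)[0][0] if right else lo
            errors = sum(a != lo for a in left) + sum(a != hi for a in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, lo, hi)
    _, j, t, lo, hi = best
    return lambda row: lo if row[j] <= t else hi


def fit_partitioned(X, y, n_parts, fit=fit_stump, seed=0):
    """Shuffle the examples, split them into n_parts, fit one model per part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    parts = [idx[k::n_parts] for k in range(n_parts)]
    return [fit([X[i] for i in p], [y[i] for i in p]) for p in parts]


def vote_share(models, row, label):
    """Fraction of ensemble members that predict `label` for `row`."""
    return sum(m(row) == label for m in models) / len(models)


def hardest(models, X, y, fraction=0.01):
    """Indices of the examples where the ensemble agrees least with the truth."""
    k = max(1, int(len(X) * fraction))
    order = sorted(range(len(X)), key=lambda i: vote_share(models, X[i], y[i]))
    return order[:k]


# Synthetic data, made up for the demonstration.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(400)]
y = [int(a + b > 1.0) for a, b in X]

models = fit_partitioned(X, y, n_parts=4)   # first pass: one model per partition
hard = hardest(models, X, y, fraction=0.01)  # the hardest 1% for the ensemble
# Second pass: learn over the hard examples and add the result to the ensemble.
models.append(fit_stump([X[i] for i in hard], [y[i] for i in hard]))
```

Only one partition needs to be held in memory while its model is being fit, which is the appeal when the full dataset does not fit; how much weight the second-pass model should get in the vote is left open here.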

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-24 Thread Olivier Grisel
I think @glouppe is likely to contribute some evolution for the ensembles of trees models once he gets back from ECML 2012 where he has a paper on those issues.

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-24 Thread Christian Jauvin
Thank you Olivier for these suggestions. I'd try/test them with pleasure, but meanwhile I discovered that there was just no way the dataset I was trying to use would ever fit in the 72GB of memory of the machine I'm using. So I just scaled it down, and obviously this error is not happening anymore

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-22 Thread Olivier Grisel
2012/9/22 Christian Jauvin:
> Hi,
>
> I have been doing multiple experiments using a RandomForestClassifier
> (trained with the parallel code option) recently, without encountering
> any particular problem. However as soon as I began using a much bigger
> dataset (with the exact same code), I got

[Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-22 Thread Christian Jauvin
Hi, I have been doing multiple experiments using a RandomForestClassifier (trained with the parallel code option) recently, without encountering any particular problem. However as soon as I began using a much bigger dataset (with the exact same code), I got this threading error: Exception in thre