Re: [Scikit-learn-general] RandomForestClassifier w/ IPython.parallel

2014-02-10 Thread Olivier Grisel
Extra Trees are even more random than random forests. Have a look at the referenced papers. To choose one vs the other you can evaluate the generalization power via cross-validation on your data (you might also want to grid search the optimal parameter values for max_features and min_samples_split
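A minimal sketch of that evaluation follows. It assumes a current scikit-learn (0.18 or later), where cross_val_score and GridSearchCV live in sklearn.model_selection; at the time of this thread they were in sklearn.cross_validation and sklearn.grid_search. make_classification is only a stand-in for your own data, and the grid values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for "your data" (assumption: replace with your own X, y)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

# Compare the generalization of both ensembles with 5-fold cross-validation
cv_means = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean CV accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

# Grid search max_features and min_samples_split (illustrative values)
param_grid = {"max_features": ["sqrt", 0.5, None], "min_samples_split": [2, 5, 10]}
search = GridSearchCV(ExtraTreesClassifier(n_estimators=50, random_state=0), param_grid, cv=3)
search.fit(X, y)
print("best params:", search.best_params_, "best CV score:", round(search.best_score_, 3))
```

The same grid search can be run with RandomForestClassifier in place of ExtraTreesClassifier, so both models are compared at their own tuned settings.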

Re: [Scikit-learn-general] RandomForestClassifier w/ IPython.parallel

2014-02-10 Thread Alessandro Gagliardi
This looks perfect. I’m pretty new to ensemble methods, so please forgive this ignorant question: what’s the difference between ExtraTrees and RandomForests? From http://scikit-learn.org/stable/modules/ensemble.html it looks like ExtraTrees is an extension of RandomForests. Examples of when one
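For the question above, a toy sketch of the node-splitting rule may help. It is simplified plain Python, not scikit-learn's tree code. Both methods draw a random subset of max_features candidate features at each node; a random forest then searches for the best threshold on each candidate, while Extra Trees draw a single threshold uniformly between the feature's minimum and maximum in the node and keep the best of those random splits. Separately, scikit-learn's RandomForestClassifier defaults to bootstrap=True, while ExtraTreesClassifier defaults to bootstrap=False. The helper names (gini, split_score, rf_split, et_split) are hypothetical, and using observed values instead of midpoints as candidate thresholds is a simplification.

```python
import random


def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_score(X, y, feature, threshold):
    # Weighted child impurity for the split X[i][feature] <= threshold (lower is better)
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def rf_split(X, y, max_features, rng):
    # Random-forest style: random feature subset, exhaustive search for the best threshold
    best = None
    for f in rng.sample(range(len(X[0])), max_features):
        for t in sorted({xi[f] for xi in X}):
            s = split_score(X, y, f, t)
            if best is None or s < best[0]:
                best = (s, f, t)
    return best


def et_split(X, y, max_features, rng):
    # Extra-Trees style: random feature subset, ONE uniformly drawn threshold per feature
    best = None
    for f in rng.sample(range(len(X[0])), max_features):
        lo = min(xi[f] for xi in X)
        hi = max(xi[f] for xi in X)
        t = rng.uniform(lo, hi)
        s = split_score(X, y, f, t)
        if best is None or s < best[0]:
            best = (s, f, t)
    return best


X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]]
y = [0, 0, 1, 1]
print(rf_split(X, y, 2, random.Random(0)))  # (0.0, 0, 1.0)
print(et_split(X, y, 2, random.Random(0)))  # random threshold, usually a worse split
```

Because the Extra Trees split is never better than the exhaustive one on the same candidate features, each tree is weaker and more random, but the trees are cheaper to grow and less correlated, which averaging can exploit. Whether that wins on a given dataset is exactly what cross-validation answers.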

Re: [Scikit-learn-general] RandomForestClassifier w/ IPython.parallel

2014-02-09 Thread Olivier Grisel
2014-02-07 15:09 GMT-08:00 Peter Prettenhofer:
> Hi Alessandro,
>
> you might want to look into this presentation by Olivier
> https://speakerdeck.com/ogrisel/growing-randomized-trees-in-the-cloud-1 --
> it should be pretty much what you need. Code is here
> https://github.com/pydata/pyrallel.

I

Re: [Scikit-learn-general] RandomForestClassifier w/ IPython.parallel

2014-02-08 Thread Gael Varoquaux
There is no support for multi-machine parallel computing in scikit-learn. You'll have to write your own code mimicking the code of the random forest.

Gaël

On Fri, Feb 07, 2014 at 10:28:01PM +, Alessandro Gagliardi wrote:
> Hi All,
> I want to run a large sklearn.ensemble.RandomForestClassifi
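One way to mimic the forest across machines, in the spirit of that suggestion, is to fit several smaller forests independently (same training data, different random_state values) and then concatenate their fitted trees, since a forest classifier predicts by averaging its trees' predict_proba outputs. The sketch assumes current scikit-learn behaviour, where fitted trees are stored in the estimators_ list and prediction averages over that list; it edits fitted attributes rather than calling a public merge API, so treat it as version-dependent. train_sub_forest and merge_forests are hypothetical helpers, the plain list comprehension stands in for dispatching jobs to StarCluster/IPython.parallel engines, and the code is not taken from the pyrallel repository linked in this thread.

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def train_sub_forest(X, y, n_estimators, seed):
    # One unit of work; in a cluster setup this call would run on a remote engine
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    return forest


def merge_forests(forests):
    # Combine sub-forests by concatenating their fitted trees.
    # Relies on the fitted estimators_ attribute, not on a dedicated merge API.
    merged = copy.deepcopy(forests[0])
    for forest in forests[1:]:
        if not np.array_equal(forest.classes_, merged.classes_):
            raise ValueError("all sub-forests must be fit on the same set of classes")
        merged.estimators_ += copy.deepcopy(forest.estimators_)
    merged.n_estimators = len(merged.estimators_)
    return merged


X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Split 100 trees into 4 jobs of 25; distinct seeds so the sub-forests differ
jobs = [(X, y, 25, seed) for seed in range(4)]
sub_forests = [train_sub_forest(*job) for job in jobs]  # locally: a plain sequential map
forest = merge_forests(sub_forests)

print(len(forest.estimators_))  # 100
proba = forest.predict_proba(X[:5])
```

Each engine needs its own copy of the training data; with 100,000 samples that transfer, not the tree fitting, may dominate, so loading the data from shared storage on each node is worth considering.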

Re: [Scikit-learn-general] RandomForestClassifier w/ IPython.parallel

2014-02-07 Thread Peter Prettenhofer
Hi Alessandro,

you might want to look into this presentation by Olivier https://speakerdeck.com/ogrisel/growing-randomized-trees-in-the-cloud-1 -- it should be pretty much what you need. Code is here https://github.com/pydata/pyrallel.

best,
Peter

2014-02-07 23:28 GMT+01:00 Alessandro Gagliar

[Scikit-learn-general] RandomForestClassifier w/ IPython.parallel

2014-02-07 Thread Alessandro Gagliardi
Hi All, I want to run a large sklearn.ensemble.RandomForestClassifier (with maybe dozens or maybe hundreds of trees and 100,000 samples). My desktop won’t handle this, so I want to try using StarCluster. RandomForestClassifier seems to parallelize easily, but I don’t know how I would split it