Re: [Scikit-learn-general] Machine Learning on Large Data Sets

2012-11-17 Thread Richard T. Guy
It only makes sense to train a tree on a subset as part of an ensemble method, and in that case you can train a set of trees by training each one on a subset of the data (be sure to choose the subset randomly, though). It's true that ensemble methods like RandomForest don't have partial_fit, but you cou
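
A rough sketch of the idea above, in case it helps: fit each tree on its own randomly chosen subset of the rows, then combine the trees by majority vote. This is only an illustration under assumptions, not how RandomForest is implemented internally. The helper names (fit_subset_trees, predict_majority), the subset size, the number of trees, and the synthetic data are all made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_subset_trees(X, y, n_trees=10, subset_size=500, seed=0):
    """Fit each tree on its own random subset of the rows (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    trees = []
    for i in range(n_trees):
        # Choose the rows for this tree at random, without replacement.
        idx = rng.choice(X.shape[0], size=subset_size, replace=False)
        tree = DecisionTreeClassifier(random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X, classes):
    """Combine the trees by majority vote over the given class labels."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    # Count, for each class, how many trees voted for it on each sample.
    counts = np.stack([(votes == c).sum(axis=0) for c in classes])
    return classes[counts.argmax(axis=0)]


# Synthetic data stands in for the large data set (an assumption for the demo).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
trees = fit_subset_trees(X, y)
pred = predict_majority(trees, X, np.unique(y))
```

With real data that does not fit in memory, each subset would presumably be read from disk one at a time, so only one subset needs to be loaded while its tree is being fit.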

Re: [Scikit-learn-general] random forest question

2012-10-27 Thread Richard T. Guy
That explains the confusion! Thanks, guys. Tommy On Sat, Oct 27, 2012 at 5:25 AM, Joseph Turian wrote: > Gilles, > > I met Tommy Guy at the pydata conference today. > If I remember correctly, Brian Eoff (I don't have his email address) > errantly said that random forests partitions/samples the

[Scikit-learn-general] random forest question

2012-10-26 Thread Richard T. Guy
Hey Scikit-Learn, I've been working on some changes to the RandomForest code and I had a few questions. First, it looks like the function def _partition_features(forest, n_total_features): partitions features evenly across cores. Am I reading that correctly? If so, does this mean that on 2 cores

Re: [Scikit-learn-general] wiserf vs. sklearn RF

2012-08-27 Thread Richard T. Guy
I wonder what their core tree algorithms are and how fast they are. It seems they're achieving those speeds by either A) Significantly optimizing the tree learner or B) Using a smart cutoff on the size of the forest based Tommy Guy On Mon, Aug 27, 2012 at 10:14 AM, Jaques Grobler wrote: > cool l