Re: [Scikit-learn-general] cross validation with random forests

2014-09-29 Thread Andy
Maybe some of the tree huggers can say something about that ;) Below are my best guesses. I am surprised to see that the docs say no regularization is usually best. I would not use such large upper bounds as you did, and I would never search the full range, but rather use steps to get only a few candidates
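One way to realize "a few candidates" instead of the full range is to pick log-spaced values between 1 and n_samples. This is a sketch of that idea, not code from the thread; the number of candidates (5) is an illustrative assumption:

```python
import numpy as np

n_samples = 1000

# Instead of range(1, n_samples), take a handful of log-spaced candidates.
# geomspace pins the endpoints exactly at 1 and n_samples.
max_depth_candidates = np.unique(np.geomspace(1, n_samples, num=5).astype(int))
print(max_depth_candidates)  # e.g. [1, 5, 31, 177, 1000]
```

Log spacing covers several orders of magnitude with few grid points, which keeps the grid-search cost manageable.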

Re: [Scikit-learn-general] cross validation with random forests

2014-09-27 Thread Romaniuk, Michal
Hi Satra, In my experience, adjusting max_features can make some difference (I work with image data). Cheers, Michal

Re: [Scikit-learn-general] cross validation with random forests

2014-09-27 Thread Satrajit Ghosh
Thanks Andy. Are there any general heuristics for these parameters, given that their ranges are over the samples? max_depth = range(1, nsamples) or min_samples_leaf = range(1, nsamples). Also, a related question: given that nsamples would actually depend on the cv method of the GridSearchCV, is t
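One way to make such ranges independent of the exact training-set size (which changes with each CV split) is to express them as fractions: in recent scikit-learn versions, min_samples_leaf also accepts a float in (0, 0.5], interpreted as a fraction of the training samples. A sketch with assumed candidate values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Float values are read as a fraction of the training samples, so the
# grid does not depend on the size of each CV training fold.
grid = {"min_samples_leaf": [0.01, 0.05, 0.1]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0), grid, cv=3
)
search.fit(X, y)
print(search.best_params_)
```

Note that float support for min_samples_leaf postdates this 2014 thread; at the time, only integer counts were accepted.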

Re: [Scikit-learn-general] cross validation with random forests

2014-09-26 Thread Andy
Hi Satra. You should set "n_estimators" as high as you can afford, time- and memory-wise, and then cross-validate over (at least) one of the regularization parameters, for example max_depth or min_samples_leaf. You can also search over max_features. Cheers, Andy On 09/26/2014 10:24 PM,
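Andy's advice can be sketched as follows: fix n_estimators at a large value and grid-search the regularization parameters. The particular candidate values below are illustrative assumptions, not recommendations from the thread:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# n_estimators is fixed (as high as time/memory allows), not searched;
# the grid covers the regularization parameters instead.
param_grid = {
    "max_depth": [5, None],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", None],
}
rf = RandomForestClassifier(n_estimators=50, random_state=0)
search = GridSearchCV(rf, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

More trees rarely hurt accuracy (only runtime), which is why n_estimators is not worth cross-validating the way the regularization parameters are.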

Re: [Scikit-learn-general] cross validation with random forests

2014-09-26 Thread Satrajit Ghosh
Hi folks, what are some useful ranges of parameters to throw into a grid search? And are there specific differences between random forests and extra trees? I understand one could try different impurity measures for classification, but any suggestions on the sensitivity of other parameters would be nice.
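For context on the impurity-measure point: both ensemble classifiers expose the same criterion parameter, so comparing them is mechanical. A minimal sketch (synthetic data, parameter values assumed for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Try both impurity measures on both ensemble types.
for Est in (RandomForestClassifier, ExtraTreesClassifier):
    for criterion in ("gini", "entropy"):
        clf = Est(n_estimators=50, criterion=criterion, random_state=0)
        score = cross_val_score(clf, X, y, cv=3).mean()
        print(Est.__name__, criterion, round(score, 3))
```

The main structural differences between the two: random forests bootstrap the training set by default and search for the best split among max_features candidates, while extra trees default to the full sample (bootstrap=False) and draw split thresholds at random.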

Re: [Scikit-learn-general] cross validation with random forests

2014-09-25 Thread Andy
On 09/23/2014 11:50 PM, Pagliari, Roberto wrote: I’m a bit confused as to why gridsearchCV is not needed with random forests. I understand that with RF, each tree will only get to see a partial representation of the data. Why do you say GridSearchCV is not needed? I think it should always b

Re: [Scikit-learn-general] cross validation with random forests

2014-09-23 Thread Joel Nothman
You can indeed tune parameters of the RF with grid search, and the score method will be used, although you could specify a different evaluation metric via GridSearchCV's scoring parameter. On 24 September 2014 07:50, Pagliari, Roberto wrote: > I’m a bit confused as to why gridsearchCV is not needed with
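The scoring override Joel mentions looks like this in practice; the metric and grid values here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=150, random_state=0)

# By default GridSearchCV uses the estimator's score method
# (mean accuracy for classifiers); scoring= overrides it, e.g. with ROC AUC.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid={"max_depth": [2, None]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)
print(search.best_score_)
```

Any predefined scorer name or a callable built with make_scorer can be passed the same way.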

[Scikit-learn-general] cross validation with random forests

2014-09-23 Thread Pagliari, Roberto
I'm a bit confused as to why gridsearchCV is not needed with random forests. I understand that with RF, each tree will only get to see a partial representation of the data. However, if I wanted to tune some parameters of the RF, wouldn't I still need to do gridsearch? If that is the case, does