Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-07 Thread Gilles Louppe
Hello Paul,

> Do fully developed trees make sense for rather small datasets? Overall, I
> have 622 samples with 177 features each. Isn't there the risk of
> overfitting?

Yes, overfitting might happen, but it should be limited since you are
building randomized trees and averaging them together.
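Gilles's point, that averaging many randomized, fully grown trees tends to overfit less than a single deep tree, can be illustrated with a small sketch. The data below is synthetic (`make_classification` with shapes mimicking the thread's 622 samples and 177 features); the numbers are illustrative, not from the original discussion.

```python
# Compare a single fully developed tree against a forest of them via
# cross-validation; the forest's averaging usually generalizes better.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Paul's dataset: 622 samples, 177 features,
# roughly a 73% / 27% class split.
X, y = make_classification(n_samples=622, n_features=177,
                           weights=[0.73, 0.27], random_state=0)

tree = DecisionTreeClassifier(max_depth=None, random_state=0)
forest = RandomForestClassifier(n_estimators=100, max_depth=None,
                                random_state=0)

tree_acc = cross_val_score(tree, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"single tree: {tree_acc:.3f}  forest: {forest_acc:.3f}")
```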

Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Paul . Czodrowski
> b) You shouldn't set max_depth=5. Instead, build fully developed trees
> (max_depth=None) or rather tune min_samples_split using
> cross-validation.

Dear Gilles,

I have set up a grid search:

"
tuned_parameters = [{'min_samples_split': [1,2,3,4,5,6,7,8,9]}]
scores = [('precision', precision_sc
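Paul's snippet is truncated above; a runnable version of the same grid search, written against the modern scikit-learn API, could look like the sketch below. Note two assumptions: `GridSearchCV` now lives in `sklearn.model_selection` (in 2012 it was `sklearn.grid_search`), and current versions require `min_samples_split >= 2`, so the grid starts at 2. The data is again a synthetic stand-in.

```python
# Tune min_samples_split for a RandomForest with precision as the
# selection metric, as in Paul's snippet.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=622, n_features=177, random_state=0)

tuned_parameters = [{"min_samples_split": [2, 3, 4, 5, 6, 7, 8, 9]}]
clf = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=0),
                   tuned_parameters, scoring="precision", cv=3)
clf.fit(X, y)
print(clf.best_params_)
```

`clf.best_params_` then holds the `min_samples_split` value that maximized cross-validated precision.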

Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Paul . Czodrowski
Dear Gilles,

> Hi Paul,
>
> a) Scaling has no effect on decision trees.

Thanks!

> b) You shouldn't set max_depth=5. Instead, build fully developed trees
> (max_depth=None) or rather tune min_samples_split using
> cross-validation.

Do fully developed trees make sense for rather small datasets?

Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Gilles Louppe
Hi Paul,

a) Scaling has no effect on decision trees.

b) You shouldn't set max_depth=5. Instead, build fully developed trees
(max_depth=None) or rather tune min_samples_split using cross-validation.

Hope this helps.

Gilles

On 6 November 2012 16:21, wrote:
>
> Dear SciKitters,
>
> given a rathe
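Point (a) can be verified directly: decision-tree splits depend only on the ordering of each feature's values, and standardization is a monotonic per-feature transform, so it leaves the learned predictions unchanged. A minimal check on synthetic data:

```python
# Fit the same tree on raw and standardized inputs and compare
# predictions; scaling changes the split thresholds but not the splits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)
print(np.array_equal(raw, scaled))
```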

[Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Paul . Czodrowski
Dear SciKitters,

given a rather unbalanced data set (454 samples with classification "0"
and 168 samples with classification "1"), I would like to train a
RandomForest. For my data set, I have calculated 177 features per sample.

In a first step, I have preprocessed my data set:

"
dataDescrs_array
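On the class-weight part of the subject line: current versions of scikit-learn let `RandomForestClassifier` compensate for exactly this kind of 454-vs-168 imbalance via `class_weight="balanced"`, which weights samples inversely to class frequency (a parameter that did not exist in the 2012 release the thread is using). A minimal sketch on synthetic stand-in data:

```python
# Train a RandomForest with balanced class weights on data mimicking
# Paul's class split (454 negatives, 168 positives, 177 features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=622, n_features=177,
                           weights=[454 / 622, 168 / 622], random_state=0)

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

With balanced weights, errors on the minority class "1" cost proportionally more during tree construction, which typically improves minority-class recall at some cost in overall accuracy.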