Hello Paul,
> Do fully developed trees make sense for rather small datasets? Overall, I
> have 622 samples with 177 features each. Isn't there the risk of
> overfitting?
Yes, overfitting might happen, but it should be limited, since you are
building randomized trees and averaging them together.
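To illustrate the averaging point, a minimal sketch (with synthetic stand-in data generated to match the shape described below, 622 samples x 177 features with a roughly 454/168 class split; not the actual data) comparing one fully grown tree against a forest of fully grown trees:

```python
# Hypothetical sketch: synthetic stand-in data, not the actual data set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 622 samples, 177 features, roughly a 73/27 class split.
X, y = make_classification(n_samples=622, n_features=177,
                           weights=[0.73, 0.27], random_state=0)

# One fully developed tree vs. an ensemble of fully developed trees.
tree = DecisionTreeClassifier(max_depth=None, random_state=0)
forest = RandomForestClassifier(n_estimators=100, max_depth=None,
                                random_state=0)

tree_score = cross_val_score(tree, X, y, cv=5).mean()
forest_score = cross_val_score(forest, X, y, cv=5).mean()
print("single tree:", tree_score)
print("forest:     ", forest_score)
```

Each individual tree overfits, but their errors are largely independent, so averaging tends to cancel them out and the forest usually generalizes better.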
>
>
> b) You shouldn't set max_depth=5. Instead, build fully developed trees
> (max_depth=None) or rather tune min_samples_split using
> cross-validation.
Dear Gilles,
I have set up a grid search:
"
from sklearn.metrics import precision_score

# note: recent scikit-learn versions require min_samples_split >= 2
tuned_parameters = [{'min_samples_split': [1, 2, 3, 4, 5, 6, 7, 8, 9]}]
scores = [('precision', precision_score)]
"
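A runnable version of such a search might look like the following sketch, using GridSearchCV with synthetic stand-in data of the same shape (note that recent scikit-learn versions require min_samples_split >= 2, so 1 is dropped from the grid here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 622-sample, 177-feature data set.
X, y = make_classification(n_samples=622, n_features=177,
                           weights=[0.73, 0.27], random_state=0)

# Fully developed trees (max_depth=None); tune min_samples_split instead.
param_grid = {'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0),
    param_grid, scoring='precision', cv=5)
search.fit(X, y)
print(search.best_params_)
```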
Dear Gilles,
> Hi Paul,
>
> a) Scaling has no effect on decision trees.
Thanks!
>
> b) You shouldn't set max_depth=5. Instead, build fully developed trees
> (max_depth=None) or rather tune min_samples_split using
> cross-validation.
Do fully developed trees make sense for rather small datasets?
Hi Paul,
a) Scaling has no effect on decision trees.
b) You shouldn't set max_depth=5. Instead, build fully developed trees
(max_depth=None) or rather tune min_samples_split using
cross-validation.
Hope this helps.
Gilles
On 6 November 2012 16:21, wrote:
Dear SciKitters,
given a rather unbalanced data set (454 samples with classification "0" and
168 samples with classification "1"), I would like to train a RandomForest.
For my data set, I have calculated 177 features per sample.
As a first step, I preprocessed my data set:
"
dataDescrs_array
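Given the 454-vs-168 class imbalance described above, stratified cross-validation (and, in recent scikit-learn versions, the class_weight option of RandomForestClassifier) may also be worth trying. A minimal sketch with synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the 454-vs-168 data set described above.
X, y = make_classification(n_samples=622, n_features=177,
                           weights=[0.73, 0.27], random_state=0)

# class_weight='balanced' reweights classes inversely to their frequency.
forest = RandomForestClassifier(n_estimators=100, class_weight='balanced',
                                random_state=0)

# Stratified folds keep the class ratio stable in every split; F1 scores
# the minority (positive) class rather than raw accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
f1 = cross_val_score(forest, X, y, cv=cv, scoring='f1').mean()
print("mean F1 across stratified folds:", f1)
```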