[Scikit-learn-general] extra trees, oob score vs shufflesplit

2014-02-26 Thread Satrajit Ghosh
hi folks, when using extra trees, one can compute an oob score. has anybody looked at comparing the oob_score to performing a shufflesplit iteration on the data? are these in some ways equivalent, or would they converge to the same mean? cheers, satra
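The two estimates can be put side by side with a small experiment. This is a sketch with a few assumptions. It uses the current `sklearn.model_selection` API; the 2014-era `sklearn.cross_validation.ShuffleSplit` had a different signature. It also uses a synthetic `make_classification` dataset in place of real data. Note that `ExtraTreesClassifier` only supports `oob_score=True` when `bootstrap=True`, which is not its default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in for "the data": 200 samples, 50 features, 5 informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# OOB scoring requires bootstrap=True (ExtraTreesClassifier defaults to False).
forest = ExtraTreesClassifier(n_estimators=300, bootstrap=True, oob_score=True,
                              random_state=0)
forest.fit(X, y)
oob_accuracy = forest.oob_score_

# ShuffleSplit: 20 random 75/25 train/test splits, each scored on its test part.
cv = ShuffleSplit(n_splits=20, test_size=0.25, random_state=0)
split_scores = cross_val_score(
    ExtraTreesClassifier(n_estimators=300, bootstrap=True, random_state=0),
    X, y, cv=cv)

print(f"OOB accuracy:          {oob_accuracy:.3f}")
print(f"ShuffleSplit accuracy: {split_scores.mean():.3f} +/- {split_scores.std():.3f}")
```

The two numbers estimate related but not identical quantities, so the sketch makes no claim about how closely they agree:

* Each training sample is out of bag for roughly (1 - 1/n)^n ≈ 37% of the trees. Its OOB prediction therefore comes from a smaller sub-forest than a ShuffleSplit test prediction.
* Each ShuffleSplit model is trained on only 75% of the samples.
* Turning on bootstrapping also changes the extra-trees model itself relative to its default.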

Re: [Scikit-learn-general] extra trees

2012-03-25 Thread Gilles Louppe
Hi Satrajit, Adding more trees should never hurt accuracy: the more, the better. Since you have a lot of irrelevant features, I'd advise increasing max_features so that the relevant features are more likely to be among the candidates when the random splits are computed. Otherwise, your trees will indeed fit on noise. Best, Gi
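One way to check this advice on a particular dataset is to sweep `n_estimators` and `max_features` and compare OOB scores. The sketch below uses a hypothetical "skinny" dataset: 60 samples, 500 features, 10 informative. The grid values are arbitrary illustration choices. `bootstrap=True` is set only so that `oob_score_` is available.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical skinny data: 60 samples, 500 features, only 10 informative.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)

oob_by_setting = {}
for n_estimators in (100, 500):
    # "sqrt" -> 22 candidate features per split; 0.2 -> 100; 1.0 -> all 500.
    for max_features in ("sqrt", 0.2, 1.0):
        forest = ExtraTreesClassifier(
            n_estimators=n_estimators,
            max_features=max_features,
            bootstrap=True,   # required for oob_score
            oob_score=True,
            random_state=0,
        ).fit(X, y)
        oob_by_setting[(n_estimators, max_features)] = forest.oob_score_

for setting, score in oob_by_setting.items():
    print(setting, round(score, 3))
```

A larger `max_features` means each split draws its random thresholds over more candidate features, which makes it likelier that an informative one is among them. Split thresholds stay random even at `max_features=1.0`.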

Re: [Scikit-learn-general] extra trees

2012-03-25 Thread Satrajit Ghosh
thanks paolo, will give all of this a try. i'll also send a PR with a section on patterns for sklearn. although this pattern might be specific to my problem domain, having more real-world scripts/examples that reflect such considerations might be useful to the community. cheers, satra

Re: [Scikit-learn-general] extra trees

2012-03-25 Thread Paolo Losi
Hi Satrajit,

On Sun, Mar 25, 2012 at 3:02 PM, Satrajit Ghosh wrote:
> hi gilles,
>
> when dealing with skinny matrices of the type few samples x lots of
> features what are the recommendations for extra trees in terms of max
> features and number of estimators?

as far as number of estimators (t

Re: [Scikit-learn-general] extra trees

2012-03-25 Thread Paolo Losi
On Sun, Mar 25, 2012 at 3:32 PM, Paolo Losi wrote:
> You could rank features by feature importance and perform recursive feature
> limitation

s/recursive feature limitation/recursive feature elimination/
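Ranking features by importance and then running recursive feature elimination maps onto scikit-learn's `sklearn.feature_selection.RFE`, which accepts any estimator exposing `feature_importances_`, including `ExtraTreesClassifier`. A sketch on hypothetical skinny data. The target of 20 features and `step=0.1` are arbitrary illustration values, not recommendations from the thread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE

# Hypothetical skinny data: 60 samples, 500 features, 10 informative.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)

# RFE refits the forest repeatedly and drops the features with the lowest
# feature_importances_ each round. A float step is a fraction of the original
# feature count, so step=0.1 removes 50 features per round here.
selector = RFE(
    estimator=ExtraTreesClassifier(n_estimators=200, random_state=0),
    n_features_to_select=20,
    step=0.1,
)
selector.fit(X, y)

kept = selector.get_support(indices=True)
print("kept feature indices:", kept)
```

`selector.ranking_` gives every feature's elimination rank; the kept features all have rank 1.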

[Scikit-learn-general] extra trees

2012-03-25 Thread Satrajit Ghosh
hi gilles, when dealing with skinny matrices of the type few samples x lots of features, what are the recommendations for extra trees in terms of max features and number of estimators? also, if a lot of the features are nuisance and most are noisy, are there any recommendations for feature reductio
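With few samples and many features, any feature reduction should be fit inside each cross-validation split. Otherwise the features are chosen using the same samples that later score the model, and the estimate comes out optimistically biased. A minimal sketch makes RFE with an extra-trees ranker the reduction step. It assumes a hypothetical synthetic dataset and arbitrary parameter values. Wrapping the reduction and the final forest in a `Pipeline` makes `cross_val_score` redo the selection on each split's training part.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical skinny data: 60 samples, 500 features, 10 informative.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)

# Reduction and final model in one Pipeline: each CV split re-runs RFE
# on its own training samples only, so test samples never influence selection.
model = Pipeline([
    ("reduce", RFE(ExtraTreesClassifier(n_estimators=100, random_state=0),
                   n_features_to_select=20, step=0.1)),
    ("forest", ExtraTreesClassifier(n_estimators=300, random_state=0)),
])

cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(f"accuracy with in-split selection: {scores.mean():.3f} +/- {scores.std():.3f}")
```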