Hi Aditya,
With the default settings, the sampling is done with replacement.
Hence, each estimator gets a different dataset even though you sample the same
number (`X.shape[0]`) of data points.
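
To make this concrete, here is a minimal sketch in plain Python. It is not scikit-learn's actual code, and the helper name `bootstrap_indices` is made up for illustration. It draws `n_samples` row indices with replacement and checks how many distinct rows end up in the sample.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed only so the run is repeatable
n_samples = 1000        # stands in for X.shape[0]

indices = bootstrap_indices(n_samples, rng)
unique_fraction = len(set(indices)) / n_samples

print(len(indices))     # always n_samples
print(unique_fraction)  # typically close to 1 - 1/e, about 0.632
```

So the sample has as many rows as the original dataset, but some rows are repeated and roughly a third are left out. That is why each tree effectively sees only a subset.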
Regards,
Venkatachalam N.
On Wed, Mar 11, 2020 at 11:14 AM aditya aggarwal <
adityaselfeffici...@gmail.com> wrote:
With all the parameters set to default (especially bootstrap and
max_samples), the number of samples passed to each estimator is X.shape[0].
Doesn't that account for all the instances in the dataset, with the calculated
number of features? Then how come only a subset is given to the estimator?
On Wed, Mar 11, 2020
Regardless of the number of features, each DT estimator is given only a
subset of the data.
Each DT estimator then uses the features to derive decision rules for the
samples it was given.
With many trees and few examples, you might get similar or identical trees,
but that is not the norm.
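
The "many trees, few examples" case can be illustrated with a small sketch. It assumes row order within a bootstrap sample does not matter, and the helper names are hypothetical. For n rows there are C(2n-1, n) distinct bootstrap samples, so a tiny dataset forces repeats once the number of trees exceeds that count.

```python
import math
import random

def count_distinct_bootstraps(n_samples):
    """Number of distinct bootstrap samples (as multisets) of n_samples rows."""
    return math.comb(2 * n_samples - 1, n_samples)

def draw_bootstrap(n_samples, rng):
    """One bootstrap sample as a sorted tuple, so row order does not matter."""
    return tuple(sorted(rng.randrange(n_samples) for _ in range(n_samples)))

rng = random.Random(0)  # fixed seed only so the run is repeatable
n_samples = 3           # deliberately tiny dataset
n_trees = 100           # plays the role of n_estimators

seen = {draw_bootstrap(n_samples, rng) for _ in range(n_trees)}

print(count_distinct_bootstraps(n_samples))  # 10
print(len(seen))                             # at most 10, so samples repeat
print(count_distinct_bootstraps(20))         # grows very quickly with n
```

With only 3 rows, 100 trees must share bootstrap samples. With even 20 rows the count runs to tens of billions, so repeats become unlikely. Identical samples do not by themselves guarantee identical trees, since the features considered at each split are also randomized.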
Pardon b
For RandomForestClassifier in sklearn, the
max_features parameter gives the maximum number of features considered for a
split in a random forest, which is sqrt(n_features) by default. If m is the
sqrt of n, then the number of combinations for DT formation is nCm. What if nCm
is less than n_estimators (number of decision trees in random fores