Hello R Experts,

I want to make sure I understand how the strata, sampsize and replace
parameters of randomForest() work so I can confidently downsample a dataset
I'm working with.

My main question: when the documentation describes how each of these
parameters (strata, sampsize, replace) works, does it all apply per tree?
Below is my understanding. Can you tell me whether I have it right?


library(randomForest)

table(iris$Species)
#    setosa versicolor  virginica
#        50         50         50

# The default for replace is TRUE.

# EACH tree uses a sample of 150 (with replace = TRUE the default sampsize is
# nrow(iris)). Because sampling is with replacement, it is possible, though
# extremely unlikely, that only one class is represented in a tree's sample,
# e.g. only setosa, with each setosa observation drawn 3 times on average.

randomForest(Species~.,data=iris)
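To check the per-tree idea for the default call, I wrote a base-R sketch of what I assume a single tree's draw looks like: 150 row indices sampled with replacement and no stratification. It only imitates the sampling and is not randomForest's internal code. It also computes the chance that all 150 draws fall in one class, which under that assumption is 3 * (1/3)^150 = 3^-149.

```r
# Sketch (assumption): one tree's default draw = nrow(iris) row indices
# sampled with replacement, with no stratification by class.
set.seed(1)
n <- nrow(iris)                        # 150, the default sampsize when replace = TRUE
idx <- sample(n, size = n, replace = TRUE)

# Class counts for this one simulated tree; they always sum to 150
tree_counts <- table(iris$Species[idx])
print(tree_counts)

# Some rows are drawn several times, others not at all (out-of-bag)
times_drawn <- tabulate(idx, nbins = n)
cat("rows drawn more than once:", sum(times_drawn > 1), "\n")
cat("rows never drawn (OOB):   ", sum(times_drawn == 0), "\n")

# Probability that all 150 draws land in a single class:
# 3 classes * (50/150)^150 = 3^(-149)
p_one_class <- 3 * (1/3)^n
print(p_one_class)
```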


# EACH tree uses a sample of 30: 10 from each class. Observations within each
# class may be repeated.
randomForest(Species~.,data=iris,sampsize=c(setosa=10,versicolor=10,virginica=10),
 strata=iris$Species)
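For the stratified case, this base-R sketch shows what I assume one tree's draw looks like: 10 rows sampled with replacement from within each species. The names `rows_by_class` and `idx` are mine, not randomForest's, and the code mimics the sampling rather than calling the package.

```r
# Sketch (assumption): with strata = iris$Species and 10 per class,
# one tree's sample = 10 row indices drawn with replacement from each class.
set.seed(2)
sampsize <- c(setosa = 10, versicolor = 10, virginica = 10)

# Row indices grouped by class
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Draw sampsize[k] rows with replacement inside each class k
# (indexing rows[...] avoids sample()'s single-number special case)
idx <- unlist(lapply(names(sampsize), function(k) {
  rows <- rows_by_class[[k]]
  rows[sample(length(rows), sampsize[[k]], replace = TRUE)]
}))

tree_counts <- table(iris$Species[idx])
print(tree_counts)                     # 10 per class, 30 in total
cat("distinct rows in this tree's sample:", length(unique(idx)), "\n")
```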

# EACH tree uses a sample of 60: 10 from the 1st class, 20 from the 2nd and
# 30 from the 3rd. Observations within each class may be repeated.
randomForest(Species~.,data=iris,sampsize=c(setosa=10,versicolor=20,virginica=30),
 strata=iris$Species)
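The unequal 10/20/30 case, wrapped in a helper so it can be repeated for several trees. `draw_stratified()` is a hypothetical function of my own, not part of randomForest; it just mimics what I assume happens once per tree, so each simulated tree gets a fresh draw of 60 rows.

```r
# draw_stratified() is a hypothetical helper, not part of randomForest.
# Assumption: one tree's sample = sampsize[k] rows drawn (with replacement
# by default) from the rows whose stratum is k.
draw_stratified <- function(strata, sampsize, replace = TRUE) {
  rows_by_stratum <- split(seq_along(strata), strata)
  unlist(lapply(names(sampsize), function(k) {
    rows <- rows_by_stratum[[k]]
    rows[sample(length(rows), sampsize[[k]], replace = replace)]
  }), use.names = FALSE)
}

set.seed(3)
sampsize <- c(setosa = 10, versicolor = 20, virginica = 30)

# Simulate 5 "trees": each column holds one tree's class counts
per_tree <- sapply(1:5, function(t) {
  idx <- draw_stratified(iris$Species, sampsize)
  table(iris$Species[idx])
})
print(per_tree)           # every column: 10, 20, 30
print(colSums(per_tree))  # 60 rows per tree
```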

Dan



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
