I'm building a random forest 50 trees at a time because of memory limitations (I have roughly 0.5 million observations and around 20 variables). I thought I could combine some or all of my forests later and look at global importance.
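A minimal sketch of this setup, on simulated data rather than the real set. The sizes, variable names, and response below are placeholders, and it takes "Raw" importance to mean the unscaled permutation importance (type = 1 with scale = FALSE), which is an assumption:

```r
library(randomForest)

set.seed(1)
n <- 500   # placeholder size, far smaller than the real data
p <- 5     # placeholder number of predictors

# Simulated predictors and a two-class response (hypothetical data)
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- factor(ifelse(x[[1]] + 0.5 * x[[2]] + rnorm(n) > 0, "a", "b"))

# Non-formula interface, 50 trees per forest, unnormalized votes
rf1 <- randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)
rf2 <- randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)

# Merge the two forests into a single 100-tree forest
rf_all <- combine(rf1, rf2)

# Rank the variables by importance (1 = most important).
# type = 1: permutation importance, type = 2: Gini importance.
imp_rank <- function(rf, type) {
  rank(-importance(rf, type = type, scale = FALSE)[, 1])
}

# Rank order of the unscaled permutation importance, side by side
ranks <- cbind(rf1 = imp_rank(rf1, 1),
               rf2 = imp_rank(rf2, 1),
               combined = imp_rank(rf_all, 1))
print(ranks)
```

Printing `importance(rf_all, scale = FALSE)` next to the same call on `rf1` and `rf2` would show whether the magnitudes, and not only the ranks, differ after combining.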
Say I have two forests, tree1 and tree2. Each has similar Gini and Raw importances, and the two forests are similar to one another. After I merge them into one forest with the combine command, however, the rank order of the combined Raw importances changes rather dramatically (e.g. the most important variable becomes the least important, although the ordering is not completely reversed). In addition, the scale of both the Raw and Gini importances is orders of magnitude smaller for the combined forest. Note that the combined Gini importance still looks roughly similar to the individual forests' Gini (and Raw) importances, at least in terms of rank ordering.

I'm using the non-formula randomForest specification along with norm.votes=FALSE to facilitate large-sample estimation and combining. I'm using R 2.5.0 on a Windows XP machine with 2 GB of RAM, and randomForest 4.5-18.

Any advice is appreciated.

Many thanks,
Joe