I'd say probably the best summary (and discussion) can be found in
"Understanding variable importances in forests of randomized trees" by Gilles
Louppe, Louis Wehenkel, Antonio Sutera and Pierre Geurts (with references to
Breiman's originally proposed ideas):
http://papers.nips.cc/paper/4928-understand
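
For what it's worth, the measure that paper analyzes is Mean Decrease Impurity (MDI): for each node that splits on a feature, take the impurity decrease of the split, weight it by the fraction of samples reaching that node, sum per feature, and average over the trees. Below is a rough sketch of that idea on hand-built toy trees. The class counts, the feature names (x0, x1, x2), the choice of Gini impurity, and the per-tree normalization are all my own assumptions for illustration. This is not scikit-learn's actual code.

```python
from collections import defaultdict


def gini(counts):
    """Gini impurity of a node, given its per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def tree_mdi(nodes):
    """MDI importances for one tree.

    `nodes` lists the internal nodes, root first, as tuples of
    (feature, class counts at node, counts in left child, counts in right child).
    """
    n_root = sum(nodes[0][1])
    raw = defaultdict(float)
    for feature, parent, left, right in nodes:
        n_t, n_l, n_r = sum(parent), sum(left), sum(right)
        # Impurity decrease produced by this split.
        decrease = gini(parent) - (n_l / n_t) * gini(left) - (n_r / n_t) * gini(right)
        # Weight by the fraction of all samples that reach this node.
        raw[feature] += (n_t / n_root) * decrease
    total = sum(raw.values())
    # Normalizing to sum to 1 is a convenience assumed here.
    return {f: v / total for f, v in raw.items()}


def forest_mdi(trees):
    """Average the per-tree importances over all trees."""
    per_tree = [tree_mdi(t) for t in trees]
    features = sorted({f for imp in per_tree for f in imp})
    return {f: sum(imp.get(f, 0.0) for imp in per_tree) / len(per_tree)
            for f in features}


# Hypothetical toy trees with made-up class counts (two classes, 100 samples).
tree_a = [
    ("x0", [50, 50], [40, 10], [10, 40]),  # root
    ("x1", [40, 10], [40, 0], [0, 10]),    # left child of the root
    ("x2", [10, 40], [10, 0], [0, 40]),    # right child of the root
]
tree_b = [
    ("x1", [50, 50], [50, 0], [0, 50]),    # a single perfect split
]

print(tree_mdi(tree_a))
print(forest_mdi([tree_a, tree_b]))
```

Note that in tree_a the root split on x0 gets the largest share mostly because every sample passes through it, which is presumably where the "depth" intuition comes from.
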
In the scikit-learn documentation the feature importances are described as
being derived from the relative depths at which features are used as decision nodes,
averaged across trees in the forest. Does anyone know which paper discusses
this method? Breiman's original paper seems to just talk about randomly
permuti