Another difference... R's randomForest package (which RRF is based on) evaluates subsets of values when partitioning on a nominal variable. [This is why it complains if there are more than 32 distinct values for a nominal variable.]
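To make the subset idea concrete, here is a rough Python sketch that enumerates the candidate two-way partitions of a small nominal variable. This is my own illustration, not code from randomForest or RRF; the function name `binary_partitions` is made up, and I'm not claiming the package searches the candidates this way. It only shows how many there are.

```python
from itertools import combinations

def binary_partitions(values):
    """Yield each distinct (left, right) split of a set of nominal values."""
    values = sorted(values)
    first, rest = values[0], values[1:]
    # Pin the first value to the left side so that {A, C} | {B, D} and
    # {B, D} | {A, C} are not both emitted as separate candidates.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(values) - left
            if right:  # skip the trivial split with everything on one side
                yield left, right

splits = list(binary_partitions(["A", "B", "C", "D"]))
for left, right in splits:
    print(sorted(left), "vs", sorted(right))
print(len(splits), "candidate partitions")
```

For k distinct values there are 2^(k-1) - 1 such partitions, so 4 values give 7 candidates, while 32 values would already give 2,147,483,647.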
For example, if our nominal variable has values { A, B, C, D }, the package will consider "in { A, C }" versus "not in { A, C }" as a partition candidate.

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Friday, October 18, 2013 10:42 AM
To: user@mahout.apache.org
Subject: Re: Mahout 0.8 Random Forest Accuracy

On Fri, Oct 18, 2013 at 7:48 AM, Tim Peut <t...@timpeut.com> wrote:

> Has anyone found that Mahout's random forest doesn't perform as well as
> other implementations? If not, is there any reason why it wouldn't perform
> as well?
>

This is disappointing, but not entirely surprising. There has been considerably less effort applied to Mahout's random forest package than to the comparable R packages.

Note, particularly, that the Mahout implementation is not regularized. That could well be a big difference.