Another difference...

R's randomForest package (which RRF is based on) evaluates subsets of values 
when partitioning nominal variables.  [This is why it complains if there are more 
than 32 distinct values for a nominal variable.]

For example, if our nominal variable has values { A, B, C, D }, the package 
will consider "in { A, C }" versus "not in { A, C }" as a partition candidate.
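Here is a rough sketch of that subset search in Python. It is an illustrative enumeration only, not randomForest's actual code, and the function names subset_splits and num_splits are hypothetical. Each "in S" / "not in S" partition is counted once, because S and its complement describe the same split. That gives 2^(k-1) - 1 candidates for k levels.

```python
from itertools import combinations


def subset_splits(levels):
    """Enumerate distinct binary partitions ("in S" vs "not in S") of a nominal variable.

    Hypothetical helper for illustration. The first level is always placed on the
    "in" side so that S and its complement are not both generated, and the trivial
    split with every level on one side is skipped.
    """
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            inside = frozenset({first, *combo})
            if len(inside) == len(levels):
                continue  # "in all levels" separates nothing
            splits.append((inside, frozenset(levels) - inside))
    return splits


def num_splits(k):
    """Number of non-trivial binary partitions of k levels: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1


if __name__ == "__main__":
    for inside, outside in subset_splits(["A", "B", "C", "D"]):
        print(sorted(inside), "vs", sorted(outside))
```

For { A, B, C, D } this produces 7 candidates, one of which is { A, C } versus { B, D }. With 32 levels the count would be 2^31 - 1, which shows how fast exhaustive subset search grows as the number of levels increases.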

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
Sent: Friday, October 18, 2013 10:42 AM
To: user@mahout.apache.org
Subject: Re: Mahout 0.8 Random Forest Accuracy

On Fri, Oct 18, 2013 at 7:48 AM, Tim Peut <t...@timpeut.com> wrote:

> Has anyone found that Mahout's random forest doesn't perform as well as
> other implementations? If not, is there any reason why it wouldn't perform
> as well?
>

This is disappointing, but not entirely surprising.  There has been
considerably less effort applied to Mahout's random forest package than to the
comparable R packages.

Note particularly that the Mahout implementation is not regularized.  That
could well be a big difference.
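Here is a rough sketch of what "regularized" can mean for a tree ensemble, assuming the RRF-style scheme as we understand it: a candidate split on a feature that the forest has not used yet has its gain multiplied by a penalty lam in (0, 1]. A new feature is therefore chosen only when it clearly beats the features already in use. The names regularized_gain and choose_split, and the lam value, are hypothetical and for illustration only. This is not the RRF package's code.

```python
def regularized_gain(gain, feature, used_features, lam):
    """Return the split gain, penalized by lam if the feature has not been used yet.

    Assumed RRF-style form: gain is unchanged for already-used features and
    scaled by lam (0 < lam <= 1) for new ones.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must be in (0, 1]")
    return gain if feature in used_features else lam * gain


def choose_split(gains, used_features, lam):
    """Pick the feature with the highest regularized gain and record it as used.

    gains maps feature name -> raw (unregularized) split gain at this node.
    used_features is a set shared across the whole forest and is updated in place.
    """
    best = max(gains, key=lambda f: regularized_gain(gains[f], f, used_features, lam))
    used_features.add(best)
    return best


if __name__ == "__main__":
    used = {"x1"}
    # x2 has the higher raw gain, but 0.8 * 0.35 < 0.30, so x1 is kept.
    print(choose_split({"x1": 0.30, "x2": 0.35}, used, lam=0.8))
```

With lam = 1 this reduces to ordinary unregularized gain, which is, as we understand Ted's point, the situation in Mahout.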
