Hi all,

I'm using the random forest implementation in Mahout 0.8 to perform classification (org.apache.mahout.classifier.df.mapreduce.BuildForest and org.apache.mahout.classifier.df.mapreduce.TestForest). I've run the classifier multiple times with different parameters and different data splits, and I consistently get an accuracy of ~0.9.
I've previously used R's RRF package on exactly the same data, and there I consistently get an accuracy of ~0.95, which is noticeably higher than the Mahout results. I've been unable to figure out why the two classifiers perform differently given the same data and the same parameters.

Has anyone else found that Mahout's random forest doesn't perform as well as other implementations? If not, is there any reason why it wouldn't perform as well in my case?

Cheers,
Tim