I think the real cause is that the implementation is not fully fleshed out. I haven't looked at it myself, but I'm sure that if you find additions or improvements, you could post them and get them committed.
I am probably missing something basic, but you seemed to say at the outset that you switched to an in-memory (non-Hadoop?) implementation but are using Hadoop. Is there not a non-Hadoop version? (I really don't know, but thought there was.) It's hard to get anything non-trivial to finish on Hadoop in under 5 minutes.

On Fri, Mar 30, 2012 at 5:27 PM, Jason L Shaw <jls...@uw.edu> wrote:

> I don't believe I can use the multiple-files solution because Mahout can't
> handle multiple input files for Random Forest training.
>
> 15-30 minutes isn't a big deal for training a model I'll use a lot, but in
> developing a feature set and testing a model many times, it gets pretty
> tedious. It's a shame that it's not easier to take advantage of excessive
> parallelism in an algorithm such as RF, but that's the way it goes. I'm
> using Hadoop not because I think it's the ideal parallel computing
> solution, but because it's what I have available to me.
>
> Thanks for your help anyway. I'll post back if I find a silver bullet.