I think the real cause is simply that the implementation is not fully
fleshed out. I haven't looked at it closely, but I'm sure that if you come
up with additions and improvements, you could post them as patches and get
them committed.

I'm probably missing something basic, but you seemed to say at the outset
that you had switched to an in-memory (non-Hadoop?) implementation, yet
you're still running on Hadoop. Is there not a non-Hadoop version? (I
really don't know, but I thought there was.)
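
For what it's worth, here is roughly what I mean by the non-Hadoop path:
the df code ships a sequential reference builder that trains entirely in
one JVM. This is a sketch from memory of the 0.6/0.7-era API; the package
and method names have moved around between releases, so treat every class
name below as an assumption and check it against your version.

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.classifier.df.DecisionForest;
import org.apache.mahout.classifier.df.builder.DefaultTreeBuilder;
import org.apache.mahout.classifier.df.data.Data;
import org.apache.mahout.classifier.df.data.DataLoader;
import org.apache.mahout.classifier.df.data.Dataset;
import org.apache.mahout.classifier.df.ref.SequentialBuilder;

/**
 * Sketch of in-process forest training (no MapReduce jobs). Package names
 * follow the trunk layout; in 0.6 the same classes live under
 * org.apache.mahout.df instead. Verify against your release.
 */
public class InMemoryForest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dataPath = new Path(args[0]);     // training data (CSV)
    Path datasetPath = new Path(args[1]);  // descriptor from the Describe tool

    // Load the dataset descriptor and pull the whole training set into memory.
    Dataset dataset = Dataset.load(conf, datasetPath);
    FileSystem fs = dataPath.getFileSystem(conf);
    Data data = DataLoader.loadData(dataset, fs, dataPath);

    // Grow 100 trees sequentially in this JVM; no jobs are submitted.
    SequentialBuilder builder =
        new SequentialBuilder(new Random(), new DefaultTreeBuilder(), data);
    DecisionForest forest = builder.build(100);
    System.out.println("forest trained: " + forest);
  }
}

Whether the sequential builder can hold your data in memory is another
question, but it at least sidesteps job-submission overhead entirely.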

It's hard to get anything non-trivial to finish on Hadoop in under 5
minutes.
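
And regarding the multiple-input-files limitation you mention below: until
that is fixed in the trainer, one workaround may be to merge the shards
into a single file first. Something like this should do it
(FileUtil.copyMerge exists in Hadoop 1.x; the class wrapping it here is
just illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

/** Illustrative only: merge a directory of part files into one input file. */
public class MergeShards {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path srcDir = new Path(args[0]);   // directory of part-* shards
    Path dstFile = new Path(args[1]);  // single merged file for the trainer
    FileSystem fs = srcDir.getFileSystem(conf);
    // Concatenates every file under srcDir into dstFile; the final argument
    // is an optional string appended after each file (none needed for
    // newline-terminated CSV records).
    FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, null);
  }
}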

On Fri, Mar 30, 2012 at 5:27 PM, Jason L Shaw <jls...@uw.edu> wrote:

> I don't believe I can use the multiple-files solution because Mahout can't
> handle multiple input files for Random Forest training.
>
> 15-30 minutes isn't a big deal for training a model I'll use a lot, but in
> developing a feature set and testing a model many times, it gets pretty
> tedious.  It's a shame that it's not easier to take advantage of the abundant
> parallelism in an algorithm such as RF, but that's the way it goes.  I'm
> using Hadoop not because I think it's the ideal parallel computing
> solution, but because it's what I have available to me.
>
> Thanks for your help anyway.  I'll post back if I find a silver bullet.
>
>
