For the foreseeable future, we're on 0.20.2. A major reason is that Amazon EMR is on 0.20.2 and not 0.21.0 yet.
I think I know what you mean. There is one tiny API change that makes it impossible to run 0.20.2-compiled code on 0.21.0 and vice versa. Strangely, you will find that the code compiles fine against 0.21.0, and that you can then run it on 0.21.0.

I believe an abstract class was made an interface and, while that change didn't make the compile-time situation any different for Mahout, it means that the byte code can only be consistent with one version of Hadoop.

So: I'm saying you can change the Hadoop version to 0.21.0 and compile again and it will work fine.

Well, not so fast. hadoop-core is now renamed and split into hadoop-common and hadoop-mapred, if I remember rightly, so you need to update and add these dependencies. And AFAIK 0.21.0 is still not in Maven for some reason. So you'd have to install 0.21.0 in Maven locally to get it to work.

At least, this is what I had to do to get it working when I tried this a few weeks ago.

On Mon, Dec 20, 2010 at 10:25 PM, Ben Clay <[email protected]> wrote:
> Hi-
>
> Is there a timeline for updating Mahout to work with Hadoop 0.21?
>
> Hadoop 0.21 introduced some features we need (LATE scheduler) but when
> running the Mahout Quickstart jobs on both Mahout 0.4 and the latest 0.5
> trunk, I get a number of errors relating to the new Hadoop API. I had some
> trouble finding details using Google, but I see the 0.5 trunk's pom.xml
> indicates <hadoop.version>0.20.2</hadoop.version>
>
> Thanks for any info!
>
> -Ben
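
For reference, the pom.xml change described in the reply above might look roughly like the fragment below. This is only a sketch: the artifact names hadoop-common and hadoop-mapred are taken from the reply's own recollection ("if I remember rightly"), the groupId org.apache.hadoop is an assumption, and where exactly the dependencies sit in Mahout's poms is not shown here. Since 0.21.0 was reportedly not yet in a public Maven repository, these would only resolve after the jars have been installed into the local repository.

```xml
<!-- Sketch only: artifact names follow the reply's recollection and are not verified. -->
<properties>
  <!-- was 0.20.2 in the 0.5 trunk's pom.xml -->
  <hadoop.version>0.21.0</hadoop.version>
</properties>

<dependencies>
  <!-- These two are meant to replace the single hadoop-core dependency. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId> <!-- assumed groupId -->
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId> <!-- assumed groupId -->
    <artifactId>hadoop-mapred</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

Keeping the version in the hadoop.version property means switching back to 0.20.2 is a one-line change, although the dependency list itself would also need to go back to hadoop-core.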
