There isn't a good solution for 0.5. The code that calls setJarByClass has to pass a class that is NOT in the lib directory of the job jar, but rather in the unpacked classes. It's really easy to build a Hadoop job with Mahout that violates that rule, because of all the static methods that create jobs.
We seem to have a consensus to rework all the jobs as beans so that this can be wrestled into control.

On Sun, May 8, 2011 at 6:16 PM, Jake Mannix <[email protected]> wrote:
> On Sun, May 8, 2011 at 2:58 PM, Sean Owen <[email protected]> wrote:
>
>> If I recall the last discussion on this correctly --
>>
>> No, you don't want to put anything in Hadoop's lib/ directory. Even if
>> you can, that's not the "right" way.
>> You want to use the job file indeed, which should contain all dependencies.
>> However, it packages dependencies as jars-in-the-jar, which doesn't
>> work for Hadoop.
>
> I thought that Hadoop was totally fine with jars inside of the jar, if
> they're in the lib directory?
>
>> I think if you modify the Maven build to just repackage all classes
>> into the main jar, it works. It works for me at least.
>
> Clearly we're not expecting people to do this. I wasn't even running with
> special new classes, it wasn't finding *Vector* - if this doesn't work on
> a real cluster, then most of our entire codebase (which requires
> mahout-math) doesn't work.
>
> -jake
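For context on why setJarByClass fails for classes in lib/: Hadoop finds the jar to ship by asking the classloader which archive the given class was loaded from (in Hadoop this is done by a helper along the lines of ClassUtil.findContainingJar). A class sitting in a nested jar under lib/ isn't on the outer classpath, so the lookup comes up empty. Here is a stdlib-only sketch of that lookup - not the actual Hadoop code, and the class/method names are illustrative:

```java
import java.net.URL;

public class FindJar {
    // Returns the classpath entry (jar or directory URL) that a class was
    // loaded from, or null if it cannot be resolved - roughly what
    // setJarByClass relies on to decide which jar to submit.
    static String findContainingJar(Class<?> clazz) {
        String path = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        // Bootstrap-loaded classes (e.g. java.lang.String) report a null
        // classloader; fall back to the system loader for the lookup.
        URL url = (loader != null)
                ? loader.getResource(path)
                : ClassLoader.getSystemResource(path);
        return url == null ? null : url.toString();
    }

    public static void main(String[] args) {
        // A class from the JDK runtime resolves to the runtime image/jar.
        System.out.println(findContainingJar(String.class));
        // A class in the unpacked top-level classes of the job jar resolves
        // to that jar's URL - which is why the class passed to setJarByClass
        // must live there, not inside a nested jar under lib/.
        System.out.println(findContainingJar(FindJar.class));
    }
}
```

A class packaged only inside a jar-in-the-jar would return null here, matching the failure mode discussed above.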
