The 'lib/' convention is not a feature of Java; it's a feature of Hadoop.

It is activated by calling the 'setJar' API in the job conf, passing
the name of the jar that contains the lib folder.

As a convenience (and a trap for the unwary), there is also
setJarByClass. This takes a Class<?> instead of a string jar path. It
attempts to derive a jar name from the class reference.

Mahout then has a series of self-contained classes that create JobConf
objects and call setJarByClass, passing Whatever.class. If one of
those classes ends up in lib/ (say, a person building a job jar puts
Mahout into 'lib/' and then tries to use a Mahout job class), the call
to setJarByClass is at best ineffective and at worst destructive,
since the jar it derives is the Mahout jar rather than the job jar
that contains the lib folder.
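
To make the trap concrete, here is a minimal sketch of deriving a jar
path from a class reference, using only the JDK. It is not Hadoop's
code; it assumes the lookup works roughly by asking the class loader
where the .class file came from and taking the jar path out of a
"jar:" URL. The class name JarLookupSketch, both method names, and the
paths are hypothetical, and URL-encoded characters in paths are not
handled.

```java
import java.net.URL;

public class JarLookupSketch {

    // Extracts the jar file path from a "jar:file:...!/entry" URL string.
    // Returns null when the URL does not point inside a jar.
    static String jarPathFromResourceUrl(String url) {
        if (url == null || !url.startsWith("jar:")) {
            return null;
        }
        String path = url.substring("jar:".length());
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!');
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    // Locates the jar a class was loaded from, or null if there is none
    // (for example, a class loaded from a plain classes directory).
    static String findContainingJar(Class<?> cls) {
        String resource = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        URL url = (loader != null)
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource);
        return url == null ? null : jarPathFromResourceUrl(url.toString());
    }

    public static void main(String[] args) {
        // Hypothetical location of a Mahout class that lives in lib/:
        // the derived jar is the nested library, not the enclosing job jar.
        System.out.println(jarPathFromResourceUrl(
                "jar:file:/tmp/unjar/lib/mahout-core.jar!/org/example/SomeJob.class"));
        // Where this sketch itself was loaded from (null if not in a jar).
        System.out.println(findContainingJar(JarLookupSketch.class));
    }
}
```

With a lookup like this, a class that sits in lib/ resolves to the
library jar, which contains neither the user's classes nor the lib
folder. That is the jar a caller would end up passing to setJar.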

On Mon, May 9, 2011 at 11:07 AM, Jake Mannix <[email protected]> wrote:
> Benson,
>
>  Can you remind me what the "setJarByClass" issue is again?
>
> On May 9, 2011 6:30 AM, "Benson Margulies" <[email protected]> wrote:
>
> I see no reason to stop using the 'lib/' convention in our jobs.
>
> There are apparently plenty of people out there who don't know
> anything about the distributed cache. If we require its use to run
> simple jobs, we're going to be up to our ears in support email.
>
> I favor the following strategy:
>
> 1) Make sure that the split between 'lib/' and unpacked classes in
> our job jars is *correct* so that all the operations of the mahout
> command work out of the box.
>
> 2) Post 0.5, act on the proposed refactoring so that none of our code
> is calling setJarByClass in a way that forces users to do complex
> re-shading for themselves. That's the 'bean' proposal, in which each
> of our jobs is a bean, and a user who wants to combine ours and theirs
> can make their own call to setJar/setJarByClass appropriately.
>
