The 'lib/' convention is not a feature of Java; it is a feature of Hadoop. It is activated by calling the 'setJar' API on the job conf, passing the name of the jar that contains the lib folder.
There is also a convenience method (and a trap for the unwary): setJarByClass. It takes a Class<?> instead of a string jar path and attempts to derive a jar name from the class reference. Mahout has a series of self-contained classes that create JobConf objects and call setJarByClass, passing Whatever.class. If one of those classes somehow ends up in lib/ (say, someone building a job jar puts Mahout into 'lib/' and then tries to use a Mahout job class), the call to setJarByClass is at best ineffective and at worst destructive, since the jar derived from the class is then not the job jar that holds 'lib/'.

On Mon, May 9, 2011 at 11:07 AM, Jake Mannix <[email protected]> wrote:
> Benson,
>
> Can you remind me what the "setJarByClass" issue is again?
>
> On May 9, 2011 6:30 AM, "Benson Margulies" <[email protected]> wrote:
>
> I see no reason to stop using the 'lib/' convention in our jobs.
>
> There are apparently plenty of people out there who don't know
> anything about the distributed cache. If we require its use to run
> simple jobs, we're going to be up to our ears in support email.
>
> I favor the following strategy:
>
> 1) Make sure that the split between 'libs/' and unpacked classes in
> our job jars is *correct* so that all the operations of the mahout
> command work out of the box.
>
> 2) post 0.5, act on the proposed refactoring so that none of our code
> is calling setJarFromClass in a way that forces users to do complex
> re-shading for themselves. That's the 'bean' proposal, in which each
> of our jobs is a bean, and a user who wants to combine ours and theirs
> can make their own call to setJar/setJarFromClass appropriately.
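
To make the setJarByClass trap concrete, here is a rough sketch in plain Java (no Hadoop on the classpath) of how a jar path can be derived from a class reference. This is not Hadoop's actual code; it only approximates the idea of looking up the class file as a resource and taking the jar part of its URL. The class name JarLocator, its methods, and all the paths in main are made up for illustration, and the "unpacked job jar" layout is an assumption about how the nested jars end up on the classpath.

```java
import java.net.URL;

// Hypothetical helper, for illustration only; not part of Hadoop or Mahout.
class JarLocator {

    // Extracts the jar path from a "jar:file:/path/x.jar!/pkg/Cls.class" style URL.
    // Returns null when the class was not loaded from a jar.
    static String jarPathFromUrl(String url) {
        if (url == null || !url.startsWith("jar:")) {
            return null;
        }
        String rest = url.substring("jar:".length());
        if (rest.startsWith("file:")) {
            rest = rest.substring("file:".length());
        }
        int bang = rest.indexOf('!');
        return bang >= 0 ? rest.substring(0, bang) : rest;
    }

    // Looks up the class file as a resource and derives the jar that holds it.
    static String findContainingJar(Class<?> cls) {
        String resource = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        URL url = (loader != null)
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource);
        return url == null ? null : jarPathFromUrl(url.toString());
    }

    public static void main(String[] args) {
        // A class sitting directly in the job jar: the derived jar is the job jar.
        System.out.println(jarPathFromUrl(
                "jar:file:/work/myjob.jar!/com/example/MyJob.class"));

        // A class loaded from a jar under lib/ of an unpacked job jar:
        // the derived jar is the nested library, not the job jar.
        System.out.println(jarPathFromUrl(
                "jar:file:/tmp/unpacked/lib/mahout-core.jar!/org/example/SomeJob.class"));

        // Whatever this JVM reports for the helper class itself.
        System.out.println(findContainingJar(JarLocator.class));
    }
}
```

In the second case the derived jar is the library under lib/, which contains neither the user's classes nor a lib/ folder of its own. That is the situation where a setJarByClass call made from inside library code points the job at the wrong jar, and why leaving the setJar/setJarByClass call to the user is attractive.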
