The term 'make one jar' is a bit ambiguous to me. If you mean, 'make
one job jar that has all the job classes in the unpacked parts and all
the dependencies in lib', fine. If you mean 'shade it all into one
giant jar without a lib', I'm slightly squeamish -- but there are
options in shade to ensure that this works right, even given Bouncy
Castle or other things with SPIs, if we have any to worry about.
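
For anyone following along, the setJarByClass behaviour discussed in the
quoted messages below can be illustrated without Hadoop at all. What
follows is only a minimal sketch of the general idea (ask the class
loader where a class file came from, and pull a jar path out of the
resulting URL); it is not Hadoop's actual code. The class name
JarLocator and both method names are made up for illustration, and URL
escaping (spaces and the like) is deliberately ignored.

```java
import java.net.URL;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop or Mahout.
public class JarLocator {

    /**
     * Extracts the jar path from a URL string such as
     * "jar:file:/tmp/myjob.jar!/org/example/MyJob.class".
     * Returns empty when the URL does not point inside a jar.
     */
    public static Optional<String> jarPathFromUrl(String url) {
        if (url == null || !url.startsWith("jar:")) {
            return Optional.empty(); // loaded from a directory or elsewhere
        }
        String path = url.substring("jar:".length());
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!'); // '!' separates the jar from the entry
        return Optional.of(bang >= 0 ? path.substring(0, bang) : path);
    }

    /** Looks up the jar, if any, that the given class was loaded from. */
    public static Optional<String> findContainingJar(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        URL url = (loader != null)
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource); // bootstrap classes
        return jarPathFromUrl(url == null ? null : url.toString());
    }

    public static void main(String[] args) {
        // A made-up URL, to show the parsing step in isolation.
        System.out.println(
            jarPathFromUrl("jar:file:/tmp/myjob.jar!/org/example/MyJob.class"));
        // Result depends on how this class itself was loaded.
        System.out.println(findContainingJar(JarLocator.class));
    }
}
```

The point of the sketch: the answer depends entirely on where the class
loader found the class. If the class passed in was loaded from a jar
sitting under lib/, a lookup of this kind would presumably return that
library jar rather than the enclosing job jar, which is the trap
described below.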

On Mon, May 9, 2011 at 11:41 AM, Sean Owen <[email protected]> wrote:
> Ah, OK. The trickiness there is that we don't know the location of the
> jar. (Right?) The user can tell us, though they're then specifying it
> twice on the command line, once to Hadoop and once to us. At least I
> don't know of something smarter.
>
> Is there any better interim solution than just packaging it all up
> into one .jar? That obviates this issue. (That doesn't personally
> offend my sense of hackiness and propriety anyway, but I do see the
> arguments there.) Because it looks like we need to do *something* for
> now.
>
> And then I bet there's a better long term answer even if I don't know
> what it is. Heck, if someone does know and it's not too hard, I'll
> make it happen now.
>
> Sean
>
> On Mon, May 9, 2011 at 4:36 PM, Benson Margulies <[email protected]> wrote:
>> The 'lib/' convention is not a feature of Java, it's a feature of Hadoop.
>>
>> It is activated by calling the 'setJar' API in the job conf, passing
>> the name of the jar that contains the lib folder.
>>
>> As a convenience (and a trap for the unwary), there is a shortcut:
>> setJarByClass. This takes a Class<?> instead of a string jar path. It
>> attempts to derive a jar name from the class reference.
>>
>> Mahout then has a series of self-contained classes that create JobConf
>> objects, and make calls to setJarByClass, passing Whatever.class. If
>> one of those classes somehow wanders into lib/ (like, a person
>> building a job jar puts mahout into 'lib/' and then tries to use a
>> Mahout job class) the call to setJarByClass is at best ineffective and
>> at worst destructive.
>>
>> On Mon, May 9, 2011 at 11:07 AM, Jake Mannix <[email protected]> wrote:
>>> Benson,
>>>
>>>  Can you remind me what the "setJarByClass" issue is again?
>>>
>>> On May 9, 2011 6:30 AM, "Benson Margulies" <[email protected]> wrote:
>>>
>>> I see no reason to stop using the 'lib/' convention in our jobs.
>>>
>>> There are apparently plenty of people out there who don't know
>>> anything about the distributed cache. If we require its use to run
>>> simple jobs, we're going to be up to our ears in support email.
>>>
>>> I favor the following strategy:
>>>
>>> 1) Make sure that the split between 'lib/' and unpacked classes in
>>> our job jars is *correct* so that all the operations of the mahout
>>> command work out of the box.
>>>
>>> 2) Post 0.5, act on the proposed refactoring so that none of our code
>>> is calling setJarByClass in a way that forces users to do complex
>>> re-shading for themselves. That's the 'bean' proposal, in which each
>>> of our jobs is a bean, and a user who wants to combine ours and theirs
>>> can make their own call to setJar/setJarByClass appropriately.
>>>
>>
>
