I think I am still +1 to just creating one re-packaged .jar -- for now at least. It fixes the problems for sure. And then I am happy for the cognoscenti to construct a better solution later, and I'd be pleased to help. Though I still don't find this re-packaging a bad thing -- there are theoretical problems with signing keys and whatnot, yes, but they don't exist in practice now.
I guess I'm asking whether anyone is for/against committing MAHOUT-691?

On Mon, May 9, 2011 at 9:39 PM, Jake Mannix <[email protected]> wrote:
> On Mon, May 9, 2011 at 1:31 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> then AbstractJob implements walking the lib tree and adding those
>> paths (based on MAHOUT_HOME
>> or otherwise derived knowledge of lib location) and throws all the
>> jars there into the backend path. All Mahout projects
>> do something similar. Where's the complexity in that?
>>
>
> The complexity is right there: "throws all jars there into backend path".
>
> How do you wish to accomplish this? Currently we follow the Hadoop
> convention of doing this (lib/ inside of the jar passed to the hadoop cli).
> It apparently doesn't always work (or never? or is this PEBKAC?).
> We could alternately use the Hadoop "-libjars" technique, which
> does what you suggest in another way. Also we could, ourselves,
> copy these jars into the DistributedCache and do something that
> way.
>
> But each of these has pros and cons, and figuring out which way
> works for the examples job, and our users, is the question we're
> getting at here.
>
> I really wish I knew why the lib/ thing doesn't work for vanilla
> calls to classes in our examples job-jar.
>
> -jake
>
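For concreteness, here is a minimal sketch of the third option Jake mentions (staging the lib jars in the DistributedCache), combined with the lib-tree walk Dmitriy describes. This assumes Hadoop 0.20-era APIs; the staging path and class name are made up for illustration, not anything that exists in Mahout today:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class LibJarStager {

  private LibJarStager() {
  }

  // Walks a local lib dir (e.g. MAHOUT_HOME/lib), copies each jar up
  // to a staging dir on the job's FileSystem, and registers it on the
  // task classpath via the DistributedCache.
  public static void stageLibJars(File libDir, Configuration conf)
      throws IOException {
    File[] jars = libDir.listFiles();
    if (jars == null) {
      throw new IOException("Not a directory: " + libDir);
    }
    FileSystem fs = FileSystem.get(conf);
    Path staging = new Path("/tmp/mahout-libjars"); // hypothetical staging dir
    fs.mkdirs(staging);
    for (File jar : jars) {
      if (jar.getName().endsWith(".jar")) {
        Path dest = new Path(staging, jar.getName());
        fs.copyFromLocalFile(new Path(jar.getAbsolutePath()), dest);
        DistributedCache.addFileToClassPath(dest, conf);
      }
    }
  }
}

AbstractJob could call something like this before submitting, which is effectively what "-libjars" already does for you via GenericOptionsParser. The lib/-inside-the-job-jar route needs no staging at all, which is part of why it's attractive when it works.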
