On Mon, May 9, 2011 at 1:09 PM, Benson Margulies <[email protected]> wrote:

> Once more from the top.
>
> There is a hadoop convention. It has nothing to do with the
> MANIFEST.MF as I read the code.
>

Ah, sorry, that was something we do with these lib/-ified jars here at
work (it's pretty common practice to do this; it's too bad it's not a
Java-supported spec).

> I'm not an evangelist for the maven-shade-plugin, but my very
> unscientific impression is that people walk up to mahout and expect
> the mahout command to just 'work'. Unless someone can unveil a way to
> script the exploitation of the distributed cache, that means that the
> jar file that the mahout command hands to the hadoop command has to
> use the 'lib/' convention, and have the correct structure of raw and
> lib-ed classes.
>

Totally agree, if it works.

> Further, any unsophisticated user who goes to incorporate Mahout into
> a larger structure has to do likewise.
>

Well, users who want to incorporate mahout into a larger structure will
have their own build system to interact with, and will need to be
instructed to take our individual jars and package them up properly, no?

> We could avoid exciting uses of the shade plugin altogether if we
> didn't have these static methods that initialize jobs and call
> setJarByClass on themselves. However, I don't see that for 0.5 unless
> we want to push the schedule back and make a concerted effort.
>
> Further, I am concerned, based on Jake's remarks, that even following
> the hadoop lib/ convention correctly doesn't always work, and we have
> no diagnostic insight into the nature of the failure.
>

Can someone please try out our current code on another real cluster, so
we have another data point? My worry is that even without this
setJarByClass business, we're not working properly. If we are, I'm fine
fixing this classpath stuff in 0.6. If we're broken now, it needs fixing,
asap.

  -jake
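
For anyone who wants a quick sanity check on a job jar's layout, here is a minimal sketch in Python. It assumes the 'lib/' convention discussed above means that dependency jars sit under lib/ inside the job jar, next to the raw .class files. The function names and the example entries (org/example/MyDriver.class, lib/dependency-a.jar, lib/dependency-b.jar) are hypothetical, made up for illustration; nothing here comes from the Mahout or Hadoop code.

```python
import io
import zipfile


def classify_job_jar(jar_file):
    """Split a jar's entries into raw classes and jars nested under lib/.

    Assumption: 'lib/' convention = dependency jars stored as lib/*.jar
    inside the job jar, alongside the job's own raw .class files.
    """
    raw_classes, nested_jars = [], []
    with zipfile.ZipFile(jar_file) as jar:
        for name in jar.namelist():
            if name.startswith("lib/") and name.endswith(".jar"):
                nested_jars.append(name)
            elif name.endswith(".class"):
                raw_classes.append(name)
    return raw_classes, nested_jars


def build_example_jar():
    """Build a tiny in-memory jar with hypothetical entries for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr("org/example/MyDriver.class", b"")
        jar.writestr("lib/dependency-a.jar", b"")
        jar.writestr("lib/dependency-b.jar", b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    raw, nested = classify_job_jar(build_example_jar())
    print("raw classes:", raw)
    print("lib/ jars:  ", nested)
```

Pointing classify_job_jar at a real job jar path instead of the in-memory example lists what actually got packaged. It only inspects the layout; it says nothing about whether the cluster resolves those classes at run time.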
