I wrestled with this issue too and tried a few things, including creating a single top-level jar (one variant containing the dependency jars themselves, another containing their expanded files). As you found out, the jar-with-jars approach doesn't work. The jar-with-expanded-jars approach can work if the dependencies you are packaging have no conflicting file names (including resources). It didn't work for me.

The solution I have isn't very nice, but it works: a top-level jar that lists all the dependencies in its manifest (the Class-Path attribute). The Maven assembly plugin can be used to automate this, which makes it extensible and less error-prone. Unfortunately, Pig will not add all the dependencies to the class path, so you will have to add them to the class path yourself by directly editing mapred-site.xml (using the distributed cache).
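
To make the manifest idea concrete, here is a minimal sketch in plain Java of a top-level jar whose manifest carries a Class-Path attribute. It is only an illustration and not my actual build: the class name ClassPathJar, the helper methods, and the lib/dep-a.jar and lib/dep-b.jar entries are all hypothetical, and in practice the build tool would generate the manifest for you.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClassPathJar {

    // Writes a jar that holds only a manifest; the manifest's Class-Path
    // attribute lists the dependency jars (hypothetical names).
    static void writeTopLevelJar(File target, List<String> dependencyJars) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise the main attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are space-separated and resolved relative to this jar's location.
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", dependencyJars));
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(target), manifest)) {
            // The UDF classes themselves would be added here as jar entries.
        }
    }

    // Reads the Class-Path attribute back from the jar's manifest.
    static String readClassPath(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getManifest().getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = File.createTempFile("udfs-all", ".jar");
        jar.deleteOnExit();
        writeTopLevelJar(jar, Arrays.asList("lib/dep-a.jar", "lib/dep-b.jar"));
        System.out.println(readClassPath(jar));
    }
}
```

Note that the listed jars still have to exist next to the top-level jar at those relative paths; the manifest only names them, it does not embed them.
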
-sanjay

-----Original Message-----
From: Yong-gang Cao [mailto:[email protected]]
Sent: Friday, October 22, 2010 5:39 AM
To: [email protected]
Subject: is it possible to use self-contained jar in pig?

Hi,

I have a headache with UDFs that have many dependencies: adding them with the register command is very painful and not extensible. I can make a self-contained jar for a Hadoop job using Maven (a jar with a lib directory containing all the jars, which are used for class lookup), but that doesn't seem to work for Pig. Pig just treats that jar as a regular jar and tries to find classes directly inside it instead of inside the embedded jars. Is there a way to make Pig look into the self-contained big jar for class loading, the way Hadoop does?

Thanks!

--
Regards,
Yong-gang Cao
Seattle,WA,98104
