The shade plugin is useful only up to a point. For example, signed jars do
not survive shading in my experience (the BouncyCastle library comes to mind).
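
To see up front which dependencies are affected, something like the following might help. It is only a sketch in Python (not part of Maven or the shade plugin), and the function name signature_entries is made up for illustration. It relies on the fact that a signed jar carries its signature files under META-INF/ with .SF, .RSA, .DSA or .EC extensions; once classes from several jars are merged, those digests no longer match the merged contents.

```python
import re
import zipfile

# Signature-related files that jar signing places directly under META-INF/.
SIGNATURE_PATTERN = re.compile(r"^META-INF/[^/]+\.(SF|RSA|DSA|EC)$", re.IGNORECASE)


def signature_entries(jar):
    """Return the sorted signature-related entry names found in a jar.

    `jar` may be a file path or a binary file-like object.
    An empty list means the jar does not appear to be signed.
    """
    with zipfile.ZipFile(jar) as archive:
        return sorted(name for name in archive.namelist()
                      if SIGNATURE_PATTERN.match(name))
```

Running this over every jar in the dependency list before shading gives the set of jars that probably need to stay outside the uber JAR.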

-d

On Thu, Jan 20, 2011 at 10:44 PM, Erik Onnen <[email protected]> wrote:

> As a new member to the list, I offer our lone data point. We use the maven
> shade plugin: http://maven.apache.org/plugins/maven-shade-plugin/
>
> Shade produces an "uber" JAR with an optional declared main class.
>
> On the up side, for a reasonable number of dependencies (in our case ~40),
> it just works and results in a single JAR. We're lucky enough that across
> the board, we can use one JAR for launching a message consumer, a Hadoop
> job, and a Pig job.
>
> That said, there are two caveats we've encountered:
> * System dependencies aren't rolled into the "uber" JAR - if you want
> something to be in the deployment artifact, you need to at a minimum put it
> into your local repo - we do this via bash scripting for HBase 0.90.0 for
> example.
> * Conflicts - so far we've managed to do a maven dependency:tree and
> exclude conflicting dependencies, but I'm sure there is a point where that
> will not work any more.
>
> I'd love to hear how others are solving the problem; so far this has worked
> for us.
>
> -erik
>
>
> On Thu, Jan 20, 2011 at 7:31 PM, Kaluskar, Sanjay <[email protected]> wrote:
>
> > Hi Dmitriy,
> >
> > Well, what I have is still experimental & not in any product. But, yes
> > we can compile to a Pig script. I try to use the native relational
> > operators where possible & use UDFs in other cases.
> >
> > I don't understand which conflicts you are referring to. Initially, I
> > was trying to create a single jar (containing all the 300 dependencies)
> > using the maven-dependency-plugin (BTW that seems to be the recommended
> > approach & should work in many cases) but it turned out that some of our
> > internal components had conflicting file names for some of the resources
> > (should probably be fixed!). My current approach works better because I
> > don't try to re-package any dependency. Yes, startup times are slow - of
> > course, I am open to other ideas :-)
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[email protected]]
> > Sent: 21 January 2011 07:57
> > To: [email protected]
> > Subject: Re: Managing pig script jar dependencies
> >
> > Sanjay,
> > Informatica compiles to Pig now, eh? Interesting...
> > How do you handle jar conflicts if you bundle the whole lot? Doesn't
> > this cost you a lot on job startup time?
> >
> > Dmitriy
> >
> >
> > On Thu, Jan 20, 2011 at 5:41 PM, Kaluskar, Sanjay <[email protected]> wrote:
> >
> > > I have a similar problem and I can tell you what I am doing currently,
> > > just in case it is useful. I have a tool that generates Pig scripts
> > > from some other representation (Informatica mappings), and in many
> > > cases the scripts also call UDFs that depend on about 300 jars & 580
> > > native libraries. Additionally, I generate a jar for each Pig script
> > > that contains the UDFs called from that script. I add the latter jar
> > > in the script in a register statement. But registering the 300 jars
> > > that the UDFs depend on individually is error-prone & tedious, so I
> > > have automated that part. I have a top-level jar that includes all the
> > > 300 jars on its Class-Path in the MANIFEST.MF and I add this top-level
> > > jar to the classpath. I generate that (top-level jar) using maven's
> > > assembly plugin. I also generate a zip of everything (jars, native
> > > libs) using maven's assembly plugin and use dist cache to distribute
> > > it and add the native libs to the LD_LIBRARY_PATH.
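
For anyone who wants to try the manifest-only jar idea without the assembly plugin, here is a rough sketch of what such a jar contains. This is plain Python, not what Sanjay actually runs; write_classpath_jar and manifest_lines are invented names, and the jar names are assumed to be ASCII paths relative to the top-level jar. The one subtle part is that manifest lines may not exceed 72 bytes, so a long Class-Path has to be folded onto continuation lines that start with a single space.

```python
import zipfile


def manifest_lines(header, value, width=72):
    """Fold one manifest header into lines of at most `width` characters.

    Continuation lines start with a single space, which readers strip
    when they join the pieces back together. ASCII input is assumed,
    so characters and bytes are the same thing here.
    """
    line = f"{header}: {value}"
    chunks = [line[:width]]
    rest = line[width:]
    while rest:
        chunks.append(" " + rest[:width - 1])
        rest = rest[width - 1:]
    return chunks


def write_classpath_jar(out_path, jar_names):
    """Write a jar holding only a manifest whose Class-Path lists jar_names."""
    lines = ["Manifest-Version: 1.0"]
    lines += manifest_lines("Class-Path", " ".join(jar_names))
    body = "\r\n".join(lines) + "\r\n"
    with zipfile.ZipFile(out_path, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", body)
```

Class-Path entries are separated by spaces and resolved relative to the location of the jar that declares them, so the 300 jars would need to sit at the listed relative paths next to the top-level jar.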
> > >
> > > -----Original Message-----
> > > From: Dmitriy Ryaboy [mailto:[email protected]]
> > > Sent: 21 January 2011 05:57
> > > To: [email protected]
> > > Subject: Re: Managing pig script jar dependencies
> > >
> > > This is becoming a bigger problem for us as well, as use of Pig
> > > becomes more varied across the company.
> > > Would love to hear what others have found to work for them.
> > >
> > > D
> > >
> > > On Wed, Jan 19, 2011 at 2:24 PM, Geoffrey Gallaway <[email protected]> wrote:
> > >
> > > > I'm looking for some suggestions and ideas for how to handle JAR
> > > > dependencies in a production environment.
> > > >
> > > > Most of the pig scripts I write require multiple JAR files. For
> > > > instance, I have a pig script that processes some data through a
> > > > Solr instance which requires my Solr UDF and some solr, lucene and
> > > > apache commons jars. These pig scripts are stored in a git repo and
> > > > that git repo is deployed to our production cluster. Obviously we
> > > > don't want to store the jars in git; I'd rather store them in our
> > > > mvn repo with the rest of the jars the company uses.
> > > >
> > > > The plan is to have a maven pom.xml for each pig script that defines
> > > > which jars that pig script depends on. A shell script will then call
> > > > "mvn dependency:copy-dependencies -DoutputDirectory=pig-jars" before
> > > > calling the actual pig command to run the script. Given that, I'm
> > > > trying to figure out the best solution to a few questions.
> > > >
> > > > * For development I'd like to store the pig jar (pig-0.7.0-core.jar)
> > > > in maven but there is no pom.xml for that jar (easily fixed) and
> > > > that jar contains all the java prerequisites (javax.servlet, apache
> > > > commons, etc) which seem to be making maven unhappy when I try to
> > > > import it into the maven company repo. Is there a pig-only jar?
> > > >
> > > > * What do other people use to deploy their code to various systems?
> > > > Check in jars with the code? Keep jars in a separate, network-based
> > > > directory?
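
On the copy-dependencies plan above: once the jars are in pig-jars, the register lines can be generated rather than maintained by hand. A minimal sketch follows, in Python rather than the shell script Geoff describes; register_statements is a hypothetical helper, and it assumes the jar paths contain no spaces or other characters that would need quoting in a Pig script.

```python
from pathlib import Path


def register_statements(jar_dir):
    """Return one Pig REGISTER line per *.jar directly inside jar_dir, sorted by name."""
    jars = sorted(Path(jar_dir).glob("*.jar"))
    return [f"REGISTER {jar.as_posix()};" for jar in jars]
```

The wrapper could write these lines to the top of a generated copy of the script (for example "\n".join(register_statements("pig-jars"))) before invoking pig, so the checked-in script never mentions individual jar versions.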
> > > >
> > > > Geoff
> > > > --
> > > > Sent from my email client.
> > > >
> > >
> >
>
