Waiting for others to give a best practice. I think you can use Eclipse to manage the Maven project and see the full dependency hierarchy; if some jar (for example, guava) exists in both the Hadoop dependency chain and your own requirements, set your requirement's scope to "provided".
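As a minimal sketch of what that scope change might look like in the pom (the version number here is illustrative):

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>11.0.2</version>
  <!-- "provided" keeps the jar on the compile classpath but excludes it
       from the assembled jar, so the cluster's own copy is used at runtime -->
  <scope>provided</scope>
</dependency>
```

Note this only avoids shipping a duplicate copy; it does not help if your code actually needs a newer API than the version Hadoop ships.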
Regards,
*Stanley Shi*

On Mon, Mar 10, 2014 at 11:33 AM, Fengyun RAO <raofeng...@gmail.com> wrote:

> First of all, I want to note that I use CDH5 beta, manage the project
> with Maven, and have googled and read a lot, e.g.
> https://issues.apache.org/jira/browse/MAPREDUCE-1700
> http://www.datasalt.com/2011/05/handling-dependencies-and-configuration-in-java-hadoop-projects-efficiently/
>
> I believe the problem is quite common: when we write an MR job, we need
> lots of dependencies, which may not exist in, or may conflict with, the
> HADOOP_CLASSPATH. There are several options, e.g.
>
> 1. Add all libraries to my own jar and set
> HADOOP_USER_CLASSPATH_FIRST=true.
> This is what I do, which makes the jar very big, and still it doesn't
> work: e.g. I already packaged guava-16.0.jar in my jar, but it still uses
> guava-11.0.2.jar from the HADOOP_CLASSPATH.
> Below is my build configuration:
>
> <plugin>
>   <artifactId>maven-assembly-plugin</artifactId>
>   <configuration>
>     <archive>
>       <manifest>
>         <mainClass>xxx.xxx.xxx.Runner</mainClass>
>       </manifest>
>     </archive>
>     <descriptorRefs>
>       <descriptorRef>jar-with-dependencies</descriptorRef>
>     </descriptorRefs>
>   </configuration>
>   <executions>
>     <execution>
>       <id>make-assembly</id>
>       <phase>package</phase>
>       <goals>
>         <goal>single</goal>
>       </goals>
>     </execution>
>   </executions>
> </plugin>
>
> 2. Distinguish which libraries are not present in the HADOOP_CLASSPATH
> and put them into the DistributedCache.
> I think it's hard to distinguish, and still, if a jar conflicts, which
> dependency would take precedence?
>
> *What's the best practice, especially using Maven?*
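For the specific guava-16 vs. guava-11.0.2 clash above, another option worth mentioning is package relocation with the maven-shade-plugin: it rewrites the bytecode references in your jar so your code calls a renamed copy of Guava while Hadoop keeps loading its own. A sketch, replacing the assembly plugin; the plugin version and the shaded package prefix are illustrative:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Move Guava's classes (and your references to them) under a
               private prefix so they never collide with Hadoop's copy -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myjob.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With relocation in place, HADOOP_USER_CLASSPATH_FIRST no longer matters for the relocated packages, since the two versions live under different class names.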