What's the best way to use third party libraries with Hadoop? For
example, I want to run a job with both a jar file containing the job
and some extra libraries. I found a couple of solutions by searching,
but I'm hoping for something better:
- Merge the third party jar libraries into
Can't answer your question exactly, but can let you know what I do.
I build all dependencies into one jar. Because I use Maven as my build
environment, I can be sure that all my dependencies are collected
together when I assemble the jar. This is working very nicely for
me and I have used the same
You could use the DistributedCache to put multiple jars into the
classpath. Of course, you would have to write your own job-submission
logic for that
Johannes
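As a rough illustration of the DistributedCache suggestion above, here
is a minimal sketch of the jar-collection part of such job-submission
logic. It uses only the Java standard library so that it runs without
Hadoop on the classpath. The class name JarCollector, the local "lib"
directory and the HDFS directory "/user/me/lib" are assumptions made up
for this example, not anything from the thread. It also assumes the jars
have already been copied to HDFS; the actual Hadoop call,
DistributedCache.addFileToClassPath(Path, Configuration), appears only
in a comment.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: works out which jars a job
// would need to register with the DistributedCache.
class JarCollector {

    // Returns the names of all *.jar files directly inside libDir, sorted.
    static List<String> collectJars(Path libDir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.getFileName().toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }

    // Maps each jar name to the HDFS path it is assumed to have been uploaded to.
    static List<String> toHdfsPaths(List<String> jarNames, String hdfsLibDir) {
        String base = hdfsLibDir.endsWith("/")
                ? hdfsLibDir.substring(0, hdfsLibDir.length() - 1)
                : hdfsLibDir;
        List<String> paths = new ArrayList<>();
        for (String name : jarNames) {
            paths.add(base + "/" + name);
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are placeholders for this sketch.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        String hdfsLibDir = args.length > 1 ? args[1] : "/user/me/lib";
        if (!Files.isDirectory(libDir)) {
            System.err.println("Not a directory: " + libDir);
            return;
        }
        for (String p : toHdfsPaths(collectJars(libDir), hdfsLibDir)) {
            // In a real job submission, with Hadoop on the classpath, this
            // is where each jar would be registered, for example:
            //   DistributedCache.addFileToClassPath(
            //       new org.apache.hadoop.fs.Path(p), conf);
            System.out.println(p);
        }
    }
}
```

Run against a directory of third-party jars, this prints one HDFS path
per jar. The submission code would make the DistributedCache call once
for each of those paths before submitting the job.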
On Dec 3, 2008, at 3:19 PM, Scott Whitecross wrote:
What's the best way to use third party libraries with Hadoop? For
Exactly. I'm no expert on Maven either, but I like its convenience for
classpath handling
Attached are my scripts.
- Hadoop-installer allows me to install different versions of Hadoop
to the local repo
- Pom has an assembly plugin (change mainClass and packageName to be
your target)
- Assembly does