Best practice for using third party libraries in MapReduce Jobs?

2008-12-03 Thread Scott Whitecross
What's the best way to use third party libraries with Hadoop? For example, I want to run a job with both a jar file containing the job and also extra libraries. I noticed a couple of solutions with a search, but I'm hoping for something better: - Merge the third party jar libraries into

Re: Best practice for using third party libraries in MapReduce Jobs?

2008-12-03 Thread tim robertson
Can't answer your question exactly, but can let you know what I do. I build all dependencies into one jar. Because I use Maven as my build environment, I am 100% sure all my dependencies are collected together when I assemble my jar. This is working very nicely for me and I have used the same
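
For readers who want to see what "all dependencies in one jar" amounts to, here is a minimal sketch in plain Java using only `java.util.jar`. It is not the Maven assembly setup described above; the class name `JarMerger` and its `merge` method are hypothetical names made up for this illustration. It copies the entries of several jars into one output jar and, as an assumed policy, keeps the first copy of any duplicate entry (including `META-INF/MANIFEST.MF`). It does not handle signed jars or merge service files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper: merges several jars into one "fat" jar.
class JarMerger {

    // Copies every entry of each input jar into the output jar and
    // returns the entry names that were written, in order.
    static Set<String> merge(List<Path> inputs, Path output) throws IOException {
        Set<String> written = new LinkedHashSet<>();
        try (OutputStream fileOut = Files.newOutputStream(output);
             JarOutputStream jarOut = new JarOutputStream(fileOut)) {
            byte[] buffer = new byte[8192];
            for (Path input : inputs) {
                try (JarFile jar = new JarFile(input.toFile())) {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        JarEntry entry = entries.nextElement();
                        // Assumed policy: the first jar that provides a name wins.
                        if (!written.add(entry.getName())) {
                            continue;
                        }
                        // A fresh JarEntry avoids reusing the source's compressed size.
                        jarOut.putNextEntry(new JarEntry(entry.getName()));
                        if (!entry.isDirectory()) {
                            try (InputStream in = jar.getInputStream(entry)) {
                                int read;
                                while ((read = in.read(buffer)) != -1) {
                                    jarOut.write(buffer, 0, read);
                                }
                            }
                        }
                        jarOut.closeEntry();
                    }
                }
            }
        }
        return written;
    }

    // Usage: java JarMerger <output.jar> <input1.jar> [<input2.jar> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JarMerger <output.jar> <input.jar>...");
            return;
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        Set<String> names = merge(inputs, Paths.get(args[0]));
        System.out.println("Wrote " + names.size() + " entries to " + args[0]);
    }
}
```

Listing the job jar first means its manifest and classes take precedence over anything the library jars also contain. A build tool's assembly step is usually the safer choice in practice, since it resolves the dependency list for you.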

Re: Best practice for using third party libraries in MapReduce Jobs?

2008-12-03 Thread Johannes Zillmann
You could use the DistributedCache to put multiple jars into the classpath. Of course, you would have to write your own job-submission logic for that. Johannes On Dec 3, 2008, at 3:19 PM, Scott Whitecross wrote: What's the best way to use third party libraries with Hadoop? For

Re: Best practice for using third party libraries in MapReduce Jobs?

2008-12-03 Thread tim robertson
Exactly. I'm no expert on Maven either, but I like its convenience for classpath handling. Attached are my scripts. - Hadoop-installer allows me to install different versions of Hadoop to local repo - Pom has an assembly plugin (change mainClass and packageName to be your target) - Assembly does