Are you looking for the DistributedCache's archives feature? If you add an archive to the cache, it is automatically extracted into the task's current working directory.
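For example, here is a minimal sketch of shipping an archive with the old org.apache.hadoop.filecache API (the HDFS path and the "#mydata" link name are hypothetical, and the archive must already be on HDFS before you submit the job):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheArchiveExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CacheArchiveExample.class);

    // Archives (zip, tar, tgz/tar.gz) are un-archived on the slave
    // nodes before the tasks start. The "#mydata" fragment names the
    // symlink created in each task's working directory.
    DistributedCache.addCacheArchive(
        new URI("/user/me/libs/mydata.tgz#mydata"), conf);
    DistributedCache.createSymlink(conf);

    // Tasks can then read the extracted contents through the
    // relative path "mydata/...".
  }
}

That would give your tasks the archive contents on a real local filesystem, so your relative paths should work without relying on the unjarred job directory.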
See http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html

"Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave nodes. Jars may be optionally added to the classpath of the tasks, a rudimentary software distribution mechanism."

API call: DistributedCache.addCacheArchive(…)

On Sat, Dec 22, 2012 at 8:50 AM, Ilya Kirnos <i...@cardspring.com> wrote:

> When running Hadoop locally, RunJar will unjar the job jar and use the
> localized directory as the classpath to run the job. When running
> distributed, it seems the localized directory is created, but the jar is
> used for the classpath instead, and the localized directory is ignored
> for classpath purposes. Is it possible to configure Hadoop to use the
> unjarred directory instead? (I have some relative paths that work on a
> real filesystem, but not when running from a jar.)
>
> This is the directory I'm talking about, from
> http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html:
>
> ${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/ : The jars
> directory, which has the job jar file and the expanded jar. The job.jar
> is the application's jar file that is automatically distributed to each
> machine. It is expanded in the jars directory before the tasks for the
> job start. The job.jar location is accessible to the application through
> the API JobConf.getJar(). To access the unjarred directory,
> JobConf.getJar().getParent() can be called.
>
> Thanks.
>
> --
> -ilya

--
Harsh J