Are you looking for the DistributedCache's archives feature? If you
add an 'archive'-type entry to the cache, it is automatically
extracted into the task's current working directory on each node.

See 
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html

"Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave
nodes. Jars may be optionally added to the classpath of the tasks, a
rudimentary software distribution mechanism."

API call: DistributedCache.addCacheArchive(…)
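For example, a minimal sketch using the old mapred API (the HDFS path
and class name below are placeholders, not part of your setup):

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheArchiveExample {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheArchiveExample.class);
        // Hypothetical path: the archive must already exist on HDFS.
        // It is un-archived into the task's working directory on each
        // node before the task starts.
        DistributedCache.addCacheArchive(new URI("/user/me/deps.tgz"), conf);
        // ... set mapper/reducer and input/output paths, then submit,
        // e.g. JobClient.runJob(conf);
      }
    }

You can also append a fragment to the URI (e.g.
"/user/me/deps.tgz#deps") to control the name the unpacked archive is
symlinked as in the working directory, after calling
DistributedCache.createSymlink(conf).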

On Sat, Dec 22, 2012 at 8:50 AM, Ilya Kirnos <i...@cardspring.com> wrote:
> When running Hadoop locally, RunJar will unjar the job jar and use the
> localized directory as the classpath to run the job.  When running
> distributed, it seems the localized directory is created, but the jar is
> used for the classpath instead, and the localized directory is ignored for
> classpath purposes.  Is it possible to configure Hadoop to use the unjarred
> directory instead?  (I have some relative paths that work on a real
> filesystem, but not when running from a jar.)
>
> This is the directory I'm talking about:
>
> http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html:
>
> ${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/ : The jars directory,
> which has the job jar file and expanded jar. The job.jar is the
> application's jar file that is automatically distributed to each machine. It
> is expanded in jars directory before the tasks for the job start. The
> job.jar location is accessible to the application through the api
> JobConf.getJar(). To access the unjarred directory,
> JobConf.getJar().getParent() can be called.
>
>
> Thanks.
>
> --
> -ilya



-- 
Harsh J
