Hi Jeff,

Thanks for your prompt reply. My actual problem is as follows:

My code creates a new job named "job 1", which writes something (say, a
text file) to the distributed cache, and the job completes.

Now I want to create n jobs in the while loop below, each of which reads
the text file written by "job 1" from the distributed cache. So my
question is: *how do I share content among multiple jobs using the
distributed cache?*

*Another part of the problem* is that I don't know how to get an instance
of the running job from


*so that I can use the job.addCacheFile(..) command.*

while (true) {
    JobConf conf2 = new JobConf(getConf(), graphMining.class);

    FileInputFormat.setInputPaths(conf2, new Path(input));
    FileOutputFormat.setOutputPath(conf2, new Path(output));

    // Blocks until the job has finished
    RunningJob job = JobClient.runJob(conf2);
}
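The distributed cache is read-only from a task's point of view, so "job 1" cannot write into it; what it can do is write its output to HDFS, after which every later job can register that HDFS file with addCacheFile(...). Also, JobClient.runJob(...) only returns once the job has finished, so the RunningJob it hands back is too late for adding cache files. Below is a minimal sketch of the loop using the newer org.apache.hadoop.mapreduce API, assuming job 1 wrote a single output file (e.g. a part-r-00000 file). The class and method names, the link name `shared`, and the fixed iteration count are hypothetical placeholders.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch of a driver that runs several jobs which all read one shared file
// (e.g. the output of "job 1") through the distributed cache.
final class IterativeDriver {
    private IterativeDriver() {}

    // Builds the cache URI; the "#fragment" becomes the symlink name that
    // tasks see in their working directory.
    static URI cacheUri(String hdfsFile, String linkName) {
        return URI.create(hdfsFile + "#" + linkName);
    }

    // Hypothetical: runs `iterations` jobs; input/output naming is illustrative.
    static void runJobs(Configuration conf, String sharedFile, String input,
                        String outputPrefix, int iterations) throws Exception {
        for (int i = 0; i < iterations; i++) {
            Job job = Job.getInstance(conf, "iteration-" + i);
            job.setJarByClass(IterativeDriver.class);
            // job.setMapperClass(...), job.setReducerClass(...), etc. go here.
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(outputPrefix + "-" + i));
            // Register the shared file before submission; it cannot be
            // attached to a job that is already running.
            job.addCacheFile(cacheUri(sharedFile, "shared"));
            if (!job.waitForCompletion(true)) {
                throw new IllegalStateException("Job " + i + " failed");
            }
        }
    }
}
```

Each task in iteration i can then open `shared` from its working directory. With the old-API loop above, the equivalent step is DistributedCache.addCacheFile(uri, conf2) on the JobConf before JobClient.runJob(conf2); that class is deprecated in Hadoop 2 but still available.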

On Wed, Jun 8, 2016 at 3:50 AM, Guttadauro, Jeff <jeff.guttada...@here.com> wrote:

> Hi, Siddharth.
> I was also a bit frustrated at what I found to be scant documentation on
> how to use the distributed cache in Hadoop 2.  The DistributedCache class
> itself was deprecated in Hadoop 2, but there don’t appear to be very clear
> instructions on the alternative.  I think it’s actually much simpler to
> work with files on the distributed cache in Hadoop 2.  The new way is to
> add files to the cache (or cacheArchive) via the Job object:
> job.addCacheFile(*uriForYourFile*);
> job.addCacheArchive(*uriForYourArchive*);
> The cool part is that if you set up your URI so that it ends with
> “#*yourFileReference*”, Hadoop will set up a symbolic link named
> “*yourFileReference*” in your job’s working directory, which you can use
> to get at the file or archive. So it’s as if the file or archive is in
> the working directory. That obviates the need to even work with the
> DistributedCache class in your Mapper or Reducer, since you can just work
> with the file (or path using nio) directly.
> Hope that helps.
> -Jeff
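Jeff's point that a cached file can be treated like a local one means the reading side needs nothing from Hadoop. A minimal sketch using only java.nio, assuming the file was added with a `#bloomfilter` (or similar) fragment so that a link of that name exists in the task's working directory; the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: opens a distributed-cache file through the symlink
// that Hadoop creates in the task's working directory.
final class CacheFileReader {
    private CacheFileReader() {}

    // linkName is the "#fragment" given when the file was added to the cache
    // (e.g. "bloomfilter"); a relative path resolves against the working dir.
    static DataInputStream open(String linkName) throws IOException {
        Path link = Paths.get(linkName);
        if (!Files.exists(link)) {
            // Report the absolute path so the task log shows where it looked.
            throw new FileNotFoundException(
                    "Cache link not found: " + link.toAbsolutePath());
        }
        return new DataInputStream(
                new BufferedInputStream(Files.newInputStream(link)));
    }
}
```

In a mapper's setup(), something like `try (DataInputStream in = CacheFileReader.open("bloomfilter")) { filter.readFields(in); }` would load the filter, and if the link is missing the exception message shows the absolute path the task actually checked.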
> *From:* Siddharth Dawar [mailto:siddharthdawa...@gmail.com]
> *Sent:* Tuesday, June 07, 2016 4:06 AM
> *To:* user@hadoop.apache.org
> *Subject:* Accessing files in Hadoop 2.7.2 Distributed Cache
> Hi,
> I want to use the distributed cache to allow my mappers to access data in
> Hadoop 2.7.2. In main, I'm using the following code:
> String hdfs_path = "hdfs://localhost:9000/bloomfilter";
> InputStream in = new BufferedInputStream(
>         new FileInputStream("/home/siddharth/Desktop/data/bloom_filter"));
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.get(java.net.URI.create(hdfs_path), conf);
> OutputStream out = fs.create(new Path(hdfs_path));
> // Copy file from local to HDFS (true = close both streams afterwards)
> IOUtils.copyBytes(in, out, 4096, true);
> System.out.println(hdfs_path + " copied to HDFS");
> // conf2 is the job's JobConf, created elsewhere
> DistributedCache.addCacheFile(new Path(hdfs_path).toUri(), conf2);
> The above code copies a file from my local file system to HDFS and adds it
> to the distributed cache.
> However, in my mapper code, when I try to access the file stored in the
> distributed cache, the Path[] p variable is null.
> public void configure(JobConf conf) {
>     this.conf = conf;
>     try {
>         Path[] p = DistributedCache.getLocalCacheFiles(conf);
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
> }
> Even when I tried to access the distributed cache with the following code
> in my mapper, I got an error saying that the bloomfilter file doesn't exist:
> strm = new DataInputStream(new FileInputStream("bloomfilter"));
> // Read into our Bloom filter.
> filter.readFields(strm);
> strm.close();
> However, I read somewhere that if we add a file to the distributed cache, we
> can access it directly by its name.
> Can you please help me out?
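A reworked version of the code above, as a minimal sketch with the Hadoop 2 Job API. Two hedged guesses at why the original fails: if conf2 is re-created inside the job loop (as in the later message), the cache entry ends up on a JobConf that is never submitted; and if the mapper extends the new-API Mapper, configure(JobConf) is never called, so setup(Context) is the place to load the file. The class names, the job name, and the method configureJob are hypothetical.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.bloom.BloomFilter;

// Sketch: ship a Bloom filter to every mapper via the distributed cache
// using the Hadoop 2 Job API instead of the deprecated DistributedCache class.
final class BloomFilterCacheExample {
    // Symlink name that tasks will see in their working directory.
    static final String LINK_NAME = "bloomfilter";

    private BloomFilterCacheExample() {}

    // The "#fragment" tells Hadoop what to call the symlink.
    static URI cacheUri(String hdfsPath) {
        return URI.create(hdfsPath + "#" + LINK_NAME);
    }

    // Copies the local file to HDFS and registers it on the Job object itself,
    // so the cache entry cannot end up on a different Configuration.
    static Job configureJob(Configuration conf, String localFile, String hdfsPath)
            throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        // delSrc = false, overwrite = true
        fs.copyFromLocalFile(false, true, new Path(localFile), new Path(hdfsPath));

        Job job = Job.getInstance(conf, "bloom-filter-job");
        job.setJarByClass(BloomFilterCacheExample.class);
        job.setMapperClass(BloomMapper.class);
        job.addCacheFile(cacheUri(hdfsPath));
        return job;
    }

    // New-API mapper: setup() runs once per task, before any map() call.
    // map() is omitted here, so the inherited identity map() is used.
    public static class BloomMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        private final BloomFilter filter = new BloomFilter();

        @Override
        protected void setup(Context context) throws IOException {
            // Open the cached file through its symlink, as if it were local.
            try (DataInputStream in =
                         new DataInputStream(new FileInputStream(LINK_NAME))) {
                filter.readFields(in);
            }
        }
    }
}
```

The driver would then call something like `configureJob(conf, "/home/siddharth/Desktop/data/bloom_filter", "hdfs://localhost:9000/bloomfilter").waitForCompletion(true)`. Registering the file on the Job rather than on a separately built Configuration matters because Job.getInstance(...) works on its own copy of the Configuration.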