CVE-2018-8009: Apache Hadoop distributed cache archive vulnerability
Severity: Severe
Vendor: The Apache Software Foundation
Versions Affected:
Hadoop 0.23.0 to 0.23.11
Hadoop 2.0.0-alpha to 2.7.6
Hadoop 2.8.0 to 2.8.4
Hadoop 2.9.0 to 2.9.1
Hadoop 3.0.0-alpha to 3.0.2
Hadoop 3.1.0
Hi
My code creates a new job named "job 1" which writes something to distributed
cache (say a text file) and the job gets completed.
Just to manage expectations, you add files to the distributed cache _in the job
driver_, and the framework makes them available to maps and reducers.
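(For concreteness, a minimal sketch of that split using the Hadoop 2 mapreduce API; the HDFS path, job name, and class names below are illustrative assumptions, not taken from this thread:)

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "job 1");
    // Driver side: register an HDFS file; the framework localizes it onto
    // every node that runs a task of this job.
    job.addCacheFile(new URI("hdfs:///user/example/lookup.txt"));
    // ... set mapper/reducer, input/output paths, then job.waitForCompletion(true)
  }

  public static class CacheMapper
      extends Mapper<Object, Object, Object, Object> {
    @Override
    protected void setup(Context context)
        throws java.io.IOException, InterruptedException {
      // Task side: the registered URIs are visible through the context, and
      // each file is also symlinked into the task working directory under
      // its base name (here "lookup.txt").
      URI[] cacheFiles = context.getCacheFiles();
    }
  }
}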
Hi, Siddharth.
Not sure I fully understand your problem. I think you are saying that you
would like to run an initial M/R job to create some data that n jobs after that
will be able to use, and you are saying you’d like to use the distributed cache
for that. I think you may not need the
currently running job only).
On Tue, Jun 7, 2016 at 6:36 PM, Arun Natva wrote:
> If you use an instance of the Job class, you can add files to the distributed
> cache like this:
> Job job = Job.getInstance(conf);
> job.addCacheFile(filepath);
>
>
> Sent from my iPhone
>
>
Hi Jeff,
Thanks for your prompt reply. Actually my problem is as follows:
My code creates a new job named "job 1" which writes something to
distributed cache (say a text file) and the job gets completed.
Now, I want to create n jobs in a while loop below, which
read the
Hi, Siddharth.
I was also a bit frustrated at what I found to be scant documentation on how to
use the distributed cache in Hadoop 2. The DistributedCache class itself was
deprecated in Hadoop 2, but there don’t appear to be very clear instructions on
the alternative. I think it’s actually
If you use an instance of the Job class, you can add files to the distributed
cache like this:
Job job = Job.getInstance(conf);
job.addCacheFile(filepath);
Sent from my iPhone
> On Jun 7, 2016, at 5:17 AM, Siddharth Dawar
> wrote:
>
> Hi,
>
> I wrote a program which creates
Int(args[3]));
conf2.setNumReduceTasks(Integer.parseInt(args[4]));
FileInputFormat.addInputPath(conf2, new Path(input));
FileOutputFormat.setOutputPath(conf2, new Path(output));
}
RunningJob job = JobClient.runJob(conf2);
}
Now, I want the first Job which gets created to write something in the
distrib
Hi,
I want to use the distributed cache to allow my mappers to access data in
Hadoop 2.7.2. In main, I'm using the command
String hdfs_path="hdfs://localhost:9000/bloomfilter";InputStream in =
new BufferedInputStream(new
FileInputStream("/home/siddharth/Desktop/data/bloom_fil
Hello,
I'm new to Hadoop and a bit confused by one thing about the distributed cache -
when do files added to distributed cache get deleted?
I'm concretely interested in Hadoop 0.20.2.
I read the following from "Hadoop: The Definitive Guide": "Files are deleted to
make room for a new file
marko.di...@nissatech.com wrote:
Hello,
I'm new to Hadoop and I'm having a problem reading from a
sequence file that I add to distributed cache.
I didn't have problems when I ran it in standalone mode, but now
in pseudo-distributed and dist
ion about it to understand how it works.
>
> Thanks,
> Marko
>
>
> On 05/11/2015 11:25 PM, marko.di...@nissatech.com wrote:
>
> Hello,
>
>
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cach
h.com wrote:
Hello,
I'm new to Hadoop and I'm having a problem reading from a sequence
file that I add to distributed cache.
I didn't have problems when I ran it in standalone mode, but now in
pseudo-distributed and distributed I do.
I'm adding file to di
Regards,
Shahab
On Mon, May 11, 2015 at 5:25 PM, <marko.di...@nissatech.com> wrote:
Hello,
I'm new to Hadoop and I'm having a problem reading from a sequence
file that I add to distributed cache.
I didn't have problems when I ran it in standalone mo
What version are you using?
Have you seen this?
Regards,
Shahab
On Mon, May 11, 2015 at 5:25 PM, wrote:
> Hello,
>
>
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cache.
>
>
>
> I didn't h
Hello,
I'm new to Hadoop and I'm having a problem reading from a sequence file that I
add to distributed cache.
I didn't have problems when I ran it in standalone mode, but now in pseudo-
distributed and distributed I do.
I'm adding file to distributed cache like this
OP-3078
>>
>> https://issues.apache.org/jira/browse/HDFS-4659
>>
>>
>>
>> Cheers
>>
>> Seb.
>>
>>
>>
>> *From:* sam liu [mailto:samliuhad...@gmail.com]
>> *Sent:* Wednesday, May 28, 2014 7:40 AM
>> *To:* user@hadoop.
Running Hadoop 2.2, my job places files in the distributed cache.
In my mapper setup, I call context.getCacheFiles() and get back a URI[] with
contents that make sense.
In my reducer setup, I call context.getCacheFiles() and get back null.
Is this expected behavior? If so, how do I get the
/jira/browse/HADOOP-3078
>
> https://issues.apache.org/jira/browse/HDFS-4659
>
>
>
> Cheers
>
> Seb.
>
>
>
> *From:* sam liu [mailto:samliuhad...@gmail.com]
> *Sent:* Wednesday, May 28, 2014 7:40 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: File Permission Is
, 2014 7:40 AM
To: user@hadoop.apache.org
Subject: Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0
Is this possibly a Hadoop issue? Or is any option wrong in my cluster?
2014-05-27 13:58 GMT+08:00 sam liu :
Hi Experts,
The original local file has execution permission, and
Is this possibly a Hadoop issue? Or is any option wrong in my cluster?
2014-05-27 13:58 GMT+08:00 sam liu :
> Hi Experts,
>
> The original local file has execution permission, and then it was
> distributed to multiple nodemanager nodes with Distributed Cache feature of
> Hadoop
Hi Experts,
The original local file has execution permission, and then it was distributed
to multiple NodeManager nodes with the Distributed Cache feature of
Hadoop-2.2.0, but the distributed file has lost the execution permission.
However, I did not encounter such an issue in Hadoop-1.1.1.
Why this
11:17 AM, Serge Blazhievsky
>> wrote:
>> How are you putting files in distributed cache ?
>>
>> Sent from my iPhone
>>
>>> On Mar 27, 2014, at 9:20 AM, Jonathan Poon wrote:
>>>
>>>
>>> Hi Stanley,
>>>
>>> Sor
on in the 2.2.0 API?
Jonathan
On Thu, Mar 27, 2014 at 11:17 AM, Serge Blazhievsky wrote:
> How are you putting files in distributed cache ?
>
> Sent from my iPhone
>
> On Mar 27, 2014, at 9:20 AM, Jonathan Poon wrote:
>
>
> Hi Stanley,
>
> Sorry about the confusion
How are you putting files in distributed cache ?
Sent from my iPhone
> On Mar 27, 2014, at 9:20 AM, Jonathan Poon wrote:
>
>
> Hi Stanley,
>
> Sorry about the confusion, but I'm trying to read a txt file into my Mapper
> function. I am trying to copy the file us
r saying getLocalCacheFiles() is undefined. I've
>> imported the hadoop-mapreduce-client-core-2.2.0.jar as part of my build
>> environment in Eclipse.
>>
>> Any ideas on what could be incorrect?
>>
>> If I'm incorrectly using the distributed cache, could someone point me to
>> an example using the distributed cache with Hadoop 2.2.0?
>>
>> Thanks for your help!
>>
>> Jonathan
>>
>
>
core-2.2.0.jar as part of my build
> environment in Eclipse.
>
> Any ideas on what could be incorrect?
>
> If I'm incorrectly using the distributed cache, could someone point me to
> an example using the distributed cache with Hadoop 2.2.0?
>
> Thanks for your help!
>
> Jonathan
>
t an error saying getLocalCacheFiles() is undefined. I've
imported the hadoop-mapreduce-client-core-2.2.0.jar as part of my build
environment in Eclipse.
Any ideas on what could be incorrect?
If I'm incorrectly using the distributed cache, could someone point me to
an example using t
Hello,
I was wondering if anyone might know of a way to write bytes directly to the
distributed cache. I know I can call job.addCacheFile(URI uri), but in my
case the file I wish to add to the cache is in memory and is job specific.
I would prefer not writing it to a location that I have to then
= context.getCacheFiles();
>
> File f = new File(localPaths[0]);
>
>
>
> However, I get a NullPointerException when I do that in the Mapper code.
>
>
>
> Any suggestions?
>
>
>
> Andrew
>
>
>
> From: Shahab Yunus [mailto:shahab.yu...
s = context.getCacheFiles();
>
> File f = new File(localPaths[0]);
>
>
>
> However, I get a NullPointerException when I do that in the Mapper code.
>
>
>
> Any suggestions?
>
>
>
> Andrew
>
>
>
> *From:* Shahab Yunus [mailto:shahab.yu...@gmail.com
Hi,
Thanks for the response. I can create symlinks for the files, but I don't know
how to add a jar to the distributed cache. One way I found is the -libjars
argument when running a Hadoop job. Is it possible to add a jar file directly
to the distributed cache? Is there any specific folder in
Hi,
I have no idea about RHadoop but in general in YARN we do create symlinks
for the files in distributed cache in the current working directory of
every container. You may be able to use that somehow.
Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>
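(To illustrate the symlink point above, a small sketch of a task reading a cached file through its symlink in the container's working directory; "lookup.txt" is a hypothetical name standing in for whatever was registered:)

// Sketch only, e.g. inside a Mapper's setup(); the file is assumed to have
// been added as job.addCacheFile(new URI("hdfs:///data/lookup.txt#lookup.txt")),
// so YARN symlinks it into the container working directory as "lookup.txt".
java.io.BufferedReader reader =
    new java.io.BufferedReader(new java.io.FileReader("lookup.txt"));
String line;
while ((line = reader.readLine()) != null) {
  // use the line to build an in-memory lookup structure ...
}
reader.close();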
On Mon, Sep 23, 2
Hi,
Is it possible to access the distributed cache from the command line? I have
written a custom InputFormat implementation which I want to add to the
distributed cache. Using -libjars is not an option for me as I am not running
the Hadoop job from the command line. I am running it using the RHadoop
package in R, which
>
> File f = new File(localPaths[0]);
>
> However, I get a NullPointerException when I do that in the Mapper code.
>
> Any suggestions?
>
> Andrew
>
> *From:* Shahab Yunus [mailto:shaha
r, I get a NullPointerException when I do that in the Mapper code.
Any suggestions?
Andrew
From: Shahab Yunus [mailto:shahab.yu...@gmail.com]
Sent: Wednesday, July 10, 2013 9:43 PM
To: user@hadoop.apache.org
Subject: Re: New Distributed Cache
Also, once you have the array of URIs after calling get
you try JobContext.getCacheFiles() ?
>
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew
> wrote:
>
>> Hi,
>>
>> I am trying to store a
lly, how do I read my cached file(s) after I call
> JobContext.getCacheFiles()?
>
> Thanks,
>
> Andrew
>
> *From:* Omkar Joshi [mailto:ojo...@hortonworks.com]
> *Sent:* Wednesday, July 10, 2013 5:15 PM
>
> *To:* user@ha
Files()?
Thanks,
Andrew
From: Omkar Joshi [mailto:ojo...@hortonworks.com]
Sent: Wednesday, July 10, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache
try JobContext.getCacheFiles()
Thanks,
Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>
On Wed, Jul 10, 2013 at 6:31
apper code? Is there
> a method that will look for any files in the cache?
>
> Thanks,
>
> Andrew
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* user@hadoop.apache.
did you try JobContext.getCacheFiles() ?
Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>
On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew wrote:
> Hi,
>
> I am trying to store a file in the Distributed Cache during my Hadoop job.
>
Hi,
I am trying to store a file in the Distributed Cache during my Hadoop job.
In the driver class, I tell the job to store the file in the cache with this
code:
Job job = Job.getInstance();
job.addCacheFile(new URI("file name"));
That all compiles fine. In the Mapper code, I try
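(For reference, a hedged sketch of what the mapper-side counterpart of the driver code above might look like; everything here, including the file name handling, is an assumption rather than the poster's code:)

// Sketch only, inside the Mapper class.
@Override
protected void setup(Context context)
    throws java.io.IOException, InterruptedException {
  // Returns the URIs registered via job.addCacheFile(...) in the driver,
  // i.e. the original locations, not local filesystem paths.
  java.net.URI[] cacheFiles = context.getCacheFiles();
  if (cacheFiles != null && cacheFiles.length > 0) {
    // The localized copy is symlinked into the task working directory under
    // the file's base name, so it can be opened as an ordinary local file.
    java.io.File local =
        new java.io.File(new org.apache.hadoop.fs.Path(cacheFiles[0]).getName());
  }
}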
@hadoop.apache.org
Subject: Re: Distributed Cache
You should use Job#addCacheFile()
Cheers
On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <andrew.bote...@emc.com> wrote:
Hi,
I was wondering if I can still use the DistributedCache class in the latest
release of Hadoop (Version 2.0.5).
pache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>
>> Configuration conf = new Configuration();
>>
>> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
>>
>> Job job = Job.getInstance();
>>
>> …
>>
>> However, I keep getting warnings that the method addCacheFile() is
>> deprecated.
>>
>> Is there a more current way to add files to the distributed cache?
>>
>> Thanks in advance,
>>
>> Andrew
>>
>
>
is code to try and add a file to the
> distributed cache:
>
> import java.net.URI;
>
> import org.apache.hadoop.conf.Configuration;
>
> import org.apache.hadoop.filecache.DistributedCache;
>
> import org.apache.hadoop.fs
Hi,
I was wondering if I can still use the DistributedCache class in the latest
release of Hadoop (Version 2.0.5).
In my driver class, I use this code to try and add a file to the distributed
cache:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import
This has been discussed before, see
http://search-hadoop.com/m/xI5AHMD0Vm1 for the previous discussion on
this.
On Wed, May 8, 2013 at 12:54 AM, Saeed Shahrivari
wrote:
> Would you please tell me why we should use Distributed Cache instead of
> HDFS?
> Because HDFS seems more stable,
Not sure what you mean...
If you want to put up a small file to be used by each Task in your job (mapper
or reducer), you could put it up on HDFS.
Or if you're launching your job from an edge node, you could read in the small
file and put it into the distributed cache.
It really de
Would you please tell me why we should use Distributed Cache instead of
HDFS?
Because HDFS seems more stable, easier to use, and less error-prone.
Thanks in advance.
Anil, what issue are you facing? You have mentioned a 'Unicode issue', but
what exactly is the issue?
Regards,
Shahab
On Sat, May 4, 2013 at 2:28 PM, AnilKumar B wrote:
> Hi,
>
> We are adding ISO-8859-1 content type file in Distributed Cache for look
> up purpose in MR Jo
Hi,
We are adding an ISO-8859-1 encoded file to the Distributed Cache for lookup
purposes in an MR job.
But when we try to read the content from the Distributed Cache file in MR, we
are facing Unicode issues.
Please find the sample code snippet below:
@Override
protected void setup
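(The snippet is cut off above; purely as a hedged illustration, a setup() along these lines reads the cached file with an explicit charset instead of the platform default. The file name is invented, not from the thread:)

// Illustrative sketch, not the original poster's code.
@Override
protected void setup(Context context)
    throws java.io.IOException, InterruptedException {
  // "lookup.txt" is a hypothetical cache file symlinked into the task
  // working directory; decoding it explicitly as ISO-8859-1 avoids
  // garbled characters when the file is not UTF-8.
  java.io.BufferedReader reader = new java.io.BufferedReader(
      new java.io.InputStreamReader(
          new java.io.FileInputStream("lookup.txt"),
          java.nio.charset.Charset.forName("ISO-8859-1")));
  String line;
  while ((line = reader.readLine()) != null) {
    // populate the lookup structure ...
  }
  reader.close();
}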
The Distributed Cache uses the shared file system (whichever is specified).
The Distributed Cache can be loaded via the GenericOptionsParser / ToolRunner
parameters. Those parameters (-files, -archives, -libjars) are seen on the
command line and available in an MR driver class that implements the Tool
interface.
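(A hedged sketch of that pattern; the class name, jar name, and file paths are invented for illustration:)

// Sketch only: a driver that implements Tool, so GenericOptionsParser
// strips -files/-archives/-libjars from the arguments before run() sees them.
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "my job");
    // ... configure mapper/reducer and input/output paths here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}

// Hypothetical invocation; lookup.txt is then localized on every task node:
//   hadoop jar myjob.jar MyDriver -files hdfs:///data/lookup.txt in/ out/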
I think the correct question is why would you use distributed cache for a
large file that is read during map/reduce instead of plain hdfs? It does
not sound wise to shuffle GB of data onto all nodes on each job submission
and then just remove it when the job is done. I would think about picking
Hmmm.. maybe I'm missing something.. but (@bjorn) why would you use HDFS as a
replacement for the distributed cache?
After all, the distributed cache is just a file with replication over the
whole cluster, which isn't in HDFS. Can't you just make the cache size big and
store the file
"a replication factor equal to the number of DN"Hmmm... I'm not sure I
understand: there are 8 DN in mytest cluster.
Date: Tue, 9 Apr 2013 04:49:17 -0700
Subject: Re: Distributed cache: how big is too big?
From: bjorn...@gmail.com
To: user@hadoop.apache.org
Put it once
hing a Hadoop solution for an existing application that
> requires a directory structure full of data for processing.
>
> To make the Hadoop solution work I need to deploy the data directory to
> each DN when the job is executed.
> I know this isn't new and commonly done with a
buted Cache.
Based on experience, what are the common file sizes deployed in a Distributed
Cache? I know smaller is better, but how big is too big? I have read that the
larger the cache deployed, the more startup latency there will be. I also
assume there are other factors that play into this.
I know that->
could
not do the System.loadLibrary call recommended).
You can also bundle the stuff into the JAR file in
a subdir and that will be unpacked to the local
working dir. The nice thing about using the
distributed cache is the files only need to be pushed
to the cluster once with a copyFromLocal and
age distribution to the nodes (the nodes are really
> bare and installing the language on the node is not an option) using the
> distributed cache (as a tar.gz. file).
>
> My understanding is that HadoopMapreduce will unarchive this tgz file and
> then for every task attempt symlink it int
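(A hedged sketch of that archive pattern with the mapreduce Job API; the archive name and the "#runtime" alias are invented:)

// Sketch only, in the driver; job is an org.apache.hadoop.mapreduce.Job.
// Ship a tar.gz through the cache as an archive: the framework unpacks it
// on each node and symlinks the unpacked directory into the task working
// directory under the "#alias" name.
job.addCacheArchive(new java.net.URI("hdfs:///dist/runtime.tar.gz#runtime"));
// Tasks can then refer to e.g. "runtime/bin/interpreter" as a local path.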
rs loaded into the
> Distributed cache, several (~4) nodes have their map jobs fails because
> of ClassNotFoundException. All the other nodes proceed through the job
> normally and the jobs completes. But this is wasting 20-25% of my TT nodes.
>
> Can anyone explain why some nodes mi
Running hadoop-0.20.2 on a 20 node cluster.
When running a Map/Reduce job that uses several .jars loaded into the
Distributed cache, several (~4) nodes have their map jobs fail because
of ClassNotFoundException. All the other nodes proceed through the job
normally and the job completes. But
gt; http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
>> >
>> > That should give you what you want.
>> >
>> > hth,
>> > Arun
>> >
>> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
>> >
allel in a local filesystem - not too much of a
> >> performance hit for small reads (by virtue of OS caches, and quick
> >> completion per read, as is usually the case for distributed cache
> >> files), and gradually decreasing performance for long reads (due to
> >> freq
:48 PM, Harsh J wrote:
>>
>> Hi Lin,
>>
>> It is comparable (and is also logically similar) to reading a file
>> multiple times in parallel in a local filesystem - not too much of a
>> performance hit for small reads (by virtue of OS caches, and quick
>>
e of OS caches, and quick
> completion per read, as is usually the case for distributed cache
> files), and gradually decreasing performance for long reads (due to
> frequent disk physical movement)? Thankfully, due to block sizes the
> latter isn't a problem for large files on a
Hi Lin,
It is comparable (and is also logically similar) to reading a file
multiple times in parallel in a local filesystem - not too much of a
performance hit for small reads (by virtue of OS caches, and quick
completion per read, as is usually the case for distributed cache
files), and
cas
> >> spread across racks such that on task bootup the downloads happen with
> >> rack locality.
> >>
> >> On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote:
> >> > Hi Kai,
> >> >
> >> > Smart answer! :-)
> >> >
> &
- you would want more replicas
>> spread across racks such that on task bootup the downloads happen with
>> rack locality.
>>
>> On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote:
>> > Hi Kai,
>> >
>> > Smart answer! :-)
>> >
>> > The assum
licas
> spread across racks such that on task bootup the downloads happen with
> rack locality.
>
> On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote:
> > Hi Kai,
> >
> > Smart answer! :-)
> >
> > The assumption you have is one distributed cache replica coul
so tied to
the concept of racks in a cluster - you would want more replicas
spread across racks such that on task bootup the downloads happen with
rack locality.
On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote:
> Hi Kai,
>
> Smart answer! :-)
>
> The assumption you have is one distri
I have figured out the 2nd issue; I'd appreciate it if anyone could advise on
the first issue.
regards,
Lin
On Sat, Dec 22, 2012 at 9:24 PM, Lin Ma wrote:
> Hi Kai,
>
> Smart answer! :-)
>
>- The assumption you have is one distributed cache replica could only
>serve one do
Hi Kai,
Smart answer! :-)
- The assumption you have is that one distributed cache replica can only
serve one download session per TaskTracker node (this is why you get
concurrency n/r). The question is, why can one distributed cache replica not
serve multiple concurrent download session
Hi,
Simple math: assume you have n TaskTrackers in your cluster that will need to
access the files in the distributed cache, and r is the replication level of
those files.
Copying the files into HDFS requires r copy operations over the network. The n
TaskTrackers need to get their local
Thanks Kai. Using a higher replication count for what purpose?
regards,
Lin
On Sat, Dec 22, 2012 at 8:44 PM, Kai Voigt wrote:
> Hi,
>
> Am 22.12.2012 um 13:03 schrieb Lin Ma :
>
> > I want to confirm when on each task node either mapper or reducer access
> distributed cach
Hi,
Am 22.12.2012 um 13:03 schrieb Lin Ma :
> I want to confirm when on each task node either mapper or reducer access
> distributed cache file, it resides on disk, not resides in memory. Just want
> to make sure distributed cache file does not fully loaded into memory which
> co
Hi guys,
I want to confirm that when a mapper or reducer on a task node accesses a
distributed cache file, the file resides on disk, not in memory. I just want
to make sure the distributed cache file is not fully loaded into memory,
where it would compete for memory with the mapper/reducer tasks. Is that
On 12/07/2012 03:49 PM, surfer wrote:
> Hello Peter
> In my, humble, experience I never get hadoop 1.0.3 to work with
> distributed cache and the new api (mapreduce). with the old api it works.
> giovanni
>
> P.S. I already tried the approaches suggested by both Dhaval and H
tCache1"),
> >> conf);
> >>
> >>
> >>
> >>
> >> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J wrote:
> >>>
> >>> What is your conf object there? Is it job.getConfiguration() or an
> >>> independent i
-
From: Peter Cogan
Date: Fri, 7 Dec 2012 14:06:41
To:
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan wrote:
> Hi,
>
> It's an instance
Hello Peter
In my humble experience, I never got hadoop 1.0.3 to work with the
distributed cache and the new api (mapreduce). With the old api it works.
giovanni
P.S. I already tried the approaches suggested by both Dhaval and Harsh J
On 12/06/2012 05:59 PM, Peter Cogan wrote:
>
> Hi ,
>
;> What is your conf object there? Is it job.getConfiguration() or an
>>> independent instance?
>>>
>>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan
>>> wrote:
>>> > Hi ,
>>> >
>>> > I want to use the distributed cac
You will need to add the cache file to the distributed cache before creating
the Job object. Give that a spin and see if that works.
Regards,
Dhaval
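(A minimal sketch of that ordering, assuming the old DistributedCache API this thread is using; the HDFS path and class name are made up, and only "testCache1" echoes the thread:)

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheBeforeJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Register the cache file on the same Configuration the Job is built
    // from, before constructing the Job object.
    DistributedCache.addCacheFile(new URI("hdfs:///cache/testCache1"), conf);
    Job job = new Job(conf, "my job");
    // ... set mapper/reducer, input and output paths, then:
    // job.waitForCompletion(true);
  }
}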
From: Peter Cogan
To: user@hadoop.apache.org
Sent: Friday, 7 December 2012 9:06 AM
Subject: Re: Problem using
gt; What is your conf object there? Is it job.getConfiguration() or an
>> independent instance?
>>
>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan
>> wrote:
>> > Hi ,
>> >
>> > I want to use the distributed cache to allow my mappers to acce
e/testCache1"),
conf);
On Thu, Dec 6, 2012 at 5:02 PM, Harsh J wrote:
> What is your conf object there? Is it job.getConfiguration() or an
> independent instance?
>
> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan
> wrote:
> > Hi ,
> >
> > I want to use the
What is your conf object there? Is it job.getConfiguration() or an
independent instance?
On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan wrote:
> Hi ,
>
> I want to use the distributed cache to allow my mappers to access data. In
> main, I'm using the command
>
> Distribute
hen I use the distributed cache , I found that when the file is more than
> 100MB or the number of records are more than 10 million
> ,
>
> the file can not be cache in the memory; and I try to set the io.sort.mb is
> 200MB ;
> it still can not work, Any suggestion would be fine!
When I use the distributed cache, I find that when the file is more than
100MB or the number of records is more than 10 million,
the file cannot be cached in memory; I tried setting io.sort.mb to 200MB and
it still does not work. Any suggestion would be fine! Thank you!
2012-11-16
On 31 October 2012 12:13, Saurabh Mishra wrote:
> Hi,
> I tried adding jars to distributed cache through following code :
> DistributedCache.addArchiveToClassPath(path, jobConf);
> DistributedCache.addCacheArchive(path.toUri(), jobConf);
>
> It w
I'll try that thanks for the suggestion Steve!
Mark
On Fri, Oct 12, 2012 at 11:27 AM, Steve Loughran wrote:
>
>
> On 11 October 2012 20:53, Mark Olimpiati wrote:
>
>> Thanks for the reply Harsh, but as I said I tried locally too by using
>> the following:
>>
>> FileSystem localFs = cachedFi
before job submission time, then create a serialized
>> representation (perhaps just Java object serialization), and send the
>> serialized form through distributed cache? Then, each reducer would just
>> need to deserialize during setup() instead of recomputing the full radix
>
g the setup time of the radix tree, is it possible to
precompute the radix tree before job submission time, then create a
serialized representation (perhaps just Java object serialization),
and send the serialized form through distributed cache? Then, each
reducer would just need to deserialize d
On 11 October 2012 20:53, Mark Olimpiati wrote:
> Thanks for the reply Harsh, but as I said I tried locally too by using the
> following:
>
> FileSystem localFs = cachedFiles[0].getFileSystem(new
> Configuration());
>
>
> Isn't the above supposed to give me the local file system ?? If yes, I
> On Thu, Oct 11, 2012 at 5:15 AM, Mark Olimpiati
> wrote:
> > Hi,
> >
> > I'm storing sequence files in the distributed cache which seems to be
> > stored somewher under each node's /tmp .../local/archive/ ... path.
> >
> > In mapper code,
Hello Kyle,
Regarding the setup time of the radix tree, is it possible to precompute
the radix tree before job submission time, then create a serialized
representation (perhaps just Java object serialization), and send the
serialized form through distributed cache? Then, each reducer would just
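(As a hedged sketch of that suggestion, with invented paths and names throughout; conf, job, and the Serializable radix tree object are assumed to exist in the driver:)

// Driver side (sketch only): serialize the precomputed structure into HDFS
// and register it as a cache file with a "#" alias.
org.apache.hadoop.fs.Path cached =
    new org.apache.hadoop.fs.Path("hdfs:///tmp/radix-tree.ser");
java.io.ObjectOutputStream out = new java.io.ObjectOutputStream(
    org.apache.hadoop.fs.FileSystem.get(conf).create(cached));
out.writeObject(tree);   // the precomputed, java.io.Serializable radix tree
out.close();
job.addCacheFile(new java.net.URI("hdfs:///tmp/radix-tree.ser#radix-tree.ser"));

// Reducer side, in setup(): deserialize once from the localized symlink.
java.io.ObjectInputStream in = new java.io.ObjectInputStream(
    new java.io.FileInputStream("radix-tree.ser"));
Object radixTree = in.readObject();
in.close();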
Problem Background:
I have a Hadoop MapReduce program that uses an IPv6 radix tree to provide
auxiliary input during the reduce phase of the second job in its
workflow, but doesn't need the data at any other point.
It seems pretty straightforward to use the distributed cache to b
e files in the distributed cache which seems to be
> stored somewher under each node's /tmp .../local/archive/ ... path.
>
> In mapper code, I tried using SequenceFile.Reader with all possible
> configurations (locally, distribtued) however, it can't find it. Are
> sequence
Hi,
I'm storing sequence files in the distributed cache, which seem to be
stored somewhere under each node's /tmp .../local/archive/ ... path.
In the mapper code, I tried using SequenceFile.Reader with all possible
configurations (locally, distributed); however, it can't find it. Are
repository.
>
> The same question was asked in the jira but without clear resolution.
> https://issues.apache.org/jira/browse/MAPREDUCE-236
>
> My question might be related to
> https://issues.apache.org/jira/browse/MAPREDUCE-4408
> which is resolved for next version. But i
browse/MAPREDUCE-236
My question might be related to
https://issues.apache.org/jira/browse/MAPREDUCE-4408
which is resolved for next version. But it seems to be only about uberjar
and I am using a standard jar.
If it works with an HDFS location, what are the details? Won't it be cleaned
duri