CVE-2018-8009: Apache Hadoop distributed cache archive vulnerability
Severity: Severe
Vendor: The Apache Software Foundation
Versions Affected:
Hadoop 0.23.0 to 0.23.11
Hadoop 2.0.0-alpha to 2.7.6
Hadoop 2.8.0 to 2.8.4
Hadoop 2.9.0 to 2.9.1
Hadoop 3.0.0-alpha to 3.0.2
Hadoop 3.1.0
Hi
My code creates a new job named "job 1" which writes something to distributed
cache (say a text file) and the job gets completed.
Just to manage expectations: you add files to the distributed cache in the job
driver, and the framework makes them available to maps and reducers.
Hi, Siddharth.
Not sure I fully understand your problem. I think you are saying that you
would like to run an initial M/R job to create some data that n jobs after that
will be able to use, and you are saying you’d like to use the distributed cache
for that. I think you may not need
to currently running job only).
On Tue, Jun 7, 2016 at 6:36 PM, Arun Natva <arun.na...@gmail.com> wrote:
> If you use an instance of the Job class, you can add files to the distributed
> cache like this:
> Job job = Job.getInstance(conf);
> job.addCacheFile(new URI(filepath));
>
>
> Sent from my iPhone
Hi Jeff,
Thanks for your prompt reply. Actually my problem is as follows:
My code creates a new job named "job 1" which writes something to
distributed cache (say a text file) and the job gets completed.
Now, I want to create some n number of jobs in a while loop below, which
read the
Hi, Siddharth.
I was also a bit frustrated at what I found to be scant documentation on how to
use the distributed cache in Hadoop 2. The DistributedCache class itself was
deprecated in Hadoop 2, but there don’t appear to be very clear instructions on
the alternative. I think it’s actually
If you use an instance of the Job class, you can add files to the distributed
cache like this:
Job job = Job.getInstance(conf);
job.addCacheFile(new URI(filepath));
Sent from my iPhone
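For readers landing here from search, a minimal, compilable sketch of that pattern on the Hadoop 2 mapreduce API (note the factory method is Job.getInstance, not getInstanceOf, and addCacheFile takes a java.net.URI; the HDFS path below is a placeholder):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hadoop 2 factory method (replaces "new Job(conf)").
    Job job = Job.getInstance(conf, "job with a cache file");
    // The file must already exist on the shared filesystem (e.g. HDFS);
    // "/cache/lookup.txt" is a placeholder path.
    job.addCacheFile(new URI("/cache/lookup.txt"));
    // ... set jar, mapper, reducer, input and output paths here ...
  }
}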
> On Jun 7, 2016, at 5:17 AM, Siddharth Dawar <siddharthdawa...@gmail.com>
> wrote:
> Hi,
...Int(args[3]));
conf2.setNumReduceTasks(Integer.parseInt(args[4]));
FileInputFormat.addInputPath(conf2, new Path(input));
FileOutputFormat.setOutputPath(conf2, new Path(output));
}
RunningJob job = JobClient.runJob(conf2);
}
Now, I want the first Job which gets created to write something in the
distrib
Hi,
I want to use the distributed cache to allow my mappers to access data in
Hadoop 2.7.2. In main, I'm using the command
String hdfs_path="hdfs://localhost:9000/bloomfilter";InputStream in =
new BufferedInputStream(new
FileInputStream("/home/siddharth/Desktop/data/bloom_filter&q
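The snippet breaks off above. A plausible completion, assuming the intent is to copy the local bloom filter into HDFS and then register the HDFS copy as a cache file (the stream handling and Job wiring are guesses, not the poster's code):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapreduce.Job;

public class BloomFilterDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String hdfs_path = "hdfs://localhost:9000/bloomfilter";
    // Stream the local file up to HDFS.
    InputStream in = new BufferedInputStream(
        new FileInputStream("/home/siddharth/Desktop/data/bloom_filter"));
    FileSystem fs = FileSystem.get(URI.create(hdfs_path), conf);
    OutputStream out = fs.create(new Path(hdfs_path));
    IOUtils.copyBytes(in, out, 4096, true); // closes both streams when done
    // Register the HDFS copy for the distributed cache.
    Job job = Job.getInstance(conf, "bloom filter job");
    job.addCacheFile(new URI(hdfs_path));
    // ... configure mapper/reducer and submit ...
  }
}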
Hello,
I'm new to Hadoop and a bit confused by one thing about distributed cache -
when do files added to distributed cache get deleted?
I'm concretely interested in Hadoop 0.20.2.
I read the following from Hadoop: The Definitive Guide: "Files are deleted to
make room for a new file when the cache exceeds a certain size (10 GB by default)."
On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote:
Hello,
I'm new to Hadoop and I'm having a problem reading from a sequence file
that I add to distributed cache.
I didn't have problems when I ran it in standalone mode, but now in
pseudo-distributed and distributed I do.
I'm adding the file to distributed cache like this
And reading from
Thanks,
Marko
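The actual code is missing from the archive. For anyone hitting the same problem, a sketch of the usual pattern (all paths and names invented): the key point is that in (pseudo-)distributed mode the file must be opened through its localized symlink on the local filesystem, not through a client-side path:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqCacheMapper extends Mapper<Object, Text, Text, Text> {
  // Driver side (for reference):
  //   job.addCacheFile(new URI("/user/marko/model.seq#model.seq"));

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    // The cached file is symlinked into the task's working directory under
    // the name after '#'. Resolve it against the local filesystem: a bare
    // relative Path would be resolved against HDFS instead.
    File local = new File("model.seq");
    SequenceFile.Reader reader = new SequenceFile.Reader(
        conf, SequenceFile.Reader.file(new Path(local.toURI())));
    Writable key = (Writable) ReflectionUtils.newInstance(
        reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(
        reader.getValueClass(), conf);
    while (reader.next(key, value)) {
      // ... load the entries into an in-memory structure ...
    }
    reader.close();
  }
}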
What version are you using?
Have you seen this?
Regards,
Shahab
On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote:
Hello,
I'm new to Hadoop and I'm having a problem reading from a sequence file
that I add to distributed cache.
From: sam liu [mailto:samliuhad...@gmail.com]
Sent: Wednesday, May 28, 2014 7:40 AM
To: user@hadoop.apache.org
Subject: Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0
Could this be a Hadoop issue? Or is some option wrong in my cluster?
2014-05-27 13:58 GMT+08:00 sam liu samliuhad...@gmail.com:
Running Hadoop 2.2, my job places files in the distributed cache.
In my mapper setup, I call context.getCacheFiles() and get back a URI[] with
contents that make sense.
In my reducer setup, I call context.getCacheFiles() and get back null.
Is this expected behavior? If so, how do I get
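For context, the call looks identical on both sides. A minimal sketch of the reducer half (class names invented), with a guard so the null behavior described above fails fast rather than as a NullPointerException later:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CacheAwareReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void setup(Context context) throws IOException {
    // Same JobContext method the mapper uses.
    URI[] cached = context.getCacheFiles();
    if (cached == null) {
      // Per the report above, this happened on 2.2 in the reducer.
      throw new IllegalStateException(
          "context.getCacheFiles() returned null in reducer setup");
    }
    for (URI uri : cached) {
      System.err.println("cache file visible to reducer: " + uri);
    }
  }
}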
Hi Experts,
The original local file has execution permission, and then it was
distributed to multiple NodeManager nodes with the Distributed Cache feature of
Hadoop-2.2.0, but the distributed file has lost the execution permission.
However, I did not encounter such an issue in Hadoop-1.1.1.
Why
On Thu, Mar 27, 2014 at 11:17 AM, Serge Blazhievsky hadoop...@gmail.com wrote:
How are you putting files in distributed cache ?
Sent from my iPhone
On Mar 27, 2014, at 9:20 AM, Jonathan Poon jkp...@ucdavis.edu wrote:
Hi Stanley,
Sorry about the confusion, but I'm trying to read a txt file into my Mapper
function. I am trying to copy the file using the -files option
saying getLocalCacheFiles() is undefined. I've
imported the hadoop-mapreduce-client-core-2.2.0.jar as part of my build
environment in Eclipse.
Any ideas on what could be incorrect?
If I'm incorrectly using the distributed cache, could someone point me to
an example using the distributed cache with Hadoop 2.2.0?
Thanks for your help!
Jonathan
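For what it's worth, on 2.2.0 the simplest route around the missing getLocalCacheFiles() is to rely on the symlink that -files creates in each task's working directory. A sketch (the file name must match what was passed to -files):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TxtLookupMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Set<String> lookup = new HashSet<String>();

  @Override
  protected void setup(Context context) throws IOException {
    // A file shipped with "-files lookup.txt" is symlinked into the task's
    // current working directory under its base name, so plain java.io works.
    BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"));
    String line;
    while ((line = reader.readLine()) != null) {
      lookup.add(line);
    }
    reader.close();
  }
}

Note that -files is only parsed if the driver goes through GenericOptionsParser/ToolRunner; see the Tool skeleton further down.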
Hello,
I was wondering if anyone might know of a way to write bytes directly to the
distributed cache. I know I can call job.addCacheFile(URI uri), but in my
case the file I wish to add to the cache is in memory and is job specific.
I would prefer not writing it to a location that I have
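The message is cut off, but one workaround, sketched here under the assumption that a job-scoped HDFS staging file is tolerable (all names invented), is to spill the bytes to the shared filesystem first and register that path:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class InMemoryCacheHelper {
  // Writes payload to a staging file on the shared filesystem and registers
  // it as a cache file. The caller still owns cleanup after the job ends,
  // which is exactly the bookkeeping the poster hoped to avoid.
  public static void addBytesToCache(Job job, byte[] payload, String name)
      throws Exception {
    Configuration conf = job.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path staged = new Path("/tmp/cache-" + name); // hypothetical location
    FSDataOutputStream out = fs.create(staged, true);
    out.write(payload);
    out.close();
    // '#name' fixes the symlink name in the task working directory
    // (assumes 'name' contains no characters illegal in a URI).
    job.addCacheFile(new URI(staged + "#" + name));
  }
}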
From: Shahab Yunus [mailto:shahab.yu...@gmail.com]
Sent: Wednesday, July 10, 2013 9:43 PM
To: user@hadoop.apache.org
Subject: Re: New Distributed Cache
Also, once you have the array of URIs after calling getCacheFiles you
can iterate over them using the File class or Path (
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs
Hi,
Thanks for the response. I can create symlinks for the files. But I don't know
how to add a jar to distributed cache. One way I found is the -libjars
argument when running a Hadoop job. Is it possible to add a jar file directly to
distributed cache? Is there any specific folder in HDFS
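On the Java side there are dedicated calls for this on the Job class (a sketch; the jar path is hypothetical and must already be on HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class LibJarDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "job with an extra jar");
    // Ships the jar through the distributed cache and adds it to the task
    // classpath -- roughly what -libjars does under the hood.
    job.addFileToClassPath(new Path("/user/libs/custom-inputformat.jar"));
    // For a zip/tgz of dependencies there is the archive variant:
    // job.addArchiveToClassPath(new Path("/user/libs/deps.tgz"));
    // ... configure and submit ...
  }
}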
Hi,
Is it possible to access distributed cache from the command line? I have written
a custom InputFormat implementation which I want to add to distributed cache.
Using -libjars is not an option for me, as I am not running the Hadoop job from
the command line. I am running it using the RHadoop package in R which
Hi,
I have no idea about RHadoop but in general in YARN we do create symlinks
for the files in distributed cache in the current working directory of
every container. You may be able to use that somehow.
Thanks,
Omkar Joshi
*Hortonworks Inc.* http://www.hortonworks.com
On Mon, Sep 23, 2013 at 6
a NullPointerException when I do that in the Mapper code.
Any suggestions?
Andrew
To: user@hadoop.apache.org
Subject: Re: Distributed Cache
You should use Job#addCacheFile()
Cheers
On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew
andrew.bote...@emc.com wrote:
Hi,
I was wondering if I can still use the DistributedCache class in the latest
release of Hadoop (Version
Hi,
I am trying to store a file in the Distributed Cache during my Hadoop job.
In the driver class, I tell the job to store the file in the cache with this
code:
Job job = Job.getInstance();
job.addCacheFile(new URI("file name"));
That all compiles fine. In the Mapper code, I try accessing
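The excerpt stops just before the mapper code. A sketch of the access side (Hadoop 2.x; the '#lookup' fragment is an editorial addition to give the localized symlink a predictable name):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Driver side (for reference):
  //   job.addCacheFile(new URI("/user/andrew/lookup.txt#lookup"));

  @Override
  protected void setup(Context context) throws IOException {
    // Confirm what was registered ...
    URI[] uris = context.getCacheFiles();
    if (uris == null || uris.length == 0) {
      throw new IOException("no cache files registered");
    }
    // ... then read through the working-directory symlink named by '#'.
    BufferedReader reader = new BufferedReader(new FileReader("lookup"));
    String line;
    while ((line = reader.readLine()) != null) {
      // ... build the in-memory lookup table ...
    }
    reader.close();
  }
}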
did you try JobContext.getCacheFiles() ?
Thanks,
Omkar Joshi
*Hortonworks Inc.* http://www.hortonworks.com
On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew andrew.bote...@emc.com wrote:
Hi,
I am trying to store a file in the Distributed Cache during my Hadoop job
Mapper code? Is there
a method that will look for any files in the cache?
Thanks,
Andrew
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, July 09, 2013 6:08 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache
You
JobContext.getCacheFiles()?
Thanks,
Andrew
From: Omkar Joshi [mailto:ojo...@hortonworks.com]
Sent: Wednesday, July 10, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache
try JobContext.getCacheFiles()
Thanks,
Omkar Joshi
Hortonworks Inc. http://www.hortonworks.com
On Wed, Jul 10, 2013 at 6:31 AM, Botelho
Hi,
I was wondering if I can still use the DistributedCache class in the latest
release of Hadoop (Version 2.0.5).
In my driver class, I use this code to try and add a file to the distributed
cache:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
This has been discussed before, see
http://search-hadoop.com/m/xI5AHMD0Vm1 for the previous discussion on
this.
On Wed, May 8, 2013 at 12:54 AM, Saeed Shahrivari
saeed.shahriv...@gmail.com wrote:
Would you please tell me why we should use Distributed Cache instead of
HDFS?
Because HDFS seems more stable, easier to use, and less error-prone.
Thanks in advance.
Not sure what you mean...
If you want to put up a small file to be used by each Task in your job (mapper
or reducer)... you could put it up on HDFS.
Or if you're launching your job from an edge node, you could read in the small
file and put it into the distributed cache.
It really depends
researching a Hadoop solution for an existing application that
requires a directory structure full of data for processing.
To make the Hadoop solution work I need to deploy the data directory to
each DN when the job is executed.
I know this isn't new and commonly done with a Distributed Cache.
a replication factor equal to the number of DN. Hmmm... I'm not sure I
understand: there are 8 DN in my test cluster.
Date: Tue, 9 Apr 2013 04:49:17 -0700
Subject: Re: Distributed cache: how big is too big?
From: bjorn...@gmail.com
To: user@hadoop.apache.org
Put it once on hdfs with a replication
Hmmm... maybe I'm missing something, but (@bjorn) why would you use HDFS as a
replacement for the distributed cache?
After all, the distributed cache is just a file with replication over the
whole cluster, which isn't in HDFS. Can't you just make the cache size big and
store the file
I think the correct question is why would you use distributed cache for a
large file that is read during map/reduce instead of plain hdfs? It does
not sound wise to shuffle GB of data onto all nodes on each job submission
and then just remove it when the job is done. I would think about picking
The Distributed Cache uses the shared file system (whichever is specified).
The Distributed Cache can be loaded via the GenericOptionsParser / ToolRunner
parameters. Those parameters (-files, -archives, -libjars) are seen on the
command line and available in an MR driver class that implements the Tool
interface.
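A skeleton of such a driver (class name invented), which is what makes -files, -archives and -libjars usable on the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CacheAwareDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() already reflects -files/-archives/-libjars because
    // ToolRunner ran GenericOptionsParser before invoking run().
    Job job = Job.getInstance(getConf(), "cache-aware job");
    // ... configure mapper/reducer/input/output here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new CacheAwareDriver(), args));
  }
}

Invoked, for example, as: hadoop jar app.jar CacheAwareDriver -files lookup.txt -libjars extra.jar <in> <out>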
Cache.
Based on experience, what are the common file sizes deployed in a Distributed
Cache? I know smaller is better, but how big is too big? I have read that the
larger the cache deployed, the more startup latency there will be. I also
assume there are other factors that play into this.
I know that the default
(the nodes are really
bare and installing the language on the node is not an option) using the
distributed cache (as a tar.gz file).
My understanding is that Hadoop MapReduce will unarchive this tgz file and
then for every task attempt symlink it into the task attempt's working
folder.
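Registering such an archive looks roughly like this (paths are placeholders; the '#' fragment names the unpacked symlink):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ArchiveDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "job with bundled runtime");
    // The archive is unpacked on each node and symlinked into the task's
    // working directory as ./runtime, so tasks can call e.g.
    // ./runtime/bin/interpreter relative to where they run.
    job.addCacheArchive(new URI("hdfs:///apps/runtime.tar.gz#runtime"));
    // ... configure and submit ...
  }
}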
However
to and could
not do the System.loadLibrary call recommended).
You can also bundle the stuff into the JAR file in
a subdir and that will be unpacked to the local
working dir. The nice thing about using the
distributed cache is the files only need to be pushed
to the cluster once with a copyFromLocal
Running hadoop-0.20.2 on a 20 node cluster.
When running a Map/Reduce job that uses several .jars loaded into the
Distributed cache, several (~4) nodes have their map jobs fail because
of ClassNotFoundException. All the other nodes proceed through the job
normally and the job completes.
But this is wasting 20-25% of my TT nodes.
Can anyone explain why some nodes might fail to read all the .jars from
Hi,
As I suspected, cache files are symlinked after a child JVM is
started: TaskRunner.setupWorkDir is being called from
org.apache.hadoop.mapred.Child.main.
This is unfortunate as it makes it impossible to leverage the distributed
cache for the purpose of deploying JVM agents. I could submit a jira
http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache
That should give you what you want.
hth,
Arun
On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
Hi,
I am seeking a way to leverage hadoop's distributed cache in order to
ship jars
the concept of racks in a cluster - you would want more replicas
spread across racks such that on task bootup the downloads happen with
rack locality.
On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma lin...@gmail.com wrote:
Hi Lin,
It is comparable (and is also logically similar) to reading a file
multiple times in parallel in a local filesystem - not too much of a
performance hit for small reads (by virtue of OS caches, and quick
completion per read, as is usually the case for distributed cache
files), and gradually decreasing performance for long reads (due to
frequent disk physical movement)? Thankfully, due to block sizes the
latter isn't a problem for large files on a proper DN, as the blocks
are spread
I have figured out the 2nd issue, appreciate if anyone could advise on the
first issue.
regards,
Lin
Hi guys,
I want to confirm that when a mapper or reducer on a task node accesses a
distributed cache file, it resides on disk, not in memory. I just want to
make sure the distributed cache file is not fully loaded into memory,
competing for memory with the mapper/reducer tasks
Hi,
On 22.12.2012 at 13:03, Lin Ma lin...@gmail.com wrote:
I want to confirm when on each task node either mapper or reducer access
distributed cache file, it resides on disk, not resides in memory. Just want
to make sure distributed cache file does not fully loaded into memory which
Hi Kai,
Smart answer! :-)
- The assumption you have is that one distributed cache replica could only
serve one download session for a tasktracker node (this is why you get
concurrency n/r). The question is, why one distributed cache replica cannot
serve multiple concurrent download sessions
On 12/07/2012 03:49 PM, surfer wrote:
Hello Peter
In my humble experience, I never got hadoop 1.0.3 to work with the
distributed cache and the new API (mapreduce). With the old API it works.
giovanni
P.S. I already tried the approaches suggested by both Dhaval and Harsh J
I'm writing
instance?
On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan peter.co...@gmail.com
wrote:
Hi,
I want to use the distributed cache to allow my mappers to access data.
In
main, I'm using the command
DistributedCache.addCacheFile(new
URI("/user/peter/cacheFile/testCache1"),
conf);
Where
You will need to add the cache file to the distributed cache before creating
the Job object. Give that a spin and see if that works.
Regards,
Dhaval
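Concretely, Dhaval's ordering, sketched with the old-style API Peter is using (the Job constructor snapshots the Configuration, so later mutations are invisible to it):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class OrderingDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 1) Register the cache file on conf FIRST ...
    DistributedCache.addCacheFile(
        new URI("/user/peter/cacheFile/testCache1"), conf);
    // 2) ... THEN create the Job, which copies conf at construction time.
    Job job = new Job(conf);
    // ... configure mapper/reducer and submit ...
  }
}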
From: Peter Cogan peter.co...@gmail.com
To: user@hadoop.apache.org
Sent: Friday, 7 December 2012 9:06 AM
Subject
the distributed cache to allow my mappers to access data.
In
main, I'm using the command
DistributedCache.addCacheFile(new
URI("/user/peter/cacheFile/testCache1"),
conf);
Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
Then, my setup function looks like
From: Peter Cogan peter.co...@gmail.com
Date: Fri, 7 Dec 2012 14:06:41
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Re: Problem using distributed cache
Hi,
any thoughts on this would be much appreciated
thanks
Peter
On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan peter.co
wrote:
Hi,
I want to use the distributed cache to allow my mappers to access data. In
main, I'm using the command
DistributedCache.addCacheFile(new
URI("/user/peter/cacheFile/testCache1"),
conf);
Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs
What is your conf object there? Is it job.getConfiguration() or an
independent instance?
On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan peter.co...@gmail.com wrote:
Hi,
I want to use the distributed cache to allow my mappers to access data. In
main, I'm using the command
...@ipinyou.com
When I use the distributed cache, I found that when the file is more than
100MB or the number of records is more than 10 million,
the file cannot be cached in memory; I tried to set io.sort.mb to
200MB;
it still does not work. Any suggestion would be fine! Thank you
On 31 October 2012 12:13, Saurabh Mishra saurabhmishra.i...@outlook.com wrote:
Hi,
I tried adding jars to distributed cache through following code :
DistributedCache.addArchiveToClassPath(path, jobConf);
DistributedCache.addCacheArchive(path.toUri(), jobConf
I'll try that, thanks for the suggestion Steve!
Mark
On Fri, Oct 12, 2012 at 11:27 AM, Steve Loughran ste...@hortonworks.com wrote:
On 11 October 2012 20:53, Mark Olimpiati markq2...@gmail.com wrote:
Thanks for the reply Harsh, but as I said I tried locally too by using
the following:
Problem Background:
I have a Hadoop MapReduce program that uses an IPv6 radix tree to provide
auxiliary input during the reduce phase of the second job in its
workflow, but doesn't need the data at any other point.
It seems pretty straight forward to use the distributed cache to build
Hello Kyle,
Regarding the setup time of the radix tree, is it possible to precompute
the radix tree before job submission time, then create a serialized
representation (perhaps just Java object serialization), and send the
serialized form through distributed cache? Then, each reducer would just
need to deserialize during setup() instead of recomputing the full radix
tree for every reducer task. That might save time.
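A sketch of that precompute-and-ship idea with plain Java serialization (all class and path names invented; assumes the tree type implements Serializable):

import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class RadixTreeShipper {
  // Serialize the prebuilt tree to HDFS and register it as a cache file;
  // each reducer then deserializes in setup() instead of rebuilding.
  public static void ship(Job job, Serializable radixTree) throws Exception {
    Configuration conf = job.getConfiguration();
    Path staged = new Path("/tmp/radix-tree.ser"); // hypothetical path
    ObjectOutputStream out =
        new ObjectOutputStream(FileSystem.get(conf).create(staged, true));
    out.writeObject(radixTree);
    out.close();
    job.addCacheFile(new URI(staged + "#radix-tree.ser"));
  }
}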
Hi,
I'm storing sequence files in the distributed cache which seems to be
stored somewhere under each node's /tmp .../local/archive/ ... path.
In mapper code, I tried using SequenceFile.Reader with all possible
configurations (locally, distributed); however, it can't find it. Are
sequence files supported
is resolved for next version. But it seems to be only about uberjar,
and I am using a standard jar.
If it works with an hdfs location, what are the details? Won't it be cleaned
during job termination? Why not? Will it also be set up within the
distributed cache?
Regards
Bertrand
PS : I know there are others
Hi all
How do you add a small file to the distributed cache in an MR program?
Regards
Abhi
Sent from my iPhone
Hi Abhishek
You can find a simple example of using Distributed Cache here
http://kickstarthadoop.blogspot.co.uk/2011/05/hadoop-for-dependent-data-splits-using.html
--Original Message--
From: Abhishek
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Add file
On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote:
Hi,
I am seeking a way to leverage hadoop's distributed cache in order to
ship jars that are required to bootstrap a task's jvm, i.e., before a
map/reduce task is launched.
As a concrete example, let's say that I need to launch with
'-javaagent:/path/profiler.jar'. In theory, the task tracker is
responsible for downloading cached files onto its local filesystem.
Thanks,
stan
On Fri, Aug 3, 2012 at 1:31 PM, Harsh J ha...@cloudera.com wrote:
What this would do is merely take your passed -files jar (client-common) and
symlink it into the JVM's working directory (the task's working directory)
_before_ the JVM is begun, as foo.jar. So if I pass additionally, JVM opts
On Fri, Aug 3, 2012 at 4:19 PM, Arun C Murthy a...@hortonworks.com wrote:
Just do -javaagent:./profiler.jar?
Yep, that should work. Thanks!
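Pulling the thread together: ship the jar with -files so it is symlinked into the task working directory before the child JVM starts (per Harsh's note above), then point the agent at the relative path. In driver form (a sketch; the property names are the Hadoop 2 ones, on 1.x the equivalent is mapred.child.java.opts):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AgentDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // profiler.jar is assumed to be shipped via "-files profiler.jar"
    // (or job.addCacheFile); it appears as ./profiler.jar in each task's
    // working directory, so the relative -javaagent path resolves there.
    conf.set("mapreduce.map.java.opts", "-javaagent:./profiler.jar");
    conf.set("mapreduce.reduce.java.opts", "-javaagent:./profiler.jar");
    Job job = Job.getInstance(conf, "profiled job");
    // ... configure and submit ...
  }
}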
On Tue, Jul 31, 2012 at 7:26 PM, Michael Segel
michael_se...@hotmail.com wrote:
Hi Stan,
If I understood your question... you want to ship a jar to the nodes where
the task will run prior to the start of the task?
Not sure what it is you're trying to do...
Your example isn't really clear.
Forwarding to common-user to hopefully get more exposure...
-- Forwarded message --
From: Stan Rosenberg stan.rosenb...@gmail.com
Date: Tue, Jul 31, 2012 at 11:55 AM
Subject: Re: task jvm bootstrapping via distributed cache
To: mapreduce-u...@hadoop.apache.org
I am guessing
()));
String line = null;
while ((line = wordReader.readLine()) != null) {
  dyads.add(line);
} // end of while
wordReader.close();
} // end of try
catch (IOException ioe) {
  System.err.println("IOException reading from distributed cache");
} // end of catch