CVE-2018-8009: Apache Hadoop distributed cache archive vulnerability

2018-11-21 Thread Akira Ajisaka
CVE-2018-8009: Apache Hadoop distributed cache archive vulnerability
Severity: Severe
Vendor: The Apache Software Foundation
Versions Affected:
Hadoop 0.23.0 to 0.23.11
Hadoop 2.0.0-alpha to 2.7.6
Hadoop 2.8.0 to 2.8.4
Hadoop 2.9.0 to 2.9.1
Hadoop 3.0.0-alpha to 3.0.2
Hadoop 3.1.0

Re: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-21 Thread Gabriel Balan
Hi,
> My code creates a new job named "job 1" which writes something to distributed cache (say a text file) and the job gets completed.
Just to manage expectations, you add files to the distributed cache _in the job driver_, and the framework makes them available to maps and reducers.

RE: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-20 Thread Guttadauro, Jeff
Hi, Siddharth. Not sure I fully understand your problem. I think you are saying that you would like to run an initial M/R job to create some data that n jobs after that will be able to use, and you are saying you’d like to use the distributed cache for that. I think you may not need the

Re: How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2

2016-06-13 Thread Siddharth Dawar
currently running job only). On Tue, Jun 7, 2016 at 6:36 PM, Arun Natva wrote: > If you use an instance of the Job class, you can add files to distributed > cache like this: > Job job = Job.getInstance(conf); > job.addCacheFile(filepath); > > > Sent from my iPhone > >

Re: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-13 Thread Siddharth Dawar
Hi Jeff, Thanks for your prompt reply. Actually my problem is as follows: My code creates a new job named "job 1" which writes something to distributed cache (say a text file) and the job gets completed. Now, I want to create some n number of jobs in while loop below, which reads the

RE: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-07 Thread Guttadauro, Jeff
Hi, Siddharth. I was also a bit frustrated at what I found to be scant documentation on how to use the distributed cache in Hadoop 2. The DistributedCache class itself was deprecated in Hadoop 2, but there don’t appear to be very clear instructions on the alternative. I think it’s actually

Re: How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2

2016-06-07 Thread Arun Natva
If you use an instance of the Job class, you can add files to distributed cache like this: Job job = Job.getInstance(conf); job.addCacheFile(filepath); Sent from my iPhone > On Jun 7, 2016, at 5:17 AM, Siddharth Dawar > wrote: > > Hi, > > I wrote a program which creates

How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2

2016-06-07 Thread Siddharth Dawar
Int(args[3]));
conf2.setNumReduceTasks(Integer.parseInt(args[4]));
FileInputFormat.addInputPath(conf2, new Path(input));
FileOutputFormat.setOutputPath(conf2, new Path(output));
}
RunningJob job = JobClient.runJob(conf2);
}
Now, I want the first Job which gets created to write something in the distrib

Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-07 Thread Siddharth Dawar
Hi, I want to use the distributed cache to allow my mappers to access data in Hadoop 2.7.2. In main, I'm using the command
String hdfs_path = "hdfs://localhost:9000/bloomfilter";
InputStream in = new BufferedInputStream(new FileInputStream("/home/siddharth/Desktop/data/bloom_fil

Files in distributed cache

2015-05-27 Thread Marko Dinic
Hello, I'm new to Hadoop and a bit confused by one thing about the distributed cache: when do files added to the distributed cache get deleted? I'm specifically interested in Hadoop 0.20.2. I read the following in "Hadoop: The Definitive Guide": "Files are deleted to make room for a new file

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
..@nissatech.com <mailto:marko.di...@nissatech.com> wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now in pseudo-distributed and dist

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Shahab Yunus
ion about it to understand how it works. > > Thanks, > Marko > > > On 05/11/2015 11:25 PM, marko.di...@nissatech.com wrote: > > Hello, > > > > I'm new to Hadoop and I'm having a problem reading from a sequence file > that I add to distributed cach

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
h.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now in pseudo-distributed and distributed I do. I'm adding file to di

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
Regards, Shahab On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mo

Re: Reading a sequence file from distributed cache

2015-05-11 Thread Shahab Yunus
What version are you using? Have you seen this? Regards, Shahab On Mon, May 11, 2015 at 5:25 PM, wrote: > Hello, > > > > I'm new to Hadoop and I'm having a problem reading from a sequence file > that I add to distributed cache. > > > > I didn't h

Reading a sequence file from distributed cache

2015-05-11 Thread marko.dinic
Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now in pseudo-distributed and distributed I do. I'm adding file to distributed cache like this

Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-30 Thread sam liu
OP-3078 >> >> https://issues.apache.org/jira/browse/HDFS-4659 >> >> >> >> Cheers >> >> Seb. >> >> >> >> *From:* sam liu [mailto:samliuhad...@gmail.com] >> *Sent:* Wednesday, May 28, 2014 7:40 AM >> *To:* user@hadoop.

distributed cache in reducer

2014-05-30 Thread Brian Jeltema
Running Hadoop 2.2, my job places files in the distributed cache. In my mapper setup, I call context.getCacheFiles() and get back a URI[] with contents that make sense. In my reducer setup, I call context.getCacheFiles() and get back null. Is this expected behavior? If so, how do I get the

Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-30 Thread sam liu
/jira/browse/HADOOP-3078 > > https://issues.apache.org/jira/browse/HDFS-4659 > > > > Cheers > > Seb. > > > > *From:* sam liu [mailto:samliuhad...@gmail.com] > *Sent:* Wednesday, May 28, 2014 7:40 AM > *To:* user@hadoop.apache.org > *Subject:* Re: File Permission Is

RE: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-28 Thread Sebastian Gäde
, 2014 7:40 AM To: user@hadoop.apache.org Subject: Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0 Is this possibly a Hadoop issue? Or is some option wrong in my cluster? 2014-05-27 13:58 GMT+08:00 sam liu: Hi Experts, The original local file has execution permission, and

Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-27 Thread sam liu
Is this possibly a Hadoop issue? Or is some option wrong in my cluster? 2014-05-27 13:58 GMT+08:00 sam liu: > Hi Experts, > > The original local file has execution permission, and then it was > distributed to multiple nodemanager nodes with Distributed Cache feature of > Hadoop

File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-26 Thread sam liu
Hi Experts, The original local file has execution permission, and then it was distributed to multiple nodemanager nodes with the Distributed Cache feature of Hadoop-2.2.0, but the distributed file has lost the execution permission. However, I did not encounter this issue in Hadoop-1.1.1. Why this

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Azuryy
11:17 AM, Serge Blazhievsky >> wrote: >> How are you putting files in distributed cache ? >> >> Sent from my iPhone >> >>> On Mar 27, 2014, at 9:20 AM, Jonathan Poon wrote: >>> >>> >>> Hi Stanley, >>> >>> Sor

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Jonathan Poon
on in the 2.2.0 API? Jonathan On Thu, Mar 27, 2014 at 11:17 AM, Serge Blazhievsky wrote: > How are you putting files in distributed cache ? > > Sent from my iPhone > > On Mar 27, 2014, at 9:20 AM, Jonathan Poon wrote: > > > Hi Stanley, > > Sorry about the confusion

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Serge Blazhievsky
How are you putting files in distributed cache ? Sent from my iPhone > On Mar 27, 2014, at 9:20 AM, Jonathan Poon wrote: > > > Hi Stanley, > > Sorry about the confusion, but I'm trying to read a txt file into my Mapper > function. I am trying to copy the file us

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Jonathan Poon
r saying getLocalCacheFiles() is undefined. I've >> imported the hadoop-mapreduce-client-core-2.2.0.jar as part of my build >> environment in Eclipse. >> >> Any ideas on what could be incorrect? >> >> If I'm incorrectly using the distributed cache, could someone point me to >> an example using the distributed cache with Hadoop 2.2.0? >> >> Thanks for your help! >> >> Jonathan >> > >

Re: Hadoop 2.2.0 Distributed Cache

2014-03-26 Thread Stanley Shi
core-2.2.0.jar as part of my build > environment in Eclipse. > > Any ideas on what could be incorrect? > > If I'm incorrectly using the distributed cache, could someone point me to > an example using the distributed cache with Hadoop 2.2.0? > > Thanks for your help! > > Jonathan >

Hadoop 2.2.0 Distributed Cache

2014-03-26 Thread Jonathan Poon
t an error saying getLocalCacheFiles() is undefined. I've imported the hadoop-mapreduce-client-core-2.2.0.jar as part of my build environment in Eclipse. Any ideas on what could be incorrect? If I'm incorrectly using the distributed cache, could someone point me to an example using t

Writing Bytes Directly to Distributed Cache?

2014-03-17 Thread Jonathan Miller
Hello, I was wondering if anyone might know of a way to write bytes directly to the distributed cache. I know I can call job.addCacheFile(URI uri), but in my case the file I wish to add to the cache is in memory and is job specific. I would prefer not writing it to a location that I have to then

Re: New Distributed Cache

2014-01-09 Thread Vinod Kumar Vavilapalli
= context.getCacheFiles(); > > File f = new File(localPaths[0]); > > > > However, I get a NullPointerException when I do that in the Mapper code. > > > > Any suggestions? > > > > Andrew > > > > From: Shahab Yunus [mailto:shahab.yu...

Re: New Distributed Cache

2014-01-09 Thread Bill Q
s = context.getCacheFiles(); > > File f = new File(localPaths[0]); > > > > However, I get a NullPointerException when I do that in the Mapper code. > > > > Any suggestions? > > > > Andrew > > > > *From:* Shahab Yunus [mailto:shahab.yu...@gmail.com

RE: Distributed cache in command line

2013-09-24 Thread Chandra Mohan, Ananda Vel Murugan
Hi, Thanks for the response. I can create symlinks for the files. But I don't know how to add jar to distributed cache. I found one way is by using libjars argument while running hadoop job. Is it possible to add a jar file directly to distributed cache? Is there any specific folder in

Re: Distributed cache in command line

2013-09-23 Thread Omkar Joshi
Hi, I have no idea about RHadoop but in general in YARN we do create symlinks for the files in distributed cache in the current working directory of every container. You may be able to use that somehow. Thanks, Omkar Joshi *Hortonworks Inc.* <http://www.hortonworks.com> On Mon, Sep 23, 2

Distributed cache in command line

2013-09-23 Thread Chandra Mohan, Ananda Vel Murugan
Hi, Is it possible to access distributed cache in command line? I have written a custom InputFormat implementation which I want to add to distributed cache. Using libjars is not an option for me as I am not running Hadoop job in command line. I am running it using RHadoop package in R which

Re: New Distributed Cache

2013-07-11 Thread Omkar Joshi
> > File f = new File(localPaths[0]); > > However, I get a NullPointerException when I do that in the Mapper code. > > Any suggestions? > > Andrew > > *From:* Shahab Yunus [mailto:shaha

RE: New Distributed Cache

2013-07-11 Thread Botelho, Andrew
r, I get a NullPointerException when I do that in the Mapper code. Any suggestions? Andrew From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: Wednesday, July 10, 2013 9:43 PM To: user@hadoop.apache.org Subject: Re: New Distributed Cache Also, once you have the array of URIs after calling get

Re: New Distributed Cache

2013-07-10 Thread Shahab Yunus
you try JobContext.getCacheFiles()? > > > Thanks, > Omkar Joshi > *Hortonworks Inc.* <http://www.hortonworks.com> > > > On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew > wrote: > >> Hi, >> >> I am trying to store a

Re: Distributed Cache

2013-07-10 Thread Omkar Joshi
lly, how do I read my cached file(s) after I call > JobContext.getCacheFiles()? > > Thanks, > > Andrew > > *From:* Omkar Joshi [mailto:ojo...@hortonworks.com] > *Sent:* Wednesday, July 10, 2013 5:15 PM > > *To:* user@ha

RE: Distributed Cache

2013-07-10 Thread Botelho, Andrew
Files()? Thanks, Andrew From: Omkar Joshi [mailto:ojo...@hortonworks.com] Sent: Wednesday, July 10, 2013 5:15 PM To: user@hadoop.apache.org Subject: Re: Distributed Cache try JobContext.getCacheFiles() Thanks, Omkar Joshi Hortonworks Inc.<http://www.hortonworks.com> On Wed, Jul 10, 2013 at 6:31

Re: Distributed Cache

2013-07-10 Thread Omkar Joshi
apper code? Is there > a method that will look for any files in the cache? > > Thanks, > > Andrew > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] > *Sent:* Tuesday, July 09, 2013 6:08 PM > *To:* user@hadoop.apache.

Re: New Distributed Cache

2013-07-10 Thread Omkar Joshi
Did you try JobContext.getCacheFiles()? Thanks, Omkar Joshi *Hortonworks Inc.* <http://www.hortonworks.com> On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew wrote: > Hi, > > I am trying to store a file in the Distributed Cache during my Hadoop job. >

New Distributed Cache

2013-07-10 Thread Botelho, Andrew
Hi, I am trying to store a file in the Distributed Cache during my Hadoop job. In the driver class, I tell the job to store the file in the cache with this code: Job job = Job.getInstance(); job.addCacheFile(new URI("file name")); That all compiles fine. In the Mapper code, I try

RE: Distributed Cache

2013-07-10 Thread Botelho, Andrew
@hadoop.apache.org Subject: Re: Distributed Cache You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <andrew.bote...@emc.com> wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5).

Re: Distributed Cache

2013-07-09 Thread Azuryy Yu
pache.hadoop.mapreduce.lib.output.FileOutputFormat; >> >> Configuration conf = new Configuration(); >> >> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf); >> >> Job job = Job.getInstance(); >> >> … >> >> However, I keep getting warnings that the method addCacheFile() is >> deprecated. >> >> Is there a more current way to add files to the distributed cache? >> >> Thanks in advance, >> >> Andrew >> > >

Re: Distributed Cache

2013-07-09 Thread Ted Yu
is code to try and add a file to the > distributed cache: > > import java.net.URI; > > import org.apache.hadoop.conf.Configuration; > > import org.apache.hadoop.filecache.DistributedCache; > > import org.apache.hadoop.fs

Distributed Cache

2013-07-09 Thread Botelho, Andrew
Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5). In my driver class, I use this code to try and add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import

Re: Benefits of Hadoop Distributed Cache

2013-05-08 Thread Harsh J
This has been discussed before, see http://search-hadoop.com/m/xI5AHMD0Vm1 for the previous discussion on this. On Wed, May 8, 2013 at 12:54 AM, Saeed Shahrivari wrote: > Would you please tell me why we should use Distributed Cache instead of > HDFS? > Because HDFS seems more stable,

Re: Benefits of Hadoop Distributed Cache

2013-05-07 Thread Michael Segel
Not sure what you mean... If you want to put up a small file to be used by each Task in your job (mapper or reducer)... you could put it up on HDFS. Or if you're launching your job from an edge node, you could read in the small file and put it in to the distributed cache. It really de

Benefits of Hadoop Distributed Cache

2013-05-07 Thread Saeed Shahrivari
Would you please tell me why we should use Distributed Cache instead of HDFS? Because HDFS seems more stable, easier to use, and less error-prone. Thanks in advance.

Re: Unicode issues with Distributed Cache

2013-05-04 Thread Shahab Yunus
Anil, what issue are you facing? You have mentioned 'Unicode issue' but what is exactly the issue? Regards, Shahab On Sat, May 4, 2013 at 2:28 PM, AnilKumar B wrote: > Hi, > > We are adding ISO-8859-1 content type file in Distributed Cache for look > up purpose in MR Jo

Unicode issues with Distributed Cache

2013-05-04 Thread AnilKumar B
Hi, We are adding ISO-8859-1 content type file in Distributed Cache for look up purpose in MR Job. But when we try to read the content from Distributed Cache file in MR, we are facing Unicode issues. Please find the sample code snippet below: @Override protected void setup
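The snippet above is cut off before the code, so the actual cause in this thread is not visible. One common cause of this kind of symptom is decoding the cached file with the platform default charset instead of the file's own encoding. A minimal sketch, assuming that is what happens here: it reads a local file with an explicit ISO-8859-1 charset using only the JDK. The class name Latin1LookupReader and the method readLines are hypothetical, and in a real job the path would be the localized copy of the cache file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the thread.
class Latin1LookupReader {

    // Reads every line of the file, decoding bytes as ISO-8859-1
    // rather than relying on the JVM's default charset.
    static List<String> readLines(Path localFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a localized cache file: "caf" + 0xE9 (e-acute in ISO-8859-1).
        Path tmp = Files.createTempFile("lookup", ".txt");
        Files.write(tmp, new byte[] {0x63, 0x61, 0x66, (byte) 0xE9, 0x0A});
        List<String> lines = readLines(tmp);
        System.out.println(lines.get(0).equals("caf\u00e9"));
        Files.delete(tmp);
    }
}
```

The same idea applies to any reader opened in setup(): pass the charset explicitly wherever bytes are turned into characters.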

RE: Distributed cache: how big is too big?

2013-04-09 Thread John Meza
The Distributed Cache uses the shared file system (whichever is specified). The Distributed Cache can be loaded via the GenericOptionsParser / ToolRunner parameters. Those parameters (-files, -archives, -libjars) are seen on the command line and available in a MR driver class that implements the

Re: Distributed cache: how big is too big?

2013-04-09 Thread Bjorn Jonsson
I think the correct question is why would you use distributed cache for a large file that is read during map/reduce instead of plain hdfs? It does not sound wise to shuffle GB of data onto all nodes on each job submission and then just remove it when the job is done. I would think about picking

Re: Distributed cache: how big is too big?

2013-04-09 Thread Jay Vyas
Hmmm.. maybe I'm missing something.. but (@bjorn) why would you use hdfs as a replacement for the distributed cache? After all, the distributed cache is just a file with replication over the whole cluster, which isn't in hdfs. Can't you just make the cache size big and store the file

RE: Distributed cache: how big is too big?

2013-04-09 Thread John Meza
"a replication factor equal to the number of DN" Hmmm... I'm not sure I understand: there are 8 DN in my test cluster. Date: Tue, 9 Apr 2013 04:49:17 -0700 Subject: Re: Distributed cache: how big is too big? From: bjorn...@gmail.com To: user@hadoop.apache.org Put it once

Re: Distributed cache: how big is too big?

2013-04-09 Thread Bjorn Jonsson
hing a Hadoop solution for an existing application that > requires a directory structure full of data for processing. > > To make the Hadoop solution work I need to deploy the data directory to > each DN when the job is executed. > I know this isn't new and commonly done with a

Distributed cache: how big is too big?

2013-04-08 Thread John Meza
buted Cache. Based on experience, what are the common file sizes deployed in a Distributed Cache? I know smaller is better, but how big is too big? I have read that the larger the deployed cache, the greater the startup latency. I also assume there are other factors that play into this. I know that->

Re: Child JVM, Distributed Cache and Language Embedding

2013-02-13 Thread David Boyd
could not do the System.loadLibrary call recommended). You can also bundle the stuff into the JAR file in a subdir and that will be unpacked to the local working dir. The nice thing about using the distributed cache is the files only need to be pushed to the cluster once with a copyFromLocal and

Re: Child JVM, Distributed Cache and Language Embedding

2013-02-13 Thread Saptarshi Guha
age distribution to the nodes (the nodes are really > bare and installing the language on the node is not an option) using the > distributed cache (as a tar.gz. file). > > My understanding is that HadoopMapreduce will unarchive this tgz file and > then for every task attempt symlink it int

Re: TT nodes distributed cache failure

2013-01-25 Thread Hemanth Yamijala
rs loaded into the > Distributed cache, several (~4) nodes have their map jobs fail because > of ClassNotFoundException. All the other nodes proceed through the job > normally and the job completes. But this is wasting 20-25% of my TT nodes. > > Can anyone explain why some nodes mi

TT nodes distributed cache failure

2013-01-25 Thread Terry Healy
Running hadoop-0.20.2 on a 20 node cluster. When running a Map/Reduce job that uses several .jars loaded into the Distributed cache, several (~4) nodes have their map jobs fail because of ClassNotFoundException. All the other nodes proceed through the job normally and the job completes. But

Re: task jvm bootstrapping via distributed cache

2013-01-17 Thread Stan Rosenberg
> http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache >> > >> > That should give you what you want. >> > >> > hth, >> > Arun >> > >> > On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote: >> >

Re: distributed cache

2012-12-28 Thread Lin Ma
allel in a local filesystem - not too much of a > >> performance hit for small reads (by virtue of OS caches, and quick > >> completion per read, as is usually the case for distributed cache > >> files), and gradually decreasing performance for long reads (due to > >> freq

Re: distributed cache

2012-12-26 Thread Harsh J
:48 PM, Harsh J wrote: >> >> Hi Lin, >> >> It is comparable (and is also logically similar) to reading a file >> multiple times in parallel in a local filesystem - not too much of a >> performance hit for small reads (by virtue of OS caches, and quick >>

Re: distributed cache

2012-12-26 Thread Lin Ma
e of OS caches, and quick > completion per read, as is usually the case for distributed cache > files), and gradually decreasing performance for long reads (due to > frequent disk physical movement)? Thankfully, due to block sizes the > latter isn't a problem for large files on a

Re: distributed cache

2012-12-26 Thread Harsh J
Hi Lin, It is comparable (and is also logically similar) to reading a file multiple times in parallel in a local filesystem - not too much of a performance hit for small reads (by virtue of OS caches, and quick completion per read, as is usually the case for distributed cache files), and

Re: distributed cache

2012-12-26 Thread Lin Ma
cas > >> spread across racks such that on task bootup the downloads happen with > >> rack locality. > >> > >> On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote: > >> > Hi Kai, > >> > > >> > Smart answer! :-) > >> > > &

Re: distributed cache

2012-12-26 Thread Harsh J
- you would want more replicas >> spread across racks such that on task bootup the downloads happen with >> rack locality. >> >> On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote: >> > Hi Kai, >> > >> > Smart answer! :-) >> > >> > The assum

Re: distributed cache

2012-12-26 Thread Lin Ma
licas > spread across racks such that on task bootup the downloads happen with > rack locality. > > On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote: > > Hi Kai, > > > > Smart answer! :-) > > > > The assumption you have is one distributed cache replica coul

Re: distributed cache

2012-12-26 Thread Harsh J
so tied to the concept of racks in a cluster - you would want more replicas spread across racks such that on task bootup the downloads happen with rack locality. On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma wrote: > Hi Kai, > > Smart answer! :-) > > The assumption you have is one distri

Re: distributed cache

2012-12-25 Thread Lin Ma
I have figured out the 2nd issue, appreciate if anyone could advise on the first issue. regards, Lin On Sat, Dec 22, 2012 at 9:24 PM, Lin Ma wrote: > Hi Kai, > > Smart answer! :-) > >- The assumption you have is one distributed cache replica could only >serve one do

Re: distributed cache

2012-12-22 Thread Lin Ma
Hi Kai, Smart answer! :-) - The assumption you have is one distributed cache replica could only serve one download session for tasktracker node (this is why you get concurrency n/r). The question is, why one distributed cache replica cannot serve multiple concurrent download session

Re: distributed cache

2012-12-22 Thread Kai Voigt
Hi, simple math. Assuming you have n TaskTrackers in your cluster that will need to access the files in the distributed cache. And r is the replication level of those files. Copying the files into HDFS requires r copy operations over the network. The n TaskTrackers need to get their local
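The arithmetic in this reply can be written out as a small sketch. It uses only the quantities named in the message (n TaskTrackers, replication level r) and the n/r figure quoted in the follow-up. The even spread of downloads across replicas is an idealizing assumption, and the class and method names are hypothetical.

```java
// Hypothetical illustration of the n / r argument, not code from the thread.
class CacheReplicationMath {

    // Average number of TaskTrackers fetching from each replica,
    // assuming downloads are spread evenly over the r replicas.
    static double downloadsPerReplica(int taskTrackers, int replication) {
        if (taskTrackers < 0 || replication <= 0) {
            throw new IllegalArgumentException("need n >= 0 and r > 0");
        }
        return (double) taskTrackers / replication;
    }

    public static void main(String[] args) {
        int n = 100; // example cluster size, chosen for illustration only
        for (int r : new int[] {3, 10, 50}) {
            System.out.printf("n=%d r=%d -> %.1f downloads per replica%n",
                    n, r, downloadsPerReplica(n, r));
        }
    }
}
```

Raising r costs r copy operations when the file is first written, and in exchange lowers the number of nodes that contend for each replica when tasks start.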

Re: distributed cache

2012-12-22 Thread Lin Ma
Thanks Kai, using higher replication count for the purpose of? regards, Lin On Sat, Dec 22, 2012 at 8:44 PM, Kai Voigt wrote: > Hi, > > Am 22.12.2012 um 13:03 schrieb Lin Ma : > > > I want to confirm when on each task node either mapper or reducer access > distributed cach

Re: distributed cache

2012-12-22 Thread Kai Voigt
Hi, Am 22.12.2012 um 13:03 schrieb Lin Ma : > I want to confirm when on each task node either mapper or reducer access > distributed cache file, it resides on disk, not resides in memory. Just want > to make sure distributed cache file does not fully loaded into memory which > co

distributed cache

2012-12-22 Thread Lin Ma
Hi guys, I want to confirm that when a mapper or reducer on a task node accesses a distributed cache file, the file resides on disk, not in memory. I just want to make sure a distributed cache file is not fully loaded into memory, where it would compete for memory with the mapper/reducer tasks. Is that

Re: Problem using distributed cache

2012-12-11 Thread surfer
On 12/07/2012 03:49 PM, surfer wrote: > Hello Peter > In my, humble, experience I never get hadoop 1.0.3 to work with > distributed cache and the new api (mapreduce). with the old api it works. > giovanni > > P.S. I already tried the approaches suggested by both Dhaval and H

Re: Problem using distributed cache

2012-12-07 Thread Peter Cogan
tCache1"), > >> conf); > >> > >> > >> > >> > >> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J wrote: > >>> > >>> What is your conf object there? Is it job.getConfiguration() or an > >>> independent i

Re: Problem using distributed cache

2012-12-07 Thread bejoy . hadoop
- From: Peter Cogan Date: Fri, 7 Dec 2012 14:06:41 To: Reply-To: user@hadoop.apache.org Subject: Re: Problem using distributed cache Hi, any thoughts on this would be much appreciated thanks Peter On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan wrote: > Hi, > > It's an instance

Re: Problem using distributed cache

2012-12-07 Thread surfer
Hello Peter, In my humble experience, I never got hadoop 1.0.3 to work with the distributed cache and the new API (mapreduce). With the old API it works. giovanni P.S. I already tried the approaches suggested by both Dhaval and Harsh J On 12/06/2012 05:59 PM, Peter Cogan wrote: > > Hi , >

Re: Problem using distributed cache

2012-12-07 Thread Harsh J
>> What is your conf object there? Is it job.getConfiguration() or an >>> independent instance? >>> >>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan >>> wrote: >>> > Hi , >>> > >>> > I want to use the distributed cac

Re: Problem using distributed cache

2012-12-07 Thread Dhaval Shah
You will need to add the cache file to the distributed cache before creating the Job object. Give that a spin and see if that works. Regards, Dhaval From: Peter Cogan To: user@hadoop.apache.org Sent: Friday, 7 December 2012 9:06 AM Subject: Re: Problem using

Re: Problem using distributed cache

2012-12-07 Thread Peter Cogan
> What is your conf object there? Is it job.getConfiguration() or an >> independent instance? >> >> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan >> wrote: >> > Hi , >> > >> > I want to use the distributed cache to allow my mappers to acce

Re: Problem using distributed cache

2012-12-06 Thread Peter Cogan
e/testCache1"), conf); On Thu, Dec 6, 2012 at 5:02 PM, Harsh J wrote: > What is your conf object there? Is it job.getConfiguration() or an > independent instance? > > On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan > wrote: > > Hi , > > > > I want to use the

Re: Problem using distributed cache

2012-12-06 Thread Harsh J
What is your conf object there? Is it job.getConfiguration() or an independent instance? On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan wrote: > Hi , > > I want to use the distributed cache to allow my mappers to access data. In > main, I'm using the command > > Distribute

Re: distributed cache

2012-11-16 Thread Yanbo Liang
hen I use the distributed cache , I found that when the file is more than > 100MB or the number of records are more than 10 million > , > > the file can not be cache in the memory; and I try to set the io.sort.mb is > 200MB ; > it still can not work, Any suggestion would be fine!

distributed cache

2012-11-16 Thread yingnan.ma
When I use the distributed cache, I found that when the file is larger than 100MB or has more than 10 million records, the file cannot be cached in memory. I tried setting io.sort.mb to 200MB, but it still does not work. Any suggestions would be welcome! Thank you! 2012-11-16

Re: Unable to Add Jars to distributed cache

2012-10-31 Thread Steve Loughran
On 31 October 2012 12:13, Saurabh Mishra wrote: > Hi, > I tried adding jars to distributed cache through following code : > DistributedCache.addArchiveToClassPath(path, jobConf); > DistributedCache.addCacheArchive(path.toUri(), jobConf); > > It w

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-15 Thread Mark Olimpiati
I'll try that thanks for the suggestion Steve! Mark On Fri, Oct 12, 2012 at 11:27 AM, Steve Loughran wrote: > > > On 11 October 2012 20:53, Mark Olimpiati wrote: > >> Thanks for the reply Harsh, but as I said I tried locally too by using >> the following: >> >> FileSystem localFs = cachedFi

Re: Distributed Cache For 100MB+ Data Structure

2012-10-13 Thread Michael Segel
before job submission time, then create a serialized >> representation (perhaps just Java object serialization), and send the >> serialized form through distributed cache? Then, each reducer would just >> need to deserialize during setup() instead of recomputing the full radix >

Re: Distributed Cache For 100MB+ Data Structure

2012-10-13 Thread Kyle Moses
g the setup time of the radix tree, is it possible to precompute the radix tree before job submission time, then create a serialized representation (perhaps just Java object serialization), and send the serialized form through distributed cache? Then, each reducer would just need to deserialize d

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-12 Thread Steve Loughran
On 11 October 2012 20:53, Mark Olimpiati wrote: > Thanks for the reply Harsh, but as I said I tried locally too by using the > following: > > FileSystem localFs = cachedFiles[0].getFileSystem(new > Configuration()); > > > Isn't the above supposed to give me the local file system ?? If yes, I

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-11 Thread Mark Olimpiati
> On Thu, Oct 11, 2012 at 5:15 AM, Mark Olimpiati > wrote: > > Hi, > > > > I'm storing sequence files in the distributed cache which seems to be > > stored somewhere under each node's /tmp .../local/archive/ ... path. > > > > In mapper code,

Re: Distributed Cache For 100MB+ Data Structure

2012-10-11 Thread Chris Nauroth
Hello Kyle, Regarding the setup time of the radix tree, is it possible to precompute the radix tree before job submission time, then create a serialized representation (perhaps just Java object serialization), and send the serialized form through distributed cache? Then, each reducer would just
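The precompute-then-serialize idea suggested here can be sketched with plain Java object serialization. This is only an illustration under stated assumptions: a TreeMap stands in for the radix tree (the real structure is not shown in the thread), the class and method names are hypothetical, and the byte array stands in for the file that would be shipped through the distributed cache and read back in setup().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeMap;

// Hypothetical sketch; TreeMap is a stand-in for the precomputed radix tree.
class PrecomputedLookup {

    // Driver side: build the structure once, then serialize it.
    static byte[] serialize(TreeMap<String, String> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        }
        return bytes.toByteArray();
    }

    // Task side: deserialize during setup instead of recomputing.
    @SuppressWarnings("unchecked")
    static TreeMap<String, String> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeMap<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("2001:db8::/32", "documentation prefix");
        byte[] shipped = serialize(table);
        TreeMap<String, String> restored = deserialize(shipped);
        System.out.println(restored.get("2001:db8::/32"));
    }
}
```

Whether this beats rebuilding the tree depends on how deserialization time compares with construction time for the real data, which the thread does not state.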

Distributed Cache For 100MB+ Data Structure

2012-10-11 Thread Kyle Moses
Problem Background: I have a Hadoop MapReduce program that uses an IPv6 radix tree to provide auxiliary input during the reduce phase of the second job in its workflow, but doesn't need the data at any other point. It seems pretty straightforward to use the distributed cache to b

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-10 Thread Harsh J
e files in the distributed cache which seems to be > stored somewhere under each node's /tmp .../local/archive/ ... path. > > In mapper code, I tried using SequenceFile.Reader with all possible > configurations (locally, distributed) however, it can't find it. Are > sequence

Reading Sequence File from Hadoop Distributed Cache ..

2012-10-10 Thread Mark Olimpiati
Hi, I'm storing sequence files in the distributed cache which seems to be stored somewhere under each node's /tmp .../local/archive/ ... path. In mapper code, I tried using SequenceFile.Reader with all possible configurations (locally, distributed) however, it can't find it. Are

Re: Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache

2012-10-06 Thread Harsh J
repository. > > The same question was asked in the jira but without clear resolution. > https://issues.apache.org/jira/browse/MAPREDUCE-236 > > My question might be related to > https://issues.apache.org/jira/browse/MAPREDUCE-4408 > which is resolved for next version. But i

Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache

2012-10-05 Thread Bertrand Dechoux
browse/MAPREDUCE-236 My question might be related to https://issues.apache.org/jira/browse/MAPREDUCE-4408 which is resolved for next version. But it seems to be only about uberjar and I am using a standard jar. If it works with a hdfs location, what are the details? Won't it be cleaned duri