CVE-2018-8009: Apache Hadoop distributed cache archive vulnerability

2018-11-21 Thread Akira Ajisaka
CVE-2018-8009: Apache Hadoop distributed cache archive vulnerability. Severity: Severe. Vendor: The Apache Software Foundation. Versions affected: Hadoop 0.23.0 to 0.23.11, Hadoop 2.0.0-alpha to 2.7.6, Hadoop 2.8.0 to 2.8.4, Hadoop 2.9.0 to 2.9.1, Hadoop 3.0.0-alpha to 3.0.2, Hadoop 3.1.0

Re: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-21 Thread Gabriel Balan
Hi, My code creates a new job named "job 1" which writes something to distributed cache (say a text file) and the job gets completed. Just to manage expectations: you add files to the distributed cache _in the job driver_, and the framework makes them available to maps and reducers.

RE: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-20 Thread Guttadauro, Jeff
Hi, Siddharth. Not sure I fully understand your problem. I think you are saying that you would like to run an initial M/R job to create some data that n jobs after that will be able to use, and you are saying you’d like to use the distributed cache for that. I think you may not need

Re: How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2

2016-06-13 Thread Siddharth Dawar
to currently running job only). On Tue, Jun 7, 2016 at 6:36 PM, Arun Natva <arun.na...@gmail.com> wrote: > If you use an instance of the Job class, you can add files to distributed > cache like this: > Job job = Job.getInstance(conf); > job.addCacheFile(filepath); > > > Sen

Re: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-13 Thread Siddharth Dawar
Hi Jeff, Thanks for your prompt reply. Actually, my problem is as follows: My code creates a new job named "job 1" which writes something to distributed cache (say a text file) and the job gets completed. Now, I want to create n jobs in the while loop below, which read the

RE: Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-07 Thread Guttadauro, Jeff
Hi, Siddharth. I was also a bit frustrated at what I found to be scant documentation on how to use the distributed cache in Hadoop 2. The DistributedCache class itself was deprecated in Hadoop 2, but there don’t appear to be very clear instructions on the alternative. I think it’s actually

Re: How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2

2016-06-07 Thread Arun Natva
If you use an instance of the Job class, you can add files to the distributed cache like this: Job job = Job.getInstance(conf); job.addCacheFile(filepath); Sent from my iPhone > On Jun 7, 2016, at 5:17 AM, Siddharth Dawar <siddharthdawa...@gmail.com> > wrote: > > Hi, > >

How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2

2016-06-07 Thread Siddharth Dawar
Int(args[3])); conf2.setNumReduceTasks(Integer.parseInt(args[4])); FileInputFormat.addInputPath(conf2, new Path(input)); FileOutputFormat.setOutputPath(conf2, new Path(output)); } RunningJob job = JobClient.runJob(conf2); } Now, I want the first Job which gets created to write something in the distrib

Accessing files in Hadoop 2.7.2 Distributed Cache

2016-06-07 Thread Siddharth Dawar
Hi, I want to use the distributed cache to allow my mappers to access data in Hadoop 2.7.2. In main, I'm using the command String hdfs_path = "hdfs://localhost:9000/bloomfilter"; InputStream in = new BufferedInputStream(new FileInputStream("/home/siddharth/Desktop/data/bloom_filter"

Files in distributed cache

2015-05-27 Thread Marko Dinic
Hello, I'm new to Hadoop and a bit confused by one thing about the distributed cache: when do files added to the distributed cache get deleted? I'm specifically interested in Hadoop 0.20.2. I read the following in Hadoop: The Definitive Guide. Files are deleted to make room for a new file when the cache
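The rule quoted from the book (files are deleted to make room for a new file once the cache grows past a limit) can be pictured with a toy model. This is not Hadoop's implementation: the class name, the byte limit parameter, the oldest-first eviction order, and the reference counting that protects files still in use are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a size-bounded local cache (hypothetical; not Hadoop's code). */
final class ToyLocalCache {
    private final long limitBytes;
    private long totalBytes = 0;
    // Insertion order doubles as eviction order: oldest files are considered first (assumption).
    private final Map<String, Long> sizes = new LinkedHashMap<>();
    // Number of running tasks using each file (assumption: in-use files are never deleted).
    private final Map<String, Integer> inUse = new HashMap<>();

    ToyLocalCache(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** A task localizes a file; eviction is only considered when something is added. */
    void acquire(String name, long sizeBytes) {
        if (!sizes.containsKey(name)) {
            sizes.put(name, sizeBytes);
            totalBytes += sizeBytes;
        }
        inUse.merge(name, 1, Integer::sum);
        evictUnusedWhileOverLimit();
    }

    /** A task finished with the file; it stays cached but becomes deletable. */
    void release(String name) {
        inUse.computeIfPresent(name, (k, n) -> Math.max(0, n - 1));
    }

    private void evictUnusedWhileOverLimit() {
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (totalBytes > limitBytes && it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (inUse.getOrDefault(entry.getKey(), 0) == 0) {
                totalBytes -= entry.getValue();
                inUse.remove(entry.getKey());
                it.remove();
            }
        }
    }

    boolean contains(String name) {
        return sizes.containsKey(name);
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

Under the quoted rule, a cached file can therefore outlive the job that added it and is removed only when space is needed for a new one.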

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Shahab Yunus
about it to understand how it works. Thanks, Marko On 05/11/2015 11:25 PM, marko.di...@nissatech.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
...@nissatech.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now in pseudo-distributed and distributed I do

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Marko Dinic
wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now in pseudo-distributed and distributed I do. I'm adding file to distributed cache like

Reading a sequence file from distributed cache

2015-05-11 Thread marko.dinic
Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in standalone mode, but now in pseudo-distributed and distributed I do. I'm adding file to distributed cache like this And reading from

Re: Reading a sequence file from distributed cache

2015-05-11 Thread Shahab Yunus
What version are you using? Have you seen this? Regards, Shahab On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran

Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-30 Thread sam liu
From: sam liu [mailto:samliuhad...@gmail.com] Sent: Wednesday, May 28, 2014 7:40 AM To: user@hadoop.apache.org Subject: Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0 Is this possibly a Hadoop issue? Or is some option wrong in my cluster? 2014-05-27 13:58 GMT

distributed cache in reducer

2014-05-30 Thread Brian Jeltema
running Hadoop 2.2, my job places files in the distributed cache. In my mapper setup, I call context.getCacheFiles() and get back a URI[] with contents that make sense. In my reducer setup, I call context.getCacheFiles() and get back null. Is this expected behavior? If so, how do I get

Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-30 Thread sam liu
@hadoop.apache.org Subject: Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0 Is this possibly a Hadoop issue? Or is some option wrong in my cluster? 2014-05-27 13:58 GMT+08:00 sam liu samliuhad...@gmail.com: Hi Experts, The original local file has execution permission

RE: File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-28 Thread Sebastian Gäde
, 2014 7:40 AM To: user@hadoop.apache.org Subject: Re: File Permission Issue using Distributed Cache of Hadoop-2.2.0 Is this possibly a Hadoop issue? Or is some option wrong in my cluster? 2014-05-27 13:58 GMT+08:00 sam liu samliuhad...@gmail.com: Hi Experts, The original local file has

File Permission Issue using Distributed Cache of Hadoop-2.2.0

2014-05-26 Thread sam liu
Hi Experts, The original local file has execution permission, and it was then distributed to multiple NodeManager nodes with the Distributed Cache feature of Hadoop-2.2.0, but the distributed copy has lost the execution permission. However, I did not encounter this issue in Hadoop-1.1.1. Why
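The thread excerpt does not show a resolution. One possible workaround, offered here only as an assumption, is to copy the localized file into a directory the task owns and set the execute bit on the copy before running it, since the localized original may not be writable by the task user. The class name and the `.exec` suffix are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: restore the execute bit by copying a localized cache file. */
final class ExecRestore {
    private ExecRestore() {}

    /**
     * Copies the localized file (following it if it is a symlink) into destDir as
     * "<name>.exec" and marks the copy executable for its owner. Returns the copy's path.
     */
    static Path copyAsExecutable(Path localized, Path destDir) {
        try {
            Path target = destDir.resolve(localized.getFileName() + ".exec");
            Files.copy(localized, target, StandardCopyOption.REPLACE_EXISTING);
            if (!target.toFile().setExecutable(true, true)) {
                throw new IOException("could not set execute permission on " + target);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a task, `localized` would be the symlink in the container's working directory and `destDir` that same directory; the suffix keeps the copy from clobbering the symlink.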

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Jonathan Poon
as part of my build environment in Eclipse. Any ideas on what could be incorrect? If I'm incorrectly using the distributed cache, could someone point me to an example using the distributed cache with Hadoop 2.2.0? Thanks for your help! Jonathan

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Jonathan Poon
in the 2.2.0 API? Jonathan On Thu, Mar 27, 2014 at 11:17 AM, Serge Blazhievsky hadoop...@gmail.com wrote: How are you putting files in distributed cache? Sent from my iPhone On Mar 27, 2014, at 9:20 AM, Jonathan Poon jkp...@ucdavis.edu wrote: Hi Stanley, Sorry about the confusion, but I'm

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Azuryy
are you putting files in distributed cache? Sent from my iPhone On Mar 27, 2014, at 9:20 AM, Jonathan Poon jkp...@ucdavis.edu wrote: Hi Stanley, Sorry about the confusion, but I'm trying to read a txt file into my Mapper function. I am trying to copy the file using the -files option

Hadoop 2.2.0 Distributed Cache

2014-03-26 Thread Jonathan Poon
saying getLocalCacheFiles() is undefined. I've imported the hadoop-mapreduce-client-core-2.2.0.jar as part of my build environment in Eclipse. Any ideas on what could be incorrect? If I'm incorrectly using the distributed cache, could someone point me to an example using the distributed cache

Re: Hadoop 2.2.0 Distributed Cache

2014-03-26 Thread Stanley Shi
environment in Eclipse. Any ideas on what could be incorrect? If I'm incorrectly using the distributed cache, could someone point me to an example using the distributed cache with Hadoop 2.2.0? Thanks for your help! Jonathan

Writing Bytes Directly to Distributed Cache?

2014-03-17 Thread Jonathan Miller
Hello, I was wondering if anyone might know of a way to write bytes directly to the distributed cache. I know I can call job.addCacheFile(URI uri), but in my case the file I wish to add to the cache is in memory and is job specific. I would prefer not writing it to a location that I have

Re: New Distributed Cache

2014-01-09 Thread Bill Q
] Sent: Wednesday, July 10, 2013 9:43 PM To: user@hadoop.apache.org Subject: Re: New Distributed Cache Also, once you have the array of URIs after calling getCacheFiles you can iterate over them using File class or Path ( http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs

RE: Distributed cache in command line

2013-09-24 Thread Chandra Mohan, Ananda Vel Murugan
Hi, Thanks for the response. I can create symlinks for the files, but I don't know how to add a jar to the distributed cache. One way I found is the libjars argument when running a hadoop job. Is it possible to add a jar file directly to the distributed cache? Is there any specific folder in HDFS

Distributed cache in command line

2013-09-23 Thread Chandra Mohan, Ananda Vel Murugan
Hi, Is it possible to access the distributed cache from the command line? I have written a custom InputFormat implementation which I want to add to the distributed cache. Using libjars is not an option for me, as I am not running the Hadoop job from the command line. I am running it using the RHadoop package in R which

Re: Distributed cache in command line

2013-09-23 Thread Omkar Joshi
Hi, I have no idea about RHadoop, but in general in YARN we do create symlinks for the files in the distributed cache in the current working directory of every container. You may be able to use that somehow. Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Mon, Sep 23, 2013 at 6

RE: New Distributed Cache

2013-07-11 Thread Botelho, Andrew
a NullPointerException when I do that in the Mapper code. Any suggestions? Andrew From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: Wednesday, July 10, 2013 9:43 PM To: user@hadoop.apache.org Subject: Re: New Distributed Cache Also, once you have the array of URIs after calling getCacheFiles

Re: New Distributed Cache

2013-07-11 Thread Omkar Joshi
a NullPointerException when I do that in the Mapper code. Any suggestions? Andrew From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: Wednesday, July 10, 2013 9:43 PM To: user@hadoop.apache.org Subject: Re: New Distributed Cache Also

RE: Distributed Cache

2013-07-10 Thread Botelho, Andrew
@hadoop.apache.org Subject: Re: Distributed Cache You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version

New Distributed Cache

2013-07-10 Thread Botelho, Andrew
Hi, I am trying to store a file in the Distributed Cache during my Hadoop job. In the driver class, I tell the job to store the file in the cache with this code: Job job = Job.getInstance(); job.addCacheFile(new URI(file name)); That all compiles fine. In the Mapper code, I try accessing
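In the Hadoop 2 API used here, the driver registers a file with `job.addCacheFile(URI)`, and YARN symlinks each cached file into the container's working directory (as Omkar Joshi notes in the RHadoop thread); a `#name` fragment on the URI chooses the symlink name. The helper below is a minimal stdlib-only sketch with hypothetical names: it builds such a URI and computes the name a task would open, so the Mapper can read the file with plain `java.io` instead of resolving cache paths by hand.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helpers for naming distributed cache files via a URI fragment. */
final class CacheUris {
    private CacheUris() {}

    /** Driver side: "hdfs://host:port/path" plus link name "x" becomes "hdfs://host:port/path#x". */
    static URI withLinkName(String fsPath, String linkName) {
        try {
            URI base = new URI(fsPath);
            // Keep scheme, authority and path; put the desired symlink name in the fragment.
            return new URI(base.getScheme(), base.getAuthority(), base.getPath(), null, linkName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid cache path: " + fsPath, e);
        }
    }

    /** Task side: the name to open in the working directory (fragment if set, else the file name). */
    static String linkName(URI cacheUri) {
        if (cacheUri.getFragment() != null) {
            return cacheUri.getFragment();
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

In the driver this would be used as `job.addCacheFile(CacheUris.withLinkName("hdfs://localhost:9000/bloomfilter", "bf"))`, and in `Mapper.setup()` the file could then be opened as `new FileReader("bf")`.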

Re: New Distributed Cache

2013-07-10 Thread Omkar Joshi
did you try JobContext.getCacheFiles()? Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I am trying to store a file in the Distributed Cache during my Hadoop job

Re: Distributed Cache

2013-07-10 Thread Omkar Joshi
Mapper code? Is there a method that will look for any files in the cache? Thanks, Andrew From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, July 09, 2013 6:08 PM To: user@hadoop.apache.org Subject: Re: Distributed Cache You

RE: Distributed Cache

2013-07-10 Thread Botelho, Andrew
()? Thanks, Andrew From: Omkar Joshi [mailto:ojo...@hortonworks.com] Sent: Wednesday, July 10, 2013 5:15 PM To: user@hadoop.apache.org Subject: Re: Distributed Cache try JobContext.getCacheFiles() Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Wed, Jul 10, 2013 at 6:31 AM, Botelho

Re: Distributed Cache

2013-07-10 Thread Omkar Joshi
JobContext.getCacheFiles()? Thanks, Andrew From: Omkar Joshi [mailto:ojo...@hortonworks.com] Sent: Wednesday, July 10, 2013 5:15 PM To: user@hadoop.apache.org Subject: Re: Distributed Cache try JobContext.getCacheFiles() Thanks

Re: New Distributed Cache

2013-07-10 Thread Shahab Yunus
...@hortonworks.com wrote: did you try JobContext.getCacheFiles()? Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Wed, Jul 10, 2013 at 10:15 AM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I am trying to store a file in the Distributed Cache during my Hadoop

Distributed Cache

2013-07-09 Thread Botelho, Andrew
Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (Version 2.0.5). In my driver class, I use this code to try and add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import

Re: Distributed Cache

2013-07-09 Thread Ted Yu
to try and add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.*; import org.apache.hadoop.io.*; import

Re: Distributed Cache

2013-07-09 Thread Azuryy Yu
to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import

Re: Benefits of Hadoop Distributed Cache

2013-05-08 Thread Harsh J
This has been discussed before, see http://search-hadoop.com/m/xI5AHMD0Vm1 for the previous discussion on this. On Wed, May 8, 2013 at 12:54 AM, Saeed Shahrivari saeed.shahriv...@gmail.com wrote: Would you please tell me why we should use Distributed Cache instead of HDFS? Because HDFS seems

Benefits of Hadoop Distributed Cache

2013-05-07 Thread Saeed Shahrivari
Would you please tell me why we should use Distributed Cache instead of HDFS? Because HDFS seems more stable, easier to use, and less error-prone. Thanks in advance.

Re: Benefits of Hadoop Distributed Cache

2013-05-07 Thread Michael Segel
Not sure what you mean... If you want to put up a small file to be used by each Task in your job (mapper or reducer)... you could put it up on HDFS. Or if you're launching your job from an edge node, you could read in the small file and put it into the distributed cache. It really depends

Re: Distributed cache: how big is too big?

2013-04-09 Thread Bjorn Jonsson
researching a Hadoop solution for an existing application that requires a directory structure full of data for processing. To make the Hadoop solution work I need to deploy the data directory to each DN when the job is executed. I know this isn't new and commonly done with a Distributed Cache. *Based

RE: Distributed cache: how big is too big?

2013-04-09 Thread John Meza
a replication factor equal to the number of DNs. Hmmm... I'm not sure I understand: there are 8 DNs in my test cluster. Date: Tue, 9 Apr 2013 04:49:17 -0700 Subject: Re: Distributed cache: how big is too big? From: bjorn...@gmail.com To: user@hadoop.apache.org Put it once on hdfs with a replication

Re: Distributed cache: how big is too big?

2013-04-09 Thread Jay Vyas
Hmm... maybe I'm missing something... but (@bjorn) why would you use HDFS as a replacement for the distributed cache? After all, the distributed cache is just a file with replication over the whole cluster, which isn't in HDFS. Can't you just make the cache size big and store the file

Re: Distributed cache: how big is too big?

2013-04-09 Thread Bjorn Jonsson
I think the correct question is why would you use distributed cache for a large file that is read during map/reduce instead of plain hdfs? It does not sound wise to shuffle GB of data onto all nodes on each job submission and then just remove it when the job is done. I would think about picking

RE: Distributed cache: how big is too big?

2013-04-09 Thread John Meza
The Distributed Cache uses the shared file system (whichever is specified). The Distributed Cache can be loaded via the GenericOptionsParser / ToolRunner parameters. Those parameters (-files, -archives, -libjars) are seen on the command line and available in a MR driver class that implements

Distributed cache: how big is too big?

2013-04-08 Thread John Meza
Cache. Based on experience, what are the common file sizes deployed in a Distributed Cache? I know smaller is better, but how big is too big? I have read that the larger the cache deployed, the greater the startup latency. I also assume there are other factors that play into this. I know that: Default

Re: Child JVM, Distributed Cache and Language Embedding

2013-02-13 Thread Saptarshi Guha
(the nodes are really bare and installing the language on the node is not an option) using the distributed cache (as a tar.gz file). My understanding is that Hadoop MapReduce will unarchive this tgz file and then, for every task attempt, symlink it into the task attempt's working folder. However

Re: Child JVM, Distributed Cache and Language Embedding

2013-02-13 Thread David Boyd
to and could not do the System.loadLibrary call recommended). You can also bundle the stuff into the JAR file in a subdir and that will be unpacked to the local working dir. The nice thing about using the distributed cache is the files only need to be pushed to the cluster once with a copyFromLocal

TT nodes distributed cache failure

2013-01-25 Thread Terry Healy
Running hadoop-0.20.2 on a 20 node cluster. When running a Map/Reduce job that uses several .jars loaded into the Distributed cache, several (~4) nodes have their map tasks fail because of ClassNotFoundException. All the other nodes proceed through the job normally and the job completes

Re: TT nodes distributed cache failure

2013-01-25 Thread Hemanth Yamijala
into the Distributed cache, several (~4) nodes have their map tasks fail because of ClassNotFoundException. All the other nodes proceed through the job normally and the job completes. But this is wasting 20-25% of my TT nodes. Can anyone explain why some nodes might fail to read all the .jars from

Re: task jvm bootstrapping via distributed cache

2013-01-17 Thread Stan Rosenberg
Hi, As I suspected, cache files are symlinked after a child JVM is started: TaskRunner.setupWorkDir is being called from org.apache.hadoop.mapred.Child.main. This is unfortunate, as it makes it impossible to leverage the distributed cache for the purpose of deploying JVM agents. I could submit a jira

Re: task jvm bootstrapping via distributed cache

2013-01-17 Thread Stan Rosenberg
: http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache That should give you what you want. hth, Arun On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote: Hi, I am seeking a way to leverage hadoop's distributed cache in order to ship jars

Re: distributed cache

2012-12-28 Thread Lin Ma
with rack locality. On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma lin...@gmail.com wrote: Hi Kai, Smart answer! :-) The assumption you have is one distributed cache replica could only serve one download session for tasktracker node (this is why you get

Re: distributed cache

2012-12-26 Thread Harsh J
to the concept of racks in a cluster - you would want more replicas spread across racks such that on task bootup the downloads happen with rack locality. On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma lin...@gmail.com wrote: Hi Kai, Smart answer! :-) The assumption you have is one distributed cache

Re: distributed cache

2012-12-26 Thread Harsh J
. On Sat, Dec 22, 2012 at 6:54 PM, Lin Ma lin...@gmail.com wrote: Hi Kai, Smart answer! :-) The assumption you have is one distributed cache replica could only serve one download session for tasktracker node (this is why you get concurrency n/r). The question is, why one distributed

Re: distributed cache

2012-12-26 Thread Harsh J
Hi Lin, It is comparable (and is also logically similar) to reading a file multiple times in parallel in a local filesystem - not too much of a performance hit for small reads (by virtue of OS caches, and quick completion per read, as is usually the case for distributed cache files

Re: distributed cache

2012-12-26 Thread Lin Ma
, and quick completion per read, as is usually the case for distributed cache files), and gradually decreasing performance for long reads (due to frequent disk physical movement)? Thankfully, due to block sizes the latter isn't a problem for large files on a proper DN, as the blocks are spread

Re: distributed cache

2012-12-26 Thread Harsh J
Lin, It is comparable (and is also logically similar) to reading a file multiple times in parallel in a local filesystem - not too much of a performance hit for small reads (by virtue of OS caches, and quick completion per read, as is usually the case for distributed cache files), and gradually

Re: distributed cache

2012-12-25 Thread Lin Ma
I have figured out the 2nd issue, appreciate if anyone could advise on the first issue. regards, Lin On Sat, Dec 22, 2012 at 9:24 PM, Lin Ma lin...@gmail.com wrote: Hi Kai, Smart answer! :-) - The assumption you have is one distributed cache replica could only serve one download

distributed cache

2012-12-22 Thread Lin Ma
Hi guys, I want to confirm that when a task on a node (either mapper or reducer) accesses a distributed cache file, the file resides on disk, not in memory. I just want to make sure the distributed cache file is not fully loaded into memory, where it would compete for memory with the mapper/reducer tasks

Re: distributed cache

2012-12-22 Thread Kai Voigt
Hi, On 22.12.2012 at 13:03, Lin Ma lin...@gmail.com wrote: I want to confirm that when a task on a node (either mapper or reducer) accesses a distributed cache file, the file resides on disk, not in memory. I just want to make sure the distributed cache file is not fully loaded into memory, which

Re: distributed cache

2012-12-22 Thread Lin Ma
Hi Kai, Smart answer! :-) - The assumption you have is one distributed cache replica could only serve one download session for tasktracker node (this is why you get concurrency n/r). The question is, why can one distributed cache replica not serve multiple concurrent download sessions

Re: Problem using distributed cache

2012-12-11 Thread surfer
On 12/07/2012 03:49 PM, surfer wrote: Hello Peter, In my humble experience I never got Hadoop 1.0.3 to work with the distributed cache and the new API (mapreduce). With the old API it works. giovanni P.S. I already tried the approaches suggested by both Dhaval and Harsh J I'm writing

Re: Problem using distributed cache

2012-12-07 Thread Peter Cogan
instance? On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan peter.co...@gmail.com wrote: Hi, I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf); Where

Re: Problem using distributed cache

2012-12-07 Thread Dhaval Shah
You will need to add the cache file to the distributed cache before creating the Job object. Give that a spin and see if that works. Regards, Dhaval From: Peter Cogan peter.co...@gmail.com To: user@hadoop.apache.org Sent: Friday, 7 December 2012 9:06 AM Subject

Re: Problem using distributed cache

2012-12-07 Thread Harsh J
the distributed cache to allow my mappers to access data. In main, I'm using the command DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf); Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs. Then, my setup function looks like

Re: Problem using distributed cache

2012-12-07 Thread bejoy . hadoop
- From: Peter Cogan peter.co...@gmail.com Date: Fri, 7 Dec 2012 14:06:41 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: Re: Problem using distributed cache Hi, any thoughts on this would be much appreciated thanks Peter On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan peter.co

Re: Problem using distributed cache

2012-12-07 Thread Peter Cogan
wrote: Hi, I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf); Where /user/peter/cacheFile/testCache1 is a file that exists in hdfs

Re: Problem using distributed cache

2012-12-06 Thread Harsh J
What is your conf object there? Is it job.getConfiguration() or an independent instance? On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan peter.co...@gmail.com wrote: Hi, I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command
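Harsh's question matters because, in the new API, creating a `Job` copies the `Configuration` it is given, so a cache file registered on the original `conf` afterwards is invisible to the job (hence Dhaval's advice to add it before creating the Job, or to go through `job.getConfiguration()`). The stdlib-only simulation below reproduces that copy semantics with a hypothetical `FakeJob` class and a hypothetical `cache.files` key; it is not Hadoop code.

```java
import java.util.Properties;

/** Hypothetical stand-in for a job that snapshots its configuration (not Hadoop code). */
final class FakeJob {
    private final Properties conf = new Properties();

    FakeJob(Properties original) {
        // Copy the caller's settings, so later edits to 'original' are NOT seen here.
        conf.putAll(original);
    }

    /** The job's own copy; edits made through this reference ARE seen by the job. */
    Properties getConfiguration() {
        return conf;
    }
}
```

The pattern that works is either to set everything on `conf` before constructing the job, or to make later changes on the object returned by `getConfiguration()`.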

Re: distributed cache

2012-11-16 Thread Yanbo Liang
...@ipinyou.com when I use the distributed cache, I found that when the file is more than 100MB or has more than 10 million records, the file cannot be cached in memory; I tried setting io.sort.mb to 200MB, but it still does not work. Any suggestion would be fine! Thank you

Re: Unable to Add Jars to distributed cache

2012-10-31 Thread Steve Loughran
On 31 October 2012 12:13, Saurabh Mishra saurabhmishra.i...@outlook.com wrote: Hi, I tried adding jars to distributed cache through the following code: DistributedCache.addArchiveToClassPath(path, jobConf); DistributedCache.addCacheArchive(path.toUri(), jobConf

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-15 Thread Mark Olimpiati
I'll try that, thanks for the suggestion Steve! Mark On Fri, Oct 12, 2012 at 11:27 AM, Steve Loughran ste...@hortonworks.com wrote: On 11 October 2012 20:53, Mark Olimpiati markq2...@gmail.com wrote: Thanks for the reply Harsh, but as I said I tried locally too by using the following:

Re: Distributed Cache For 100MB+ Data Structure

2012-10-13 Thread Michael Segel
representation (perhaps just Java object serialization), and send the serialized form through distributed cache? Then, each reducer would just need to deserialize during setup() instead of recomputing the full radix tree for every reducer task. That might save time. Regarding the memory

Distributed Cache For 100MB+ Data Structure

2012-10-11 Thread Kyle Moses
Problem Background: I have a Hadoop MapReduce program that uses an IPv6 radix tree to provide auxiliary input during the reduce phase of the second job in its workflow, but doesn't need the data at any other point. It seems pretty straightforward to use the distributed cache to build

Re: Distributed Cache For 100MB+ Data Structure

2012-10-11 Thread Chris Nauroth
Hello Kyle, Regarding the setup time of the radix tree, is it possible to precompute the radix tree before job submission time, then create a serialized representation (perhaps just Java object serialization), and send the serialized form through distributed cache? Then, each reducer would just
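Chris's suggestion can be sketched with plain Java object serialization: build the structure once in the driver, write its serialized bytes to a file that is then added to the distributed cache, and deserialize it in each reducer's `setup()`. In this sketch a `TreeMap` stands in for the radix tree (a real radix tree class would need to implement `Serializable`), and the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.TreeMap;

/** Hypothetical helper: ship a precomputed lookup structure instead of rebuilding it per task. */
final class PrecomputedLookup {
    private PrecomputedLookup() {}

    /** Driver side: serialize once; the bytes would be written to a file and added to the cache. */
    static byte[] serialize(TreeMap<String, String> lookup) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(lookup);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the object stream is closed (and flushed) by now
    }

    /** Task side, e.g. in setup(): read the structure back instead of recomputing it. */
    @SuppressWarnings("unchecked")
    static TreeMap<String, String> deserialize(InputStream in) {
        try (ObjectInputStream objects = new ObjectInputStream(new BufferedInputStream(in))) {
            return (TreeMap<String, String>) objects.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class missing on the task classpath", e);
        }
    }
}
```

In the reducer, `in` would be a `FileInputStream` opened on the cached file's symlink name; the classes being deserialized must be on the task's classpath in a compatible version.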

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-11 Thread Mark Olimpiati
, 2012 at 5:15 AM, Mark Olimpiati markq2...@gmail.com wrote: Hi, I'm storing sequence files in the distributed cache, which seem to be stored somewhere under each node's /tmp .../local/archive/ ... path. In mapper code, I tried using SequenceFile.Reader with all possible

Reading Sequence File from Hadoop Distributed Cache ..

2012-10-10 Thread Mark Olimpiati
Hi, I'm storing sequence files in the distributed cache, which seem to be stored somewhere under each node's /tmp .../local/archive/ ... path. In mapper code, I tried using SequenceFile.Reader with all possible configurations (local, distributed); however, it can't find it. Are sequence files

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-10 Thread Harsh J
files in the distributed cache, which seem to be stored somewhere under each node's /tmp .../local/archive/ ... path. In mapper code, I tried using SequenceFile.Reader with all possible configurations (local, distributed); however, it can't find it. Are sequence files supported

Re: Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache

2012-10-06 Thread Harsh J
for next version. But it seems to be only about uberjar and I am using a standard jar. If it works with a hdfs location, what are the details? Won't it be cleaned during job termination? Why not? Will it also be setup within the distributed cache? Regards Bertrand PS : I know there are others

Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache

2012-10-05 Thread Bertrand Dechoux
is resolved for next version. But it seems to be only about uberjar and I am using a standard jar. If it works with a hdfs location, what are the details? Won't it be cleaned during job termination? Why not? Will it also be setup within the distributed cache? Regards Bertrand PS : I know

Add file to distributed cache

2012-10-01 Thread Abhishek
Hi all, How do you add a small file to the distributed cache in an MR program? Regards Abhi Sent from my iPhone

Re: Add file to distributed cache

2012-10-01 Thread Bejoy KS
Hi Abshiek You can find a simple example of using Distributed Cache here http://kickstarthadoop.blogspot.co.uk/2011/05/hadoop-for-dependent-data-splits-using.html --Original Message-- From: Abhishek To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Add file

Re: task jvm bootstrapping via distributed cache

2012-08-04 Thread rahul p
/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache That should give you what you want. hth, Arun On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote: Hi, I am seeking a way to leverage hadoop's distributed cache in order to ship jars that are required to bootstrap a task's

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Arun C Murthy
to leverage hadoop's distributed cache in order to ship jars that are required to bootstrap a task's jvm, i.e., before a map/reduce task is launched. As a concrete example, let's say that I need to launch with '-javaagent:/path/profiler.jar'. In theory, the task tracker is responsible for downloading

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Stan Rosenberg
distributed cache in order to ship jars that are required to bootstrap a task's jvm, i.e., before a map/reduce task is launched. As a concrete example, let's say that I need to launch with '-javaagent:/path/profiler.jar'. In theory, the task tracker is responsible for downloading cached files onto

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Harsh J
/mapred_tutorial.html#DistributedCache That should give you what you want. hth, Arun On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote: Hi, I am seeking a way to leverage hadoop's distributed cache in order to ship jars that are required to bootstrap a task's jvm, i.e., before a map

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Stan Rosenberg
On Fri, Aug 3, 2012 at 1:31 PM, Harsh J ha...@cloudera.com wrote: What this would do is merely take your passed -files jar (client-common) and symlink it into the JVM's working directory (the task's working directory) _before_ the JVM is begun, as foo.jar. So if I pass additionally, JVM opts

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Arun C Murthy
On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote: Hi, I am seeking a way to leverage hadoop's distributed cache in order to ship jars that are required to bootstrap a task's jvm, i.e., before a map/reduce task is launched. As a concrete example, let's say that I need to launch with '-javaagent

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Stan Rosenberg
On Fri, Aug 3, 2012 at 4:19 PM, Arun C Murthy a...@hortonworks.com wrote: Just do -javaagent:./profiler.jar? Yep, that should work. Thanks!

Re: task jvm bootstrapping via distributed cache

2012-08-01 Thread Stan Rosenberg
On Tue, Jul 31, 2012 at 7:26 PM, Michael Segel michael_se...@hotmail.com wrote: Hi Stan, If I understood your question... you want to ship a jar to the nodes where the task will run prior to the start of the task? Not sure what it is you're trying to do... Your example isn't really clear.

Re: task jvm bootstrapping via distributed cache

2012-07-31 Thread Stan Rosenberg
filesystem. Thanks, stan On Mon, Jul 30, 2012 at 6:23 PM, Stan Rosenberg stan.rosenb...@gmail.com wrote: Hi, I am seeking a way to leverage hadoop's distributed cache in order to ship jars that are required to bootstrap a task's jvm, i.e., before a map/reduce task is launched. As a concrete

Fwd: task jvm bootstrapping via distributed cache

2012-07-31 Thread Stan Rosenberg
Forwarding to common-user to hopefully get more exposure... -- Forwarded message -- From: Stan Rosenberg stan.rosenb...@gmail.com Date: Tue, Jul 31, 2012 at 11:55 AM Subject: Re: task jvm bootstrapping via distributed cache To: mapreduce-u...@hadoop.apache.org I am guessing

Re: task jvm bootstrapping via distributed cache

2012-07-31 Thread Michael Segel
Rosenberg stan.rosenb...@gmail.com wrote: Forwarding to common-user to hopefully get more exposure... -- Forwarded message -- From: Stan Rosenberg stan.rosenb...@gmail.com Date: Tue, Jul 31, 2012 at 11:55 AM Subject: Re: task jvm bootstrapping via distributed cache

Re: Comparing input hdfs file to a distributed cache files

2012-07-23 Thread Shanu Shushmita
())); String line = null; while ((line = wordReader.readLine()) != null) { dyads.add(line); } // end of while wordReader.close(); } // end of try catch (IOException ioe) { System.err.println("IOException reading from distributed cache"); } // end of catch
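The excerpt above reads each line of the cached file into a `dyads` set and only prints a message on `IOException`, which lets the task continue with an empty set. A minimal sketch of the same loading step uses try-with-resources and rethrows, so a missing or unreadable cache file fails the task visibly. The class name is hypothetical, and the `Reader` would normally be a `FileReader` on the cached file's symlink name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for loading a small cached lookup file once per task. */
final class CacheLines {
    private CacheLines() {}

    /** Reads every line into a set; the reader is closed even if reading fails. */
    static Set<String> load(Reader source) {
        Set<String> lines = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Fail loudly: continuing with an empty set would silently skew the job's output.
            throw new UncheckedIOException("IOException reading from distributed cache", e);
        }
        return lines;
    }
}
```

In `map()`, each input record could then be compared against the set with `dyads.contains(value.toString())`.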
