DistributedCache

2014-12-11 Thread Srinivas Chamarthi
to compare output files coming from the speculative attempt and the prior attempt so that I can calculate the credit score of each node. I want to use DistributedCache to cache the local file system files in the CommitPending stage from TaskImpl. But DistributedCache is actually deprecated. Is there any

Re: DistributedCache

2014-12-11 Thread Shahab Yunus
Look at this thread. It has alternatives to DistributedCache. http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api Basically, you can use the new method job.addCacheFile to pass files on to the individual tasks. Regards, Shahab On Thu, Dec

Re: DistributedCache

2014-12-11 Thread unmesha sreeveni
= fs.globStatus(cachefile); for (FileStatus status : list) { DistributedCache.addCacheFile(status.getPath().toUri(), conf); } Hope this link helps [1] http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html -- Thanks & Regards, Unmesha Sreeveni U.B, Hadoop
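
The glob-then-add loop above can be tried locally without a cluster. The sketch below is a plain-Java stand-in, not Hadoop code: a list of path strings plays the role of the remote listing and java.nio's glob PathMatcher plays the role of FileSystem.globStatus(). The class and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for "glob, then add every match to the cache".
// Like Hadoop globs, '*' does not cross a '/' boundary.
final class GlobCacheSketch {
    static List<String> matchingPaths(List<String> listing, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> toCache = new ArrayList<>();
        for (String p : listing) {
            if (matcher.matches(Paths.get(p))) {
                // In the real driver, each match would be passed to
                // DistributedCache.addCacheFile(status.getPath().toUri(), conf).
                toCache.add(p);
            }
        }
        return toCache;
    }
}
```

With the pattern /data/*.txt, a file under /data/sub/ is not matched, which mirrors per-component glob matching.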

FSDownload, LocalFileSystem and DistributedCache permissions

2014-08-20 Thread Andre Kelpe
Hi, I am trying to use the DistributedCache and I am running into problems in a test when using the LocalFileSystem. FSDownload complains about permissions like so (this is Hadoop 2.4.1 with JDK 6 on Linux): Caused by: java.io.IOException: Resource file:/path/to/some/file is not publicly
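
An error like "not publicly ..." points at the public-cache visibility check. The sketch below models that rule as we understand it (an assumption, not something stated in this thread): the file must be readable by "others", and every ancestor directory must be executable (traversable) by "others". Permission strings are passed in directly instead of being read from a FileSystem; the class name is hypothetical.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.List;

// Model of the public-visibility rule for cache resources (assumed, see above).
final class PublicVisibilitySketch {
    /**
     * @param filePerms     the file's mode, e.g. "rw-r--r--"
     * @param ancestorPerms modes of each parent directory, up to the root
     */
    static boolean isPublic(String filePerms, List<String> ancestorPerms) {
        if (!PosixFilePermissions.fromString(filePerms).contains(PosixFilePermission.OTHERS_READ)) {
            return false;
        }
        for (String dirPerms : ancestorPerms) {
            // One non-traversable directory (e.g. a 700 home dir) makes the file private.
            if (!PosixFilePermissions.fromString(dirPerms).contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return false;
            }
        }
        return true;
    }
}
```

In a test that uses LocalFileSystem, a plausible culprit under this model is a parent directory such as a home directory with mode 700.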

Adding jars using DistributedCache API.

2014-02-04 Thread Kim Chew
Hello there, I know I can do it with -libjars but I want to play around with the DCache API. First I copy my jar file to HDFS, i.e. /user/kim/lib/foo.jar, and my M/R program references a class (say, Foo) in foo.jar. In my driver, I use the DCache API,

Re: DistributedCache deprecated

2014-01-30 Thread Amit Mittal
Hi Prav, Yes, you are correct that DistributedCache does not load files into memory. Also, using the job configuration and using DistributedCache are two different approaches. I am referring to Hadoop: The Definitive Guide, Chapter 8, Side Data Distribution (pages 288-295). As you are saying that now

Re: DistributedCache deprecated

2014-01-30 Thread praveenesh kumar
command line arguments that are required by mappers/reducers. We were not discussing side data distribution at all. The question was: now that DistributedCache is deprecated, where can we find the methods that DistributedCache provided? If you look at the DistributedCache class in MR v1 - https

Re: DistributedCache deprecated

2014-01-30 Thread Amit Mittal
Hi Prav, You are correct, thanks for the explanation. From the link below, I can see that Job's methods internally call DistributedCache itself (http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org

DistributedCache deprecated

2014-01-29 Thread Giordano, Michael
I noticed that in Hadoop 2.2.0 org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated. (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class) Is there a class that provides equivalent functionality? My application relies heavily on DistributedCache

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated. (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class) Is there a class that provides equivalent functionality? My application relies heavily on DistributedCache. Thanks, Mike G.

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
@Jay - I don't know how the Job class is replacing the DistributedCache class, but I remember trying distributed cache functions like void addArchiveToClassPath http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
@Jay - Plus, if you look at the DistributedCache class, these methods have been added to the Job class. I am guessing they have kept the functionality the same and just merged the DistributedCache class into the Job class itself, giving developers more methods with fewer classes to worry about, thus simplifying

Re: DistributedCache deprecated

2014-01-29 Thread Jay Vyas
Gotcha, this makes sense. On Wed, Jan 29, 2014 at 4:44 PM, praveenesh kumar praveen...@gmail.com wrote: @Jay - Plus if you see DistributedCache class, these methods have been added inside the Job class, I am guessing they have kept the functionality same, just merged DistributedCache class

Re: DistributedCache deprecated

2014-01-29 Thread praveenesh kumar
...@gmail.com Sent: Wednesday, January 29, 2014 4:41 PM To: user@hadoop.apache.org Subject: Re: DistributedCache deprecated @Jay - I don't know how Job class is replacing the DistributedCache class , but I remember trying distributed cache functions like void addArchiveToClassPath

Re: DistributedCache deprecated

2014-01-29 Thread Amit Mittal
Hi Mike & Prav, Although I am new to Hadoop, I would like to add my 2 cents if that helps. There are two ways to distribute shared data: one is using the job configuration and the other is DistributedCache. As the job configuration is read by the JT, TT and child JVMs, and each time

Re: DistributedCache is empty

2014-01-17 Thread Vinod Kumar Vavilapalli
What is the version of Hadoop that you are using? +Vinod On Jan 16, 2014, at 2:41 PM, Keith Wiley kwi...@keithwiley.com wrote: My driver is implemented around Tool and so should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor DistributedCache methods seem

DistributedCache is empty

2014-01-16 Thread Keith Wiley
My driver is implemented around Tool and so should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor DistributedCache methods seem to work. Usage on the command line is straightforward: I simply add -files foo.py,bar.py right after the class name (where those

Re: DistributedCache incompatibility issue between 1.0 and 2.0

2013-07-24 Thread Edward J. Yoon
@hadoop.apache.org Subject: DistributedCache incompatibility issue between 1.0 and 2.0 Hi, I wonder why setLocalFiles and addLocalFiles methods have been removed, and what should I use instead of them? -- Best Regards, Edward J. Yoon @eddieyoon -- Best Regards, Edward J. Yoon

DistributedCache incompatibility issue between 1.0 and 2.0

2013-07-19 Thread Edward J. Yoon
Hi, I wonder why setLocalFiles and addLocalFiles methods have been removed, and what should I use instead of them? -- Best Regards, Edward J. Yoon @eddieyoon

RE: DistributedCache incompatibility issue between 1.0 and 2.0

2013-07-19 Thread Botelho, Andrew
[mailto:edwardy...@apache.org] Sent: Friday, July 19, 2013 8:03 AM To: user@hadoop.apache.org Subject: DistributedCache incompatibility issue between 1.0 and 2.0 Hi, I wonder why setLocalFiles and addLocalFiles methods have been removed, and what should I use instead of them? -- Best Regards, Edward J

Re: DistributedCache incompatibility issue between 1.0 and 2.0

2013-07-19 Thread Ted Yu
: DistributedCache incompatibility issue between 1.0 and 2.0 Hi, I wonder why setLocalFiles and addLocalFiles methods have been removed, and what should I use instead of them? -- Best Regards, Edward J. Yoon @eddieyoon

Re: DistributedCache incompatibility issue between 1.0 and 2.0

2013-07-19 Thread Omkar Joshi
(URI uri) Add a file to be localized Parameters: uri - The uri of the cache to be localized -Original Message- From: Edward J. Yoon [mailto:edwardy...@apache.org] Sent: Friday, July 19, 2013 8:03 AM To: user@hadoop.apache.org Subject: DistributedCache incompatibility issue between 1.0

FileNotFoundException When DistributedCache file with YARN

2013-05-13 Thread YouPeng Yang
? Regards [1]--- @Override public void setup(Context context){ try { //add DistributedCache files to the Mapper. //this DistributedCache files are on the HDFS URI[] cacheFiles = DistributedCache.getCacheFiles(context.getConfiguration()); if (cacheFiles != null

Re: FileNotFoundException When DistributedCache file with YARN

2013-05-13 Thread YouPeng Yang
]--- @Override public void setup(Context context){ try { //add DistributedCache files to the Mapper. //this DistributedCache files are on the HDFS URI[] cacheFiles = DistributedCache.getCacheFiles(context.getConfiguration()); if (cacheFiles != null

DistributedCache does not seem to copy the HDFS files to local

2013-05-07 Thread YouPeng Yang
Hi All, I want to use the DistributedCache to perform a replicated join on the map side. My Java code is based on [1][2]. When I run the job, the file that I want to cache is not copied to the local dir of my DN, so a FileNotFoundException is thrown [3]. And I checked the source code
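
For readers new to the pattern being attempted here: in a map-side (replicated) join, each task loads the small table once, in Hadoop from the localized cache file during setup(), and then joins every large-side record with a hash lookup in map(). The plain-Java sketch below shows only that logic; the class name and the tab-separated "key, value" line format are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Map-side (replicated) join logic, without the Hadoop plumbing.
final class ReplicatedJoinSketch {
    private final Map<String, String> smallTable = new HashMap<>();

    // Stand-in for setup(): each line of the cached file is "key<TAB>value".
    void loadSmallTable(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                smallTable.put(parts[0], parts[1]);
            }
        }
    }

    // Stand-in for map(): inner join; records without a match are dropped.
    List<String> join(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] parts = record.split("\t", 2);
            String match = smallTable.get(parts[0]);
            if (match != null) {
                out.add(record + "\t" + match);
            }
        }
        return out;
    }
}
```

The join only works if the small table fits in each task's memory, which is why it is normally paired with the cache rather than with a shuffle.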

Re: DistributedCache - why not read directly from HDFS?

2013-03-25 Thread Arun C Murthy
a matter of performance. Alberto On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote: A DistributedCache is not used just to distribute simple files but also native libraries and such, which cannot be loaded by certain if it's on HDFS. Also, keeping it on HDFS could provide less

Re: DistributedCache - why not read directly from HDFS?

2013-03-24 Thread Alberto Cordioli
Thanks for your reply Harsh. So if I want to read a simple text file, choosing between DistributedCache and HDFS becomes just a matter of performance. Alberto On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote: A DistributedCache is not used just to distribute simple files

DistributedCache - why not read directly from HDFS?

2013-03-23 Thread Alberto Cordioli
Hi all, I was not able to find an answer to the following question. If it has already been answered, please point me to the right thread. What are actually the differences between reading a file from HDFS in a mapper and using DistributedCache? I saw that with DistributedCache

How to unit test mappers reading data from DistributedCache?

2013-01-17 Thread Barak Yaish
Hi, I've found MRUnit a very easy way to unit test jobs. Is it also possible to test mappers reading data from DistributedCache? If yes, can you share an example of how the test's setup() should look? Thanks.

Re: How to unit test mappers reading data from DistributedCache?

2013-01-17 Thread Hemanth Yamijala
Hi, Not sure how to do it using MRUnit, but it should be possible to do this using a mocking framework like Mockito or EasyMock. In a mapper (or reducer), you'd use the Context classes to get the DistributedCache files. By mocking these to return what you want, you could potentially run a true unit
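
The mocking suggestion can also be applied without a mocking library by giving the mapper logic a small seam. In the sketch below, the setup logic depends on a Supplier of localized file paths instead of Hadoop's Context, so a test can hand in whatever a mocked Context would return. CacheAwareSetup and the "lookup.txt" name are hypothetical; this is not an MRUnit or Hadoop API.

```java
import java.util.List;
import java.util.function.Supplier;

// Cache-reading setup logic behind a testable seam.
final class CacheAwareSetup {
    private final Supplier<List<String>> cacheFiles;
    private String lookupFile;

    CacheAwareSetup(Supplier<List<String>> cacheFiles) {
        this.cacheFiles = cacheFiles;
    }

    // What a mapper's setup() might do: locate the cached lookup file by name.
    void setup() {
        List<String> files = cacheFiles.get();
        if (files == null) {
            throw new IllegalStateException("no cache files were registered for this job");
        }
        for (String f : files) {
            if (f.endsWith("/lookup.txt")) {
                lookupFile = f;
            }
        }
        if (lookupFile == null) {
            throw new IllegalStateException("lookup.txt is not in the cache");
        }
    }

    String lookupFile() {
        return lookupFile;
    }
}
```

In production the Supplier would wrap the real Context call; in a test it is a lambda returning fixed paths, or null to exercise the failure case.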

Re: FileNotFoundExcepion when getting files from DistributedCache

2012-11-22 Thread Harsh J
...@gmail.com wrote: Hi, I have a 2-node cluster (v1.04), master and slave. On the master, in Tool.run() we add two files to the DistributedCache using addCacheFile(). The files do exist in HDFS. In Mapper.setup() we want to retrieve those files from the cache using FSDataInputStream fs

Re: FileNotFoundExcepion when getting files from DistributedCache

2012-11-22 Thread Barak Yaish
Thanks for the quick response. I wanted to use DistributedCache to localize the files of interest to all nodes, so which API should I use to be able to read all those files regardless of the node running the mapper? On Thu, Nov 22, 2012 at 10:38 PM, Harsh J ha...@cloudera.com wrote

Re: DistributedCache: getLocalCacheFiles() always null

2012-10-19 Thread Alberto Cordioli
Ok, it was my fault. Instead of using getConf() when adding a new cache file, I should have used job.getConfiguration(). Now it works. Cheers, Alberto On 19 October 2012 09:19, Alberto Cordioli cordioli.albe...@gmail.com wrote: Hi all, I am trying to use the DistributedCache with the new Hadoop
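
Why getConf() fails here: as far as we know, Hadoop's Job takes a copy of the Configuration it is created from, so cache entries added to the original object afterwards never reach the submitted job. The sketch models that with a Map standing in for Configuration; ConfCopyingJob is a hypothetical stand-in for Job, not the real class.

```java
import java.util.HashMap;
import java.util.Map;

// Model of a job that snapshots its configuration at construction time.
final class ConfCopyingJob {
    private final Map<String, String> conf;

    ConfCopyingJob(Map<String, String> base) {
        this.conf = new HashMap<>(base); // copy, not a reference to `base`
    }

    Map<String, String> getConfiguration() {
        return conf;
    }
}
```

The practical rule that follows: once the Job exists, register cache files through job.getConfiguration() (or the Job's own cache methods), not through the Tool's getConf().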

DistributedCache Question

2012-04-09 Thread Nick Collier
Hi, Using Hadoop 1.0.1 I'm trying to use the DistributedCache to add additional jars to the classpath used by my Mappers but I can't get it to work. In the run(String[] args) method of my Tool implementation, I've tried: FileSystem fs = DistributedFileSystem.get(conf

Re: DistributedCache Question

2012-04-09 Thread Harsh J
, as they call it), and just submit that. Either of these approaches will get you going. On Mon, Apr 9, 2012 at 11:08 PM, Nick Collier nick.coll...@verizon.net wrote: Hi, Using Hadoop 1.0.1 I'm trying to use the DistributedCache to add additional jars to the classpath used by my Mappers but I

Re: DistributedCache Question

2012-04-09 Thread Nick Collier
, as they call it), and just submit that. Either of these approaches will get you going. On Mon, Apr 9, 2012 at 11:08 PM, Nick Collier nick.coll...@verizon.net wrote: Hi, Using Hadoop 1.0.1 I'm trying to use the DistributedCache to add additional jars to the classpath used by my Mappers but I

DistributedCache.addFileToClassPath non-jars

2012-03-20 Thread Nabib El-Rahman
Hi All, We are using DistributedCache.addFileToClassPath to have jars as well as a property file available in our classpath. For some reason, the property file cannot be found in our classpath, but the jars are found. Is there something specific to the implementation of addFileToClassPath that
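
One plausible explanation (an assumption, not confirmed in this thread): Java treats each classpath entry as a directory or a jar and looks resources up inside it, so a bare .properties file used as an entry is never searched, while a jar is. The plain-Java sketch below, with no Hadoop involved, shows the difference using URLClassLoader; the class name and "app.properties" are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// A file entry does not expose itself as a resource; its parent directory does.
final class ClasspathEntrySketch {
    /** True if "app.properties" loads when only `entry` is on the classpath. */
    static boolean visible(Path entry) {
        try {
            // File.toURI() adds the trailing '/' for existing directories.
            URL url = entry.toFile().toURI().toURL();
            // null parent: do not fall back to the application classpath.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {url}, null)) {
                return loader.getResource("app.properties") != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a throwaway app.properties into a fresh temp directory. */
    static Path writeProps() {
        try {
            Path dir = Files.createTempDirectory("cp-demo");
            return Files.write(dir.resolve("app.properties"),
                    "key=value\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If this is the cause, packaging the property file inside a small jar and adding that jar to the classpath is one way around it.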

Re: DistributedCache in NewAPI on 0.20.X branch

2011-12-17 Thread Bejoy Ks
Hi Shi, My bad, the syntax I posted last time was not the right one, sorry, it was from my handheld. @Override public void setup(Context context) { File file = new File("TestFile.txt"); ... } I didn't get a chance to debug your code, but if you are looking for a working example

Re: DistributedCache in NewAPI on 0.20.X branch

2011-12-17 Thread Shi Yu
Thank you Bejoy! Following your code examples, it finally works. Actually I only changed two places in my original code. First, I added the Override tag. Second, I added a new exception catch(FileNotFoundException e), and now it works! I appreciate your kind and precise help. Best, Shi

Re: DistributedCache in NewAPI on 0.20.X branch

2011-12-16 Thread Bejoy Ks
the DistributedCache files using the old API (JobConf), but in the new API it always returns null. I read some previous discussions saying that on the 0.20.X branch, using DistributedCache through the old API is encouraged. My question is: Is it possible to use DistributedCache with the new API, or the only possible

Re: DistributedCache in NewAPI on 0.20.X branch

2011-12-16 Thread Shi Yu
Following up on my previous question, here is the complete code. I doubt there is any way to get this working on 0.20.X using the new API. The command I executed was: bin/hadoop jar myjar.jar FileTest -files textFile.txt /input/ /output/ The complete code: public class FileTest extends

DistributedCache in NewAPI on 0.20.X branch

2011-12-15 Thread Shi Yu
Hi, I am using the 0.20.X branch. However, I need to use the new API because it has the cleanup(context) method in Mapper. I am confused about how to load the cached files in the mapper. I could load the DistributedCache files using the old API (JobConf), but in the new API it always returns

Issue with DistributedCache

2011-11-24 Thread Denis Kreis
Hi, I'm trying to modify the word count example (http://wiki.apache.org/hadoop/WordCount) using the new API (org.apache.hadoop.mapreduce.*). I run the job on a remote pseudo-distributed cluster. It works fine with the old API, but when I use the new one, I'm getting this: 11/11/24 11:28:02 INFO

Re: Issue with DistributedCache

2011-11-24 Thread Bejoy Ks
Hi Denis, Unfortunately the mailing list strips off attachments, so it'd be great if you could paste the source somewhere and share the URL. If the source is small enough, please include it in the message body. For a quick comparison, try comparing your code with

Re: Issue with DistributedCache

2011-11-24 Thread Denis Kreis
Hi Bejoy 1. Old API: The Map and Reduce classes are the same as in the example, the main method is as follows public static void main(String[] args) throws IOException, InterruptedException { UserGroupInformation ugi = UserGroupInformation.createProxyUser(remote user name,

Re: Issue with DistributedCache

2011-11-24 Thread Michel Segel
Silly question... Why do you need to use the distributed cache for the word count program? What are you trying to accomplish? I've only had to play with it for one project, where we had to push out a bunch of C++ code to the nodes as part of a job...

Re: Issue with DistributedCache

2011-11-24 Thread Denis Kreis
Without using the distributed cache I'm getting the same error. It's because I start the job from a remote client / programmatically. 2011/11/24 Michel Segel michael_se...@hotmail.com: Silly question... Why do you need to use the distributed cache for the word count program? What are you

Re: Issue with DistributedCache

2011-11-24 Thread Michel Segel
Denis... Sorry, you lost me. Just to make sure we're using the same terminology... The cluster is made up of two types of nodes... The data nodes, which run the DN and TT (and, if you have HBase, the RS). Then there are control nodes, which run your NN, SN and JT (and, if you run HBase, the HM and ZKs)... Outside of

Re: Issue with DistributedCache

2011-11-24 Thread Bejoy Ks
Hi Denis, I tried your code without the distributed cache locally and it worked fine for me. Please find it at http://pastebin.com/ki175YUx I echo Mike's words on submitting MapReduce jobs remotely. The remote machine can be your local PC or any utility server, as Mike specified. What you

Re: Issue with DistributedCache

2011-11-24 Thread Alexander C.H. Lorenz
Hi, a typo? import com.bejoy.sampels.worcount.WordCountDriver; = wor_d_count ? - alex On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi Denis I tried your code with out distributed cache locally and it worked fine for me. Please find it at

Re: Issue with DistributedCache

2011-11-24 Thread Bejoy Ks
My bad, I pasted the wrong file. It is updated now; I made a few tiny modifications (commented in code) and it was working fine for me. http://pastebin.com/RDuZX7Qd Alex, thanks a lot for pointing that out. Regards Bejoy.KS On Thu, Nov 24, 2011 at 8:31 PM, Alexander C.H. Lorenz

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Robert Evans
in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space 3. job runs fine 4. Run Job A some time later. job runs fine again. Breaking sequence: 1. have files to be cached in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space 3. job runs

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
into DistributedCache space 3. job runs fine 4. Run Job A some time later. job runs fine again. Breaking sequence: 1. have files to be cached in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space 3. job runs fine 4. Manually delete cached files out of local disk

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Robert Evans
into DistributedCache space 3. job runs fine 4. Run Job A some time later. job runs fine again. Breaking sequence: 1. have files to be cached in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space 3. job runs fine 4. Manually delete cached files out of local

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Robert Evans
. Run Job A, which specifies those files to be put into DistributedCache space 3. job runs fine 4. Run Job A some time later. job runs fine again. Breaking sequence: 1. have files to be cached in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space 3

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
I'm not concerned about disk space usage -- the script we used that deleted the taskTracker cache path has been fixed not to do so. I'm curious about the exact behavior of jobs that use DistributedCache files. Again, it seems safe from your description to delete files between completed runs. How

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Robert Evans
the taskTracker cache path has been fixed not to do so. I'm curious about the exact behavior of jobs that use DistributedCache files. Again, it seems safe from your description to delete files between completed runs. How could the job or the taskTracker distinguish between the files having been

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Meng Mao
So the proper description of how DistributedCache normally works is: 1. have files to be cached sitting around in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space. Each worker node copies the to-be-cached files from HDFS to local disk, but more importantly

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-27 Thread Robert Evans
stamp, the distributed cache will start downloading the new file. Also, when the distributed cache on a disk fills up, unused entries in it are deleted. --Bobby Evans On 9/27/11 2:32 PM, Meng Mao meng...@gmail.com wrote: So the proper description of how DistributedCache normally works is: 1. have
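
The behavior Bobby Evans describes can be captured in a toy model. The sketch below reuses an entry while the source file's timestamp is unchanged, re-downloads when it changes, and deletes unused entries (reference count zero) once the cache exceeds its size limit. It is a simplified illustration, not the TaskTracker's actual implementation; all names are hypothetical. It deliberately keeps only in-memory bookkeeping, so it cannot notice a local copy being deleted by hand; whether the real TaskTracker does is exactly the question raised in this thread.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of one node's cache bookkeeping (timestamp check + eviction of unused entries).
final class NodeCacheModel {
    private static final class Entry {
        final long sourceTimestamp;
        final long size;
        int refCount;

        Entry(long sourceTimestamp, long size) {
            this.sourceTimestamp = sourceTimestamp;
            this.size = size;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>(); // oldest first
    private final long capacityBytes;
    private long usedBytes;
    int downloads;

    NodeCacheModel(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** A task on this node needs `uri`; returns true on a cache hit. */
    boolean localize(String uri, long sourceTimestamp, long size) {
        Entry e = entries.get(uri);
        if (e != null && e.sourceTimestamp == sourceTimestamp) {
            e.refCount++;
            return true;
        }
        if (e != null) { // source changed: drop the stale copy (simplified)
            entries.remove(uri);
            usedBytes -= e.size;
        }
        Entry fresh = new Entry(sourceTimestamp, size);
        fresh.refCount = 1;
        entries.put(uri, fresh);
        usedBytes += size;
        downloads++;
        evictUnused();
        return false;
    }

    /** The task that used `uri` has finished. */
    void release(String uri) {
        Entry e = entries.get(uri);
        if (e != null && e.refCount > 0) {
            e.refCount--;
        }
        evictUnused();
    }

    boolean isCached(String uri) {
        return entries.containsKey(uri);
    }

    // Delete unused entries, oldest first, until the cache fits again.
    private void evictUnused() {
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Entry e = it.next().getValue();
            if (e.refCount == 0) {
                usedBytes -= e.size;
                it.remove();
            }
        }
    }
}
```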

RE: Temporary Files to be sent to DistributedCache

2011-09-27 Thread GOEKE, MATTHEW (AG/1000)
[mailto:less...@q.com] Sent: Tuesday, September 27, 2011 4:48 PM To: common-user@hadoop.apache.org Subject: Temporary Files to be sent to DistributedCache I have a need to write information retrieved from a database to a series of files that need to be made available to my mappers. Because each mapper needs

Re: Temporary Files to be sent to DistributedCache

2011-09-27 Thread lessonz
So, I thought about that, and I'd considered writing to the HDFS and then copying the file into the DistributedCache so each mapper/reducer doesn't have to reach into the HDFS for these files. Is that the best way to handle this? On Tue, Sep 27, 2011 at 4:01 PM, GOEKE, MATTHEW (AG/1000

Re: Temporary Files to be sent to DistributedCache

2011-09-27 Thread Linden Hillenbrand
and then copying the file into the DistributedCache so each mapper/reducer doesn't have to reach into the HDFS for these files. Is that the best way to handle this? On Tue, Sep 27, 2011 at 4:01 PM, GOEKE, MATTHEW (AG/1000) matthew.go...@monsanto.com wrote: The simplest route I can think of is to ingest

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-26 Thread Meng Mao
Let's frame the issue in another way. I'll describe a sequence of Hadoop operations that I think should work, and then I'll get into what we did and how it failed. Normal sequence: 1. have files to be cached in HDFS 2. Run Job A, which specifies those files to be put into DistributedCache space 3

operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Meng Mao
We use the DistributedCache class to distribute a few lookup files for our jobs. We have been aggressively deleting failed task attempts' leftover data, and our script accidentally deleted the path to our distributed cache files. Our task attempt leftover data was here [per node]: /hadoop/hadoop

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Robert Evans
it was before. --Bobby Evans On 9/23/11 1:57 AM, Meng Mao meng...@gmail.com wrote: We use the DistributedCache class to distribute a few lookup files for our jobs. We have been aggressively deleting failed task attempts' leftover data, and our script accidentally deleted the path to our distributed

Re: operation of DistributedCache following manual deletion of cached files?

2011-09-23 Thread Meng Mao
Hmm, I must have really missed an important piece somewhere. This is from the MapRed tutorial text: DistributedCache is a facility provided by the Map/Reduce framework to cache files (text, archives, jars and so on) needed by applications. Applications specify the files to be cached via urls

DistributedCache adds prefix to the file

2011-06-15 Thread Sergey Bartunov
Hello. I'm using DistributedCache for the first time, and I have found that it adds prefixes to the files. For example, the original file was test.txt, and it became localhosttest.txt in the cache. How should I handle this? Just check whether the cache file name ends with the original filename?
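
The workaround proposed in the question can be written as a small helper. This sketch compares only the last path component of each localized path with the original name, so directory names cannot cause a match; it can still false-match (for example "mytest.txt" when looking for "test.txt"). CacheNameMatcher is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Find the localized copy of a file by the suffix of its file name.
final class CacheNameMatcher {
    static Optional<Path> findLocalCopy(List<Path> localFiles, String originalName) {
        for (Path p : localFiles) {
            Path name = p.getFileName();
            if (name != null && name.toString().endsWith(originalName)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```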

Re: DistributedCache

2011-06-09 Thread Robert Evans
Armstrong john.armstr...@ccri.com wrote: On Tue, 7 Jun 2011 09:41:21 -0300, Juan P. gordoslo...@gmail.com wrote: Not 100% clear on what you meant. You are saying I should put the file into my HDFS cluster or should I use DistributedCache? If you suggest the latter, could you address my original

Re: DistributedCache

2011-06-07 Thread John Armstrong
On Tue, 7 Jun 2011 09:41:21 -0300, Juan P. gordoslo...@gmail.com wrote: Not 100% clear on what you meant. You are saying I should put the file into my HDFS cluster or should I use DistributedCache? If you suggest the latter, could you address my original question? I mean that you can

Re: DistributedCache

2011-06-06 Thread John Armstrong
why bother with DistributedCache? The only reason might be that the shared directory is costly for the network and usually has a storage limit. That's exactly the problem the DistributedCache is designed for. It guarantees that you only need to copy the file to any given local filesystem once. Using the way
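
The "copy once per local filesystem" point can be made concrete with back-of-the-envelope arithmetic. Assuming every task reads the whole side file, reading it from shared storage in each task reads it once per task, while the cache copies it once per node that runs at least one task. Replica placement and data locality are ignored; the class name and numbers are illustrative only.

```java
// Rough comparison of total bytes read for a side file of fileBytes bytes.
final class SideDataTransfer {
    static long bytesReadPerTask(int tasks, long fileBytes) {
        return tasks * fileBytes;
    }

    static long bytesCopiedPerNode(int nodesUsed, long fileBytes) {
        return nodesUsed * fileBytes;
    }
}
```

For example, 200 map tasks spread over 20 nodes with a 50 MB lookup file gives about 10,000 MB read in the first case and 1,000 MB copied in the second.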

Re: DistributedCache - getLocalCacheFiles method returns null

2011-06-01 Thread neeral beladia
BTW, just to let you know, I am running my job in pseudo-distributed mode. Thanks, Neeral From: neeral beladia neeral_bela...@yahoo.com To: common-user@hadoop.apache.org Sent: Tue, May 31, 2011 10:00:00 PM Subject: DistributedCache - getLocalCacheFiles

DistributedCache - getLocalCacheFiles method returns null

2011-05-31 Thread neeral beladia
Hi, I have a file on Amazon AWS under: s3n://Access Key:Secret Key@Bucket Name/file.txt I want this file to be accessible by the slave nodes via the Distributed Cache. I put the following after the job configuration statements in the driver program: DistributedCache.addCacheFile(new

use DistributedCache to add many files to class path

2011-02-16 Thread lei liu
I use DistributedCache to add two files to the class path; example code below: String jeJarPath = "/group/aladdin/lib/je-4.1.7.jar"; DistributedCache.addFileToClassPath(new Path(jeJarPath), conf); String tairJarPath = "/group/aladdin/lib/tair-aladdin-2.3.1.jar"

Re: use DistributedCache to add many files to class path

2011-02-16 Thread Alejandro Abdelnur
Lei Liu, You have a cut&paste error: the second addition should use 'tairJarPath' but it is using 'jeJarPath'. Hope this helps. Alejandro On Thu, Feb 17, 2011 at 11:50 AM, lei liu liulei...@gmail.com wrote: I use DistributedCache to add two files to class path, example code below

RE: How to use DistributedCache to load data generated from a previous MapReduce job?

2011-02-16 Thread praveen.peddi
] Sent: Wednesday, February 16, 2011 11:36 AM To: core-u...@hadoop.apache.org Subject: How to use DistributedCache to load data generated from a previous MapReduce job? I have a MapReduce job #1, which processes input files, and produces key, value pairs data. These key-value pairs data are stored

Re: Problem with DistributedCache after upgrading to CDH3b2

2010-10-05 Thread Kim Vogt
the DistributedCache have failed. Typically, we add files to the cache prior to job startup, using addCacheFile(URI, conf) and then get them on the other side, using getLocalCacheFiles(conf). I believe the hadoop-core versions for these are 0.20.2+228 and +320 respectively. We then open the files and read

Re: Problem with DistributedCache after upgrading to CDH3b2

2010-10-05 Thread Jamie Cockrill
Hi Kim, We didn't fix it in the end. I just ended up manually writing the files to the cluster using the FileSystem class, and then reading them back out again on the other side. Not terribly efficient as I guess the point of DistributedCache is that the files get distributed to every node

Problem with DistributedCache after upgrading to CDH3b2

2010-07-16 Thread Jamie Cockrill
Dear All, We recently upgraded from CDH3b1 to b2 and ever since, all our mapreduce jobs that use the DistributedCache have failed. Typically, we add files to the cache prior to job startup, using addCacheFile(URI, conf) and then get them on the other side, using getLocalCacheFiles(conf). I

Running into problems with DistributedCache

2010-04-15 Thread Kris Nuttycombe
Hi, all, I'm having problems with my Mapper instances accessing the DistributedCache. A bit of background: I'm running on a single-node cluster, just trying to get my first map/reduce job functioning. Both the job tracker and the primary namenode exist on the same host. In the client, I am able

Example for using DistributedCache class

2010-02-03 Thread Udaya Lakshmi
Hi, As a newbie to Hadoop, I am not able to figure out how to use the DistributedCache class. Can someone give me a small piece of code that distributes a file to the cluster and then shows how to open and use the file in the map or reduce task? Thanks, Udaya

Re: Example for using DistributedCache class

2010-02-03 Thread Nick Jones
); DistributedCache.addCacheFile(new URI("/path/to/file2.txt"), conf); DistributedCache.addCacheFile(new URI("/path/to/file3.txt"), conf); ... } } Nick Jones Udaya Lakshmi wrote: Hi, As a newbie to hadoop, I am not able to figure out how to use DistributedCache class. Can someone give me a small code which
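
The snippet above covers the driver side. On the task side, once the framework has localized the file, a mapper's setup()/configure() typically reads it from local disk into memory. In this plain-Java sketch the local path is passed in directly; in Hadoop it would come from the cache API (for example DistributedCache.getLocalCacheFiles(conf), as used elsewhere in these threads). The "key=value" line format and the class name are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Task-side half: load a localized cache file into an in-memory lookup table.
final class LocalCacheFileReader {
    static Map<String, String> load(Path localCopy) {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(localCopy, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq > 0) { // skip blank or malformed lines
                    table.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read cached file " + localCopy, e);
        }
        return table;
    }
}
```

Loading once per task in setup(), rather than per record in map(), keeps the file read cost independent of the number of input records.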

Re: Example for using DistributedCache class

2010-02-03 Thread Udaya Lakshmi
Hi Nick, I am not able to start the following job. I have the file that has to be passed to distributedcache in the local filesystem of the task tracker. Can you tell me if I am missing something? import org.apache.hadoop.fs.*; import org.apache.hadoop.conf.*; import org.apache.hadoop.mapred

Re: Example for using DistributedCache class

2010-02-03 Thread Jones, Nick
The files for the DC need to be on HDFS. Nick Jones Sent by radiation. On Feb 3, 2010, at 12:32 PM, Udaya Lakshmi udaya...@gmail.com wrote: Hi Nick, I am not able to start the following job. I have the file that has to be passed to distributedcache in the local filesystem of the task

RE: DistributedCache purgeCache()

2009-09-07 Thread Amogh Vasekar
: DistributedCache purgeCache() Thanks for your swift response. But where can I find deletecache()? Thanks. -Original Message- From: Amogh Vasekar [mailto:am...@yahoo-inc.com] Sent: Thu 9/3/2009 2:44 PM To: common-user@hadoop.apache.org Subject: RE: DistributedCache purgeCache() AFAIK

RE: DistributedCache purgeCache()

2009-09-06 Thread #YONG YONG CHENG#
Thanks for your swift response. But where can I find deletecache()? Thanks. -Original Message- From: Amogh Vasekar [mailto:am...@yahoo-inc.com] Sent: Thu 9/3/2009 2:44 PM To: common-user@hadoop.apache.org Subject: RE: DistributedCache purgeCache() AFAIK, releaseCache only works

DistributedCache purgeCache()

2009-09-02 Thread #YONG YONG CHENG#
Good Day, I have a question on the DistributedCache as follows. I have used DistributedCache to move my executable (.exe) onto the local filesystems of the nodes in Hadoop and run the .exe (via addCacheArchive() and getLocalCacheArchives()). But I discovered after my job, the .exe

Re: Example of deploying jars through DistributedCache?

2009-04-08 Thread Tom White
Does it work if you use addArchiveToClassPath()? Also, it may be more convenient to use GenericOptionsParser's -libjars option. Tom On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball aa...@cloudera.com wrote: Hi all, I'm stumped as to how to use the distributed cache's classpath feature. I have

RE: Example of deploying jars through DistributedCache?

2009-04-08 Thread Brian MacKay
/aaronTest2.jar)); -Original Message- From: Tom White [mailto:t...@cloudera.com] Sent: Wednesday, April 08, 2009 9:36 AM To: core-user@hadoop.apache.org Subject: Re: Example of deploying jars through DistributedCache? Does it work if you use addArchiveToClassPath()? Also, it may be more

Re: Example of deploying jars through DistributedCache?

2009-04-08 Thread Aaron Kimball
- From: Tom White [mailto:t...@cloudera.com] Sent: Wednesday, April 08, 2009 9:36 AM To: core-user@hadoop.apache.org Subject: Re: Example of deploying jars through DistributedCache? Does it work if you use addArchiveToClassPath()? Also, it may be more convenient to use

Example of deploying jars through DistributedCache?

2009-03-01 Thread Aaron Kimball
Hi all, I'm stumped as to how to use the distributed cache's classpath feature. I have a library of Java classes I'd like to distribute to jobs and use in my mapper; I figured the DCache's addFileToClassPath() method was the correct means, given the example at

Re: How can I use DistributedCache in Hive programs?

2009-02-28 Thread Min Zhou
DistributedCache in Hive programs? Hi list, In the past time, I used to store auxiliaries by DistributedCache in my hadoop programs, and read them locally when mappers configuring. I've found an add [FILE] <value> [<value>]* option in Hive cli mode for sending files to HDFS . How can I use

RE: How can I use DistributedCache in Hive programs?

2009-02-27 Thread Joydeep Sen Sarma
@hadoop.apache.org Subject: How can I use DistributedCache in Hive programs? Hi list, In the past time, I used to store auxiliaries by DistributedCache in my hadoop programs, and read them locally when mappers configuring. I've found an add [FILE] <value> [<value>]* option in Hive cli mode for sending files

Re: Does anyone have a working example for using MapFiles on the DistributedCache?

2008-12-29 Thread Sean Shanny
a MapReader based on a file in the DistributedCache. Thanks. --sean Sean Shanny ssha...@tripadvisor.com On Dec 28, 2008, at 10:59 PM, Amareshwari Sriramadasu wrote: Sean Shanny wrote: To all, Version: hadoop-0.17.2.1-core.jar I have created a MapFile. What I don't seem to be able to do

Re: Does anyone have a working example for using MapFiles on the DistributedCache?

2008-12-28 Thread Amareshwari Sriramadasu
Sean Shanny wrote: To all, Version: hadoop-0.17.2.1-core.jar I have created a MapFile. What I don't seem to be able to do is correctly place the MapFile in the DistributedCache and then make use of it in a map method. I need the following info please: 1. How and where to place

Re: Having trouble accessing MapFiles in the DistributedCache

2008-12-26 Thread Sean Shanny
/url/data $ bin/hadoop fs -copyFromLocal /tmp/ur/index /2008-12-19/url/index and placed them in the DistributedCache using the following calls in the JobConf class: DistributedCache.addCacheFile(new URI("/2008-12-19/url/data"), conf); DistributedCache.addCacheFile(new URI("/2008-12-19/url/index

Does anyone have a working example for using MapFiles on the DistributedCache?

2008-12-26 Thread Sean Shanny
To all, Version: hadoop-0.17.2.1-core.jar I have created a MapFile. What I don't seem to be able to do is correctly place the MapFile in the DistributedCache and then make use of it in a map method. I need the following info please: 1. How and where to place the MapFile directory so

Re: Having trouble accessing MapFiles in the DistributedCache

2008-12-25 Thread Devaraj Das
. I put the files into the HDFS using the following commands: $ bin/hadoop fs -copyFromLocal /tmp/ur/data /2008-12-19/url/data $ bin/hadoop fs -copyFromLocal /tmp/ur/index /2008-12-19/url/index and placed them in the DistributedCache using the following calls in the JobConf class

Having trouble accessing MapFiles in the DistributedCache

2008-12-24 Thread Sean Shanny
in the DistributedCache using the following calls in the JobConf class: DistributedCache.addCacheFile(new URI("/2008-12-19/url/data"), conf); DistributedCache.addCacheFile(new URI("/2008-12-19/url/index"), conf); What I cannot figure out how to do is actually access the MapFile now within my Map

DistributedCache staleness

2008-12-10 Thread Anthony Urso
I have been having problems with changes to DistributedCache files on HDFS not being reflected on subsequently run jobs. I can change the filename to work around this, but I would prefer a way to invalidate the Cache when necessary. Is there a way to lower the timeout or flush the Cache? Cheers
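The rename workaround Anthony mentions can be automated by deriving the HDFS file name from the file's contents, so that any change produces a new name and a job pointing at that name cannot reuse an older localized copy. This is a minimal sketch in plain Java, not a Hadoop feature: `versionedName` is a hypothetical helper, and the choice of a truncated SHA-256 digest is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class VersionedName {
    // Hypothetical helper: inserts the first 4 bytes (8 hex chars) of the SHA-256
    // digest of the contents before the extension, e.g. "lookup.txt" becomes
    // "lookup-<8 hex chars>.txt". Truncating the digest keeps names short but
    // makes collisions possible, if unlikely.
    static String versionedName(String fileName, byte[] contents) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256").digest(contents);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            hex.append(String.format("%02x", digest[i] & 0xff));
        }
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) { // no extension (or a leading-dot name): append the suffix
            return fileName + "-" + hex;
        }
        return fileName.substring(0, dot) + "-" + hex + fileName.substring(dot);
    }

    public static void main(String[] args) {
        byte[] v1 = "key1\tvalue1\n".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "key1\tvalue2\n".getBytes(StandardCharsets.UTF_8);
        // Different contents give different names; identical contents give the same name.
        System.out.println(versionedName("lookup.txt", v1));
        System.out.println(versionedName("lookup.txt", v2));
    }
}
```

Because the name depends only on the contents, re-uploading an unchanged file yields the same name, so the rename happens only when it is actually needed.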

Archives larger than 2^31 bytes in DistributedCache

2008-10-31 Thread Christian Kunz
) at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at org.apache.hadoop.fs.FileUtil.unZip(FileUtil.java:421)
at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:338)
at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:161
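The 2^31 in the subject line is the first value that no longer fits in a signed 32-bit Java `int` (whose maximum is 2,147,483,647). As an illustration of that boundary only (it does not reproduce the code path inside `java.util.zip.ZipFile` or `FileUtil.unZip` shown in the trace), narrowing a larger byte count to `int` silently wraps to a negative number. The 3 GB size is an arbitrary example value.

```java
public class ZipSizeLimit {
    // True if a byte count can be stored in a signed 32-bit int (max 2^31 - 1).
    static boolean fitsInSignedInt(long bytes) {
        return bytes >= 0 && bytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long archiveBytes = 3_000_000_000L; // illustrative size, roughly 3 GB
        int narrowed = (int) archiveBytes;  // keeps only the low 32 bits
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE); // 2147483647
        System.out.println("fits? " + fitsInSignedInt(archiveBytes));    // false
        System.out.println("narrowed = " + narrowed);                    // -1294967296
    }
}
```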
