to compare output files coming from the speculative attempt and the
prior attempt so that I can calculate the credit score of each node.
I want to use DistributedCache to cache local file system files in the
CommitPending stage of TaskImpl. But DistributedCache is actually
deprecated. Is there any alternative?
Look at this thread. It has alternatives to DistributedCache.
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
Basically you can use the new method job.addCacheFile to pass stuff on to
the individual tasks.
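For example, a minimal sketch assuming Hadoop 2.x (the job name and path are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NewApiCacheDemo {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "cache-demo");
    // Replaces the deprecated DistributedCache.addCacheFile(uri, conf):
    job.addCacheFile(new URI("/user/demo/lookup.txt"));
    // Tasks can retrieve the list via context.getCacheFiles().
  }
}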
Regards,
Shahab
On Thu, Dec
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
    DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}
Hope this link helps
[1]
http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop
Hi,
I am trying to use the DistributedCache and I am running into problems in a
test when using the LocalFileSystem. FSDownload complains about
permissions, like so (this is Hadoop 2.4.1 with JDK 6 on Linux):
Caused by: java.io.IOException: Resource file:/path/to/some/file is not
publicly
Hello there,
I know I can do it with -libjars but I want to play around with the
DCache API. First I copy my jar file to HDFS, i.e.,
/user/kim/lib/foo.jar
And my M/R program references a class (Say, Foo) in foo.jar. In my driver,
I use the DCache API,
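Presumably something along these lines, a sketch of the old-API call; the jar path comes from the message above, the rest is assumed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class JarToClasspath {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ship the HDFS jar to each task's classpath.
    DistributedCache.addFileToClassPath(new Path("/user/kim/lib/foo.jar"), conf);
  }
}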
Hi Prav,
Yes, you are correct that DistributedCache does not upload files into
memory. Also, using the job configuration and DistributedCache are two
different approaches. I am referring to Hadoop: The Definitive Guide,
Chapter 8: Side Data Distribution (pages 288-295).
As you are saying that now
command line
arguments that are required by mappers/reducers.
We were not discussing Side Data Distribution at all.
The question was: DistributedCache is deprecated, so where can we find the
right methods that DistributedCache delivers?
If you see the DistributedCache class in MR v1 -
https
Hi Prav,
You are correct, thanks for the explanation. As per the link below, I can see
that Job's methods internally call DistributedCache itself (
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org
I noticed that in Hadoop 2.2.0
org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
(http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
Is there a class that provides equivalent functionality? My application relies
heavily on DistributedCache.
Thanks,
Mike G.
@Jay - I don't know how the Job class is replacing the DistributedCache class,
but I remember trying distributed cache functions like
void addArchiveToClassPath
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html#addArchiveToClassPath%28org.apache.hadoop.fs.Path%29
@Jay - Plus if you see the DistributedCache class, these methods have been
added inside the Job class. I am guessing they have kept the functionality
the same, just merged the DistributedCache class into the Job class itself,
giving more methods for developers with fewer classes to worry about, thus simplifying
Gotcha, this makes sense.
On Wed, Jan 29, 2014 at 4:44 PM, praveenesh kumar praveen...@gmail.comwrote:
...@gmail.com
Sent: Wednesday, January 29, 2014 4:41 PM
To: user@hadoop.apache.org
Subject: Re: DistributedCache deprecated
@Jay - I don't know how the Job class is replacing the DistributedCache
class, but I remember trying distributed cache functions like
void addArchiveToClassPath
Hi Mike, Prav,
Although I am new to Hadoop, I would like to add my 2 cents if that helps.
There are two ways to distribute shared data: one is using the job
configuration and the other is DistributedCache.
As the job configuration is read by the JT, TT and child JVMs, and each time
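To illustrate the two approaches side by side (a sketch; the key and path are hypothetical):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class SideDataDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 1) Job configuration: small values, serialized with the job and
    //    read by the JT, TTs and child JVMs.
    conf.set("lookup.threshold", "42");
    // 2) DistributedCache: larger files, localized once per node.
    DistributedCache.addCacheFile(new URI("/data/lookup.txt"), conf);
  }
}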
What is the version of Hadoop that you are using?
+Vinod
On Jan 16, 2014, at 2:41 PM, Keith Wiley kwi...@keithwiley.com wrote:
My driver is implemented around Tool and so should be wrapping
GenericOptionsParser internally. Nevertheless, neither -files nor
DistributedCache methods seem
My driver is implemented around Tool and so should be wrapping
GenericOptionsParser internally. Nevertheless, neither -files nor the
DistributedCache methods seem to work. Usage on the command line is
straightforward; I simply add -files foo.py,bar.py right after the class name
(where those
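For reference, the usual Tool skeleton that makes -files work looks roughly like this (a sketch; the class and job names are hypothetical):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // ToolRunner has already parsed the generic options (-files,
    // -libjars, ...) into getConf(), so build the Job from that conf.
    Job job = new Job(getConf(), "my-job");
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDriver(), args));
  }
}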
@hadoop.apache.org
Subject: DistributedCache incompatibility issue between 1.0 and 2.0
Hi,
I wonder why setLocalFiles and addLocalFiles methods have been removed,
and what should I use instead of them?
--
Best Regards, Edward J. Yoon
@eddieyoon
(URI uri)
Add a file to be localized.
Parameters:
uri - The URI of the cache to be localized
-Original Message-
From: Edward J. Yoon [mailto:edwardy...@apache.org]
Sent: Friday, July 19, 2013 8:03 AM
To: user@hadoop.apache.org
Subject: DistributedCache incompatibility issue between 1.0
?
Regards
[1]---
@Override
public void setup(Context context) {
    try {
        // Add DistributedCache files to the Mapper.
        // These DistributedCache files are on HDFS.
        URI[] cacheFiles =
            DistributedCache.getCacheFiles(context.getConfiguration());
        if (cacheFiles != null
Hi All
I want to use the DistributedCache to perform a replicated join on the
map side.
My Java code refers to [1][2].
When I run the job, the file that I want to cache is not copied to the
local dir of my DN, so the FileNotFoundException error came out [3].
And I checked out the source code
a matter of performance.
Alberto
On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote:
A DistributedCache is not used just to distribute simple files but
also native libraries and such, which cannot be loaded by certain
loaders if they are on HDFS.
Also, keeping it on HDFS could provide less
Thanks for your reply Harsh.
So if I want to read a simple text file, choosing whether to use
DistributedCache or HDFS becomes just a matter of performance.
Alberto
On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote:
A DistributedCache is not used just to distribute simple files
Hi all,
I was not able to find an answer to the following question. If the
question has already been answered please give me a pointer to the
right thread.
What are actually the differences between reading a file from HDFS in a
mapper and using DistributedCache?
I saw that with DistributedCache
Hi,
I've found MRUnit a very easy way to unit test jobs; is it possible as well to
test mappers reading data from DistributedCache? If yes, can you share an
example of how the test's setup() should look?
Thanks.
Hi,
Not sure how to do it using MRUnit, but it should be possible to do this using
a mocking framework like Mockito or EasyMock. In a mapper (or reducer),
you'd use the Context classes to get the DistributedCache files. By mocking
these to return what you want, you could potentially run a true unit
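A sketch of that idea with Mockito, assuming Hadoop 2's context.getCacheFiles(); the test class and file names are hypothetical:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.junit.Test;

public class CacheMapperTest {
  @SuppressWarnings("unchecked")
  @Test
  public void setupSeesMockedCacheFiles() throws Exception {
    // Mock the context so getCacheFiles() returns a known URI.
    Mapper<LongWritable, Text, Text, Text>.Context ctx =
        mock(Mapper.Context.class);
    when(ctx.getCacheFiles())
        .thenReturn(new URI[] { new URI("lookup.txt") });
    // A mapper whose setup() reads the cache list can now be exercised:
    // new CacheMapper().setup(ctx);
  }
}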
...@gmail.com wrote:
Hi,
I've a 2-node cluster (v1.0.4), master and slave. On the master, in Tool.run()
we add two files to the DistributedCache using addCacheFile(). The files do
exist in HDFS. In Mapper.setup() we want to retrieve those files from
the cache using FSDataInputStream fs
Thanks for the quick response.
I wanted to use DistributedCache to localize the files of interest to all
nodes, so which API should I use in order to be able to read all those
files, regardless of the node running the mapper?
On Thu, Nov 22, 2012 at 10:38 PM, Harsh J ha...@cloudera.com wrote
Ok, it was my fault.
Instead of using getConf() when I added a new cache file, I should have used
job.getConfiguration().
Now it works.
Cheers,
Alberto
On 19 October 2012 09:19, Alberto Cordioli cordioli.albe...@gmail.com wrote:
Hi all,
I am trying to use the DistributedCache with the new Hadoop
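In other words, the cache file has to go into the configuration the Job actually carries, for example (a sketch; the path is a placeholder):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheConfFix {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "cache-conf-fix");
    // Job copies the conf it is given, so register the cache file on the
    // Job's own configuration, not on the original conf object.
    DistributedCache.addCacheFile(new URI("/cache/lookup.txt"),
        job.getConfiguration());
  }
}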
Hi,
Using Hadoop 1.0.1 I'm trying to use the DistributedCache to add additional
jars to the classpath used by my Mappers but I can't get it to work. In the
run(String[] args) method of my Tool implementation, I've tried:
FileSystem fs = DistributedFileSystem.get(conf
, as they call
it), and just submit that.
Either of these approaches will get you going.
On Mon, Apr 9, 2012 at 11:08 PM, Nick Collier nick.coll...@verizon.net wrote:
Hi,
Using Hadoop 1.0.1 I'm trying to use the DistributedCache to add additional
jars to the classpath used by my Mappers but I
Hi All,
We are using DistributedCache.addFileToClassPath to have jars as well as a
property file available in our classpath.
For some reason, the property file cannot be found in our classpath, but
the jars are found.
Is there something specific to the implementation of addFileToClassPath
that
Hi Shi
My bad, the syntax I posted last time was not the right one;
sorry, it was from my handheld.
@Override
public void setup(Context context)
{
    File file = new File("TestFile.txt");
    .
    .
    .
}
I didn't get a chance to debug your code, but if you are looking for a
working example
Thank you Bejoy!
Following your code examples, it finally works.
Actually I only changed two places in my original code. First,
I added the @Override annotation. Second, I added a new exception
catch (FileNotFoundException e), and now it works!
I appreciate your kind and precise help.
Best,
Shi
the DistributedCache
files using the old API (JobConf), but in the new API it always returns null. I
read in some previous discussions that on the 0.20.X branch, calling
DistributedCache using the old API is encouraged. My question is: is it
possible to use DistributedCache using the new API, or is the only possible
Following my previous question, I put the complete code as
follows; I wonder whether there is any method to get this working on
0.20.X using the new API.
The command I executed was:
bin/hadoop jar myjar.jar FileTest -files textFile.txt /input/
/output/
The complete code:
public class FileTest extends
Hi,
I am using the 0.20.X branch. However, I need to use the new API because it
has the cleanup(context) method in Mapper. However, I am confused about
how to load the cached files in the mapper. I could load the
DistributedCache files using the old API (JobConf), but in the new API it
always returns
Hi
I'm trying to modify the word count example
(http://wiki.apache.org/hadoop/WordCount) using the new API
(org.apache.hadoop.mapreduce.*). I run the job on a remote
pseudo-distributed cluster. It works fine with the old API, but when I
use the new one, I'm getting this:
11/11/24 11:28:02 INFO
Hi Denis
Unfortunately the mailing list strips off attachments, so it'd be
great if you could paste the source in some location and share the URL of
the same. If the source is small enough then please include it in the
message body.
For a quick comparison, Try comparing your code with
Hi Bejoy
1. Old API:
The Map and Reduce classes are the same as in the example; the main
method is as follows:
public static void main(String[] args) throws IOException,
        InterruptedException {
    UserGroupInformation ugi =
        UserGroupInformation.createProxyUser("remote user name",
Silly question... Why do you need to use the distributed cache for the word
count program?
What are you trying to accomplish?
I've only had to play with it for one project where we had to push out a bunch
of C++ code to the nodes as part of a job...
Sent from a remote device. Please excuse
Without using the distributed cache I'm getting the same error. It's
because I start the job from a remote client / programmatically.
2011/11/24 Michel Segel michael_se...@hotmail.com:
Silly question... Why do you need to use the distributed cache for the word
count program?
What are you
Denis...
Sorry, you lost me.
Just to make sure we're using the same terminology...
The cluster is comprised of two types of nodes...
The data nodes, which run DN, TT, and if you have HBase, RS.
Then there are control nodes, which run your NN, SN, JT and, if you run HBase, HM
and ZKs ...
Outside of
Hi Denis
I tried your code without distributed cache locally and it worked
fine for me. Please find it at
http://pastebin.com/ki175YUx
I echo Mike's words on submitting map reduce jobs remotely. The remote
machine can be your local PC or any utility server as Mike specified. What
you
Hi,
a typo?
import com.bejoy.sampels.worcount.WordCountDriver;
= wor_d_count ?
- alex
On Thu, Nov 24, 2011 at 3:45 PM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi Denis
I tried your code with out distributed cache locally and it worked
fine for me. Please find it at
My bad, I pasted the wrong file. It is updated now; I made a few tiny
modifications (commented in the code) and it was working fine for me.
http://pastebin.com/RDuZX7Qd
Alex,
Thanks a lot for pointing out that.
Regards
Bejoy.KS
On Thu, Nov 24, 2011 at 8:31 PM, Alexander C.H. Lorenz
in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache space
3. job runs fine
4. Run Job A some time later. job runs fine again.
Breaking sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache space
3. job runs fine
4. Manually delete cached files out of local disk
I'm not concerned about disk space usage -- the script we used that deleted
the taskTracker cache path has been fixed not to do so.
I'm curious about the exact behavior of jobs that use DistributedCache
files. Again, it seems safe from your description to delete files between
completed runs. How could the job or the taskTracker distinguish between the
files having been
So the proper description of how DistributedCache normally works is:
1. have files to be cached sitting around in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space. Each worker node copies the to-be-cached files from HDFS to local
disk, but more importantly
stamp the distributed
cache will start downloading the new file.
Also, when the distributed cache on a disk fills up, unused entries in it are
deleted.
--Bobby Evans
On 9/27/11 2:32 PM, Meng Mao meng...@gmail.com wrote:
So the proper description of how DistributedCache normally works is:
1. have
[mailto:less...@q.com]
Sent: Tuesday, September 27, 2011 4:48 PM
To: common-user@hadoop.apache.org
Subject: Temporary Files to be sent to DistributedCache
I have a need to write information retrieved from a database to a series of
files that need to be made available to my mappers. Because each mapper
needs
So, I thought about that, and I'd considered writing to the HDFS and then
copying the file into the DistributedCache so each mapper/reducer doesn't
have to reach into the HDFS for these files. Is that the best way to
handle this?
On Tue, Sep 27, 2011 at 4:01 PM, GOEKE, MATTHEW (AG/1000)
matthew.go...@monsanto.com wrote:
The simplest route I can think of is to ingest
Let's frame the issue in another way. I'll describe a sequence of Hadoop
operations that I think should work, and then I'll get into what we did and
how it failed.
Normal sequence:
1. have files to be cached in HDFS
2. Run Job A, which specifies those files to be put into DistributedCache
space
3
We use the DistributedCache class to distribute a few lookup files for our
jobs. We have been aggressively deleting failed task attempts' leftover data,
and our script accidentally deleted the path to our distributed cache
files.
Our task attempt leftover data was here [per node]:
/hadoop/hadoop
it was before.
--Bobby Evans
On 9/23/11 1:57 AM, Meng Mao meng...@gmail.com wrote:
We use the DistributedCache class to distribute a few lookup files for our
jobs. We have been aggressively deleting failed task attempts' leftover data,
and our script accidentally deleted the path to our distributed
Hmm, I must have really missed an important piece somewhere. This is from
the MapRed tutorial text:
DistributedCache is a facility provided by the Map/Reduce framework to
cache files (text, archives, jars and so on) needed by applications.
Applications specify the files to be cached via URLs
Hello. I'm using DistributedCache for the first time, and I have found that
it adds some prefixes to the files. For example, the original file was
test.txt; it became localhosttest.txt in the cache.
How do I handle such things? Just check whether the cache file ends with the original filename?
On Tue, 7 Jun 2011 09:41:21 -0300, Juan P. gordoslo...@gmail.com
wrote:
Not 100% clear on what you meant. You are saying I should put the file
into
my HDFS cluster or should I use DistributedCache? If you suggest the
latter,
could you address my original question?
I mean that you can
why bother with DistributedCache? The only reason
might be that the shared directory is costly for the network and usually has
a storage limit.
That's exactly the problem the DistributedCache is designed for. It
guarantees that you only need to copy the file to any given local
filesystem once. Using the way
btw, just to let you know that I am running my job in a pseudo-distributed mode.
Thanks,
Neeral
From: neeral beladia neeral_bela...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Tue, May 31, 2011 10:00:00 PM
Subject: DistributedCache - getLocalCacheFiles
Hi,
I have a file on Amazon AWS under:
s3n://Access Key:Secret Key@Bucket Name/file.txt
I want this file to be accessible by the slave nodes via the Distributed Cache.
I put the following after the job configuration statements in the Driver
program:
DistributedCache.addCacheFile(new
I use DistributedCache to add two files to the class path, example code below:
String jeJarPath = "/group/aladdin/lib/je-4.1.7.jar";
DistributedCache.addFileToClassPath(new Path(jeJarPath), conf);
String tairJarPath = "/group/aladdin/lib/tair-aladdin-2.3.1.jar
Lei Liu,
You have a cut/paste error: the second addition should use 'tairJarPath' but
it is using 'jeJarPath'.
Hope this helps.
Alejandro
On Thu, Feb 17, 2011 at 11:50 AM, lei liu liulei...@gmail.com wrote:
I use DistributedCache to add two files to the class path, example code below
]
Sent: Wednesday, February 16, 2011 11:36 AM
To: core-u...@hadoop.apache.org
Subject: How to use DistributedCache to load data generated from a previous
MapReduce job?
I have a MapReduce job #1, which processes input files and produces key-value
pair data. These key-value pairs are stored
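One common pattern (a sketch; the output path is hypothetical) is to glob job #1's output and add each part file to the cache before submitting job #2:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChainJobsDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Register every part file from job #1's output for job #2's cache.
    for (FileStatus status : fs.globStatus(new Path("/output/job1/part-*"))) {
      DistributedCache.addCacheFile(status.getPath().toUri(), conf);
    }
  }
}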
the DistributedCache have failed. Typically,
we add files to the cache prior to job startup, using
addCacheFile(URI, conf) and then get them on the other side, using
getLocalCacheFiles(conf). I believe the hadoop-core versions for these
are 0.20.2+228 and +320 respectively.
We then open the files and read
Hi Kim,
We didn't fix it in the end. I just ended up manually writing the
files to the cluster using the FileSystem class, and then reading them
back out again on the other side. Not terribly efficient as I guess
the point of DistributedCache is that the files get distributed to
every node
Dear All,
We recently upgraded from CDH3b1 to b2 and ever since, all our
mapreduce jobs that use the DistributedCache have failed. Typically,
we add files to the cache prior to job startup, using
addCacheFile(URI, conf) and then get them on the other side, using
getLocalCacheFiles(conf). I
Hi, all,
I'm having problems with my Mapper instances accessing the
DistributedCache. A bit of background:
I'm running on a single-node cluster, just trying to get my first
map/reduce job functioning. Both the job tracker and the primary
namenode exist on the same host. In the client, I am able
Hi,
As a newbie to Hadoop, I am not able to figure out how to use the
DistributedCache class. Can someone give me a small piece of code which
distributes a file to the cluster and then shows how to open and use the
file in a map or reduce task.
Thanks,
Udaya
DistributedCache.addCacheFile(new URI("/path/to/file1.txt"), conf);
DistributedCache.addCacheFile(new URI("/path/to/file2.txt"), conf);
DistributedCache.addCacheFile(new URI("/path/to/file3.txt"), conf);
...
}
}
Nick Jones
Udaya Lakshmi wrote:
Hi,
As a newbie to hadoop, I am not able to figure out how to use
DistributedCache class. Can someone give me a small code which
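On the retrieval side, a mapper would do something like this (a sketch using the old DistributedCache API; none of the names come from the thread):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) throws IOException {
    // Local paths where the framework copied the cached files.
    Path[] local =
        DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (local != null && local.length > 0) {
      BufferedReader reader =
          new BufferedReader(new FileReader(local[0].toString()));
      // ... read the side data here ...
      reader.close();
    }
  }
}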
Hi Nick,
I am not able to start the following job. I have the file that has to be
passed to the DistributedCache in the local filesystem of the task tracker.
Can you tell me if I am missing something?
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.mapred
The files for the DC need to be on HDFS.
Nick Jones
Sent by radiation.
On Feb 3, 2010, at 12:32 PM, Udaya Lakshmi udaya...@gmail.com wrote:
Hi Nick,
I am not able to start the following job. I have the file that has
to be
passed to distributedcache in the local filesystem of the task
Thanks for your swift response.
But where can I find deletecache()?
Thanks.
-Original Message-
From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
Sent: Thu 9/3/2009 2:44 PM
To: common-user@hadoop.apache.org
Subject: RE: DistributedCache purgeCache()
AFAIK, releaseCache only works
Good Day,
I have a question on the DistributedCache as follows.
I have used DistributedCache to move my executable (.exe) around (onto the
local filesystems of) the nodes in Hadoop and run the .exe (via addCacheArchive()
and getLocalCacheArchives()). But I discovered after my job, the .exe
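The pattern described would look roughly like this (a sketch; the archive path is hypothetical):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class ShipExecutable {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Driver side: ship an archive containing the executable.
    DistributedCache.addCacheArchive(new URI("/tools/myprog.zip"), conf);
    // Task side, e.g. in setup():
    //   Path[] local = DistributedCache.getLocalCacheArchives(conf);
    // gives the node-local unpacked locations.
  }
}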
Does it work if you use addArchiveToClassPath()?
Also, it may be more convenient to use GenericOptionsParser's -libjars option.
Tom
On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball aa...@cloudera.com wrote:
Hi all,
I'm stumped as to how to use the distributed cache's classpath feature. I
have
/aaronTest2.jar));
-Original Message-
From: Tom White [mailto:t...@cloudera.com]
Sent: Wednesday, April 08, 2009 9:36 AM
To: core-user@hadoop.apache.org
Subject: Re: Example of deploying jars through DistributedCache?
Does it work if you use addArchiveToClassPath()?
Also, it may be more convenient to use
Hi all,
I'm stumped as to how to use the distributed cache's classpath feature. I
have a library of Java classes I'd like to distribute to jobs and use in my
mapper; I figured the DCache's addFileToClassPath() method was the correct
means, given the example at
DistributedCache in Hive programs?
Hi list,
In the past, I used to store auxiliaries via DistributedCache in
my Hadoop programs, and read them locally when mappers were configured.
I've found an add [FILE] value [value]* option in Hive CLI mode for
sending files to HDFS. How can I use
a MapReader based on a file in the
DistributedCache.
Thanks.
--sean
Sean Shanny
ssha...@tripadvisor.com
On Dec 28, 2008, at 10:59 PM, Amareshwari Sriramadasu wrote:
Sean Shanny wrote:
To all,
Version: hadoop-0.17.2.1-core.jar
I have created a MapFile.
What I don't seem to be able to do
To all,
Version: hadoop-0.17.2.1-core.jar
I have created a MapFile.
What I don't seem to be able to do is correctly place the MapFile in
the DistributedCache and then make use of it in a map method.
I need the following info please:
1. How and where to place the MapFile directory so
I put the files into the HDFS using the following commands:
$ bin/hadoop fs -copyFromLocal /tmp/ur/data /2008-12-19/url/data
$ bin/hadoop fs -copyFromLocal /tmp/ur/index /2008-12-19/url/index
and placed them in the DistributedCache using the following calls in
the JobConf class:
DistributedCache.addCacheFile(new URI("/2008-12-19/url/data"), conf);
DistributedCache.addCacheFile(new URI("/2008-12-19/url/index"), conf);
What I cannot figure out how to do is actually access the MapFile now
within my Map
I have been having problems with changes to DistributedCache files on
HDFS not being reflected on subsequently run jobs. I can change the
filename to work around this, but I would prefer a way to invalidate
the Cache when necessary.
Is there a way to lower the timeout or flush the Cache?
Cheers
)
at java.util.zip.ZipFile.<init>(ZipFile.java:131)
at org.apache.hadoop.fs.FileUtil.unZip(FileUtil.java:421)
at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:338)
at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:161