Re: distributed cache across jobs?

2009-09-29 Thread Philip Zeyliger
The distributed cache does, I believe, cache files across jobs. The TaskTracker keeps the files around as long as it has space for them. It also reference-counts the files in use so they don't get deleted while a task might still be using them. DistributedCache.localizeCache() is where you wan
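
To illustrate the behaviour described in this reply, here is a minimal sketch of a reference-counted local cache that keeps files across jobs and evicts only unreferenced entries when space runs short. It is not the TaskTracker's actual code: the class RefCountedCacheSketch and its methods acquire, release and contains are hypothetical names invented for this example, and the eviction order (oldest unreferenced entry first) is an assumption.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Hadoop's implementation.
public class RefCountedCacheSketch {
    private static final class Entry {
        final long sizeBytes;
        int refCount; // number of running tasks currently using the file
        Entry(long sizeBytes) { this.sizeBytes = sizeBytes; }
    }

    private final long capacityBytes;
    private long usedBytes = 0;
    // Insertion-ordered, so eviction considers the oldest entries first.
    private final Map<String, Entry> entries = new LinkedHashMap<>();

    public RefCountedCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Returns true on a cache hit, i.e. an earlier task or job already localized the file.
    public synchronized boolean acquire(String uri, long sizeBytes) {
        Entry e = entries.get(uri);
        boolean hit = (e != null);
        if (!hit) {
            evictUnreferenced(sizeBytes);
            e = new Entry(sizeBytes);
            entries.put(uri, e);
            usedBytes += sizeBytes;
        }
        e.refCount++;
        return hit;
    }

    // Called when a task finishes; the file stays cached until space is needed.
    public synchronized void release(String uri) {
        Entry e = entries.get(uri);
        if (e != null && e.refCount > 0) {
            e.refCount--;
        }
    }

    public synchronized boolean contains(String uri) {
        return entries.containsKey(uri);
    }

    // Frees space by dropping entries with refCount == 0. Entries still in use
    // are never removed, so the cache may temporarily exceed its capacity.
    private void evictUnreferenced(long neededBytes) {
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (usedBytes + neededBytes > capacityBytes && it.hasNext()) {
            Entry e = it.next().getValue();
            if (e.refCount == 0) {
                usedBytes -= e.sizeBytes;
                it.remove();
            }
        }
    }
}
```

In this sketch, a second job that calls acquire for the same URI after the first job has released it gets a hit as long as the entry has not been evicted in between.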

distributed cache across jobs?

2009-09-29 Thread Zheng Shao
Is it true that the distributed cache only works for a single job? Is it possible for 2 different jobs to share the same local copy of the same file from the distributed cache? Thanks, Zheng

Re: Can you disable the rule forcing replication to go outside rack?

2009-09-29 Thread Koji Noguchi
Stuart, Can you disable the topology (rack-awareness) on HDFS? That way, all 17 nodes should get an equal amount (assuming you have enough tasks to run on all the nodes). Koji On 9/29/09 10:19 AM, "Stuart White" wrote: > I have a hadoop cluster across 2 racks. One rack contains 12 nodes, > t

Does Using MultipleTextOutputFormat Require the Deprecated API?

2009-09-29 Thread Geoffry Roberts
All, What I want to do is output multiple files from my reducer, one for each key value. Can this still be done in the current API? It seems that using MultipleTextOutputFormat requires one to use deprecated parts of the API. Is this correct? I would like to use the class or its equivalent and stay

Can you disable the rule forcing replication to go outside rack?

2009-09-29 Thread Stuart White
I have a hadoop cluster across 2 racks. One rack contains 12 nodes, the other rack contains 5 nodes. When I run a really large job, the disks on the 5 nodes fill up much sooner than the disks on the 12 nodes, and I believe it's because the 12 nodes are sending their replicated blocks to the 5-nod
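
A rough back-of-the-envelope check of the imbalance described in this question can be sketched as follows. The numbers rest on assumptions that are not stated in the post: a replication factor of 3, a placement policy that puts the first replica on the writer's node and the other two on a single remote rack, and writers spread evenly over all 17 nodes. The class RackLoadSketch and its method replicasPerNode are hypothetical names for this example.

```java
// Hypothetical illustration only; the placement policy modelled here is an assumption.
public class RackLoadSketch {

    // Expected number of replicas stored on each node of a rack, per block written
    // in the cluster, given one local replica and two replicas on the other rack.
    static double replicasPerNode(int nodesInRack, int nodesInOtherRack) {
        int totalNodes = nodesInRack + nodesInOtherRack;
        double writtenHere = (double) nodesInRack / totalNodes;       // share of blocks written from this rack
        double writtenThere = (double) nodesInOtherRack / totalNodes; // share written from the other rack
        // Blocks written here leave 1 replica in this rack; blocks written there send 2.
        double replicasInRack = writtenHere * 1 + writtenThere * 2;
        return replicasInRack / nodesInRack;
    }

    public static void main(String[] args) {
        double largeRack = replicasPerNode(12, 5);
        double smallRack = replicasPerNode(5, 12);
        System.out.printf("12-node rack: %.4f replicas per node per block%n", largeRack);
        System.out.printf("5-node rack:  %.4f replicas per node per block%n", smallRack);
        System.out.printf("ratio: %.2f%n", smallRack / largeRack);
    }
}
```

Under these assumptions each node in the 5-node rack stores roughly three times as much data as a node in the 12-node rack (29/85 versus 22/204 replicas per block), which is consistent with the smaller rack's disks filling up first.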

how to handle large volume reduce input value in mapreduce program?

2009-09-29 Thread Yin_Hongbin
Hi all, I am a newbie to Hadoop and have just begun to play with it in recent days. I am trying to write a MapReduce program to parse a large dataset (about 20 GB) to extract object IDs and store them in an HBase table. The issue is that there is one keyword which is associated with several million object IDs. Here is my firs