The distributed cache does, I believe, cache files across jobs. The
TaskTracker keeps the files around as long as it has space for them. It
also reference-counts the files in use so they don't get deleted while a
task might still be using them.
DistributedCache.localizeCache() is where you want to look.
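For reference, the usual usage pattern looks something like this (a
minimal sketch against the old mapred-era API; the HDFS path and class
names here are invented for illustration):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheExample {

  // Driver side: register the file. Each TaskTracker localizes it once
  // and keeps the local copy around while it has space.
  public static void registerLookupFile(Configuration conf) throws Exception {
    DistributedCache.addCacheFile(new URI("/shared/lookup.txt"), conf);
  }

  public static class LookupMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
      // Task side: read the node-local copy the TaskTracker localized.
      Path[] cached =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      BufferedReader in =
          new BufferedReader(new FileReader(cached[0].toString()));
      // ... load the lookup data here ...
      in.close();
    }
  }
}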
Is it true that the distributed cache only works for a single job?
Is it possible for 2 different jobs to share the same local copy of the same
file from distributed cache?
Thanks,
Zheng
Stuart,
Can you disable the topology (rack-awareness) on HDFS?
That way, all 17 nodes should get an equal amount of data
(assuming you have enough tasks to run on all the nodes).
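If you set the cluster up with a topology script, removing that property
from the NameNode's configuration and restarting should undo it; with it
unset, every DataNode maps to /default-rack. Something like this (the
property name is the 0.20-era one, and the script path is just an
example):

<!-- Remove or comment out the topology script to disable rack-awareness. -->
<!--
<property>
  <name>topology.script.file.name</name>
  <value>/path/to/rack-script.sh</value>
</property>
-->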
Koji
On 9/29/09 10:19 AM, "Stuart White" wrote:
> I have a hadoop cluster across 2 racks. One rack contains 12 nodes,
> the other rack contains 5 nodes.
All,
What I want to do is output multiple files from my reducer, one for each
key value.
Can this still be done in the current API?
It seems that using MultipleTextOutputFormat requires one to use deprecated
parts of the API.
Is this correct?
I would like to use that class or its equivalent and stay within the
current API.
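If your Hadoop version ships the new-API MultipleOutputs
(org.apache.hadoop.mapreduce.lib.output.MultipleOutputs; it was ported to
the new API in 0.21 under MAPREDUCE-370), something like this untested
sketch writes one file per key (the class name here is made up):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class PerKeyReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // The third argument is the base output path, so the records for
      // each key land in their own file under the job's output directory.
      mos.write(key, value, key.toString());
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();  // flush and close the per-key writers
  }
}

Pairing this with LazyOutputFormat.setOutputFormat(job,
TextOutputFormat.class) in the driver avoids the empty default part files.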
I have a hadoop cluster across 2 racks. One rack contains 12 nodes,
the other rack contains 5 nodes.
When I run a really large job, the disks on the 5 nodes fill up much
sooner than the disks on the 12 nodes, and I believe it's because the
12 nodes are sending their replicated blocks to the 5-node rack.
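A rough back-of-the-envelope supports this, assuming 3x replication and
the default placement policy (first replica on the writer's node, second
on a node in the other rack, third on a different node in the second
replica's rack). If writers are spread proportionally, 12/17 of blocks
originate on the big rack and push two replicas each to the small rack,
while 5/17 originate on the small rack and keep one replica there:
(12/17)*2 + (5/17)*1 ≈ 1.7 of every block's 3 replicas land on the 5
nodes, versus ≈ 1.3 spread over the 12. Per node that is about 0.34
versus 0.11 replicas, so the small rack's disks fill roughly 3x faster.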
Hi, all
I am a newbie to Hadoop and just began playing with it in recent days. I am
trying to write a MapReduce program to parse a large dataset (about 20G),
extract object ids, and store them in an HBase table. The issue is that
one keyword is associated with several million object ids. Here is my
first attempt.
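For concreteness, a bare-bones loader for this kind of task might look
like the sketch below (untested; the table name, column family, and
tab-separated input format are all assumptions, and it assumes an HBase
version with HBaseConfiguration.create() and the
org.apache.hadoop.hbase.mapreduce package). Note the row key combines
keyword and object id, so a hot keyword spreads over many rows instead of
piling millions of values into a single row:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class IdLoader {

  // Map-only job: parse "keyword<TAB>objectId" lines, one Put per pair.
  static class LoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      if (fields.length < 2) return;
      // Row key = keyword:objectId, so a keyword with millions of ids
      // becomes millions of small rows rather than one huge row.
      byte[] row = Bytes.toBytes(fields[0] + ":" + fields[1]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("ids"), Bytes.toBytes("oid"),
          Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "object_ids");  // made-up table
    Job job = new Job(conf, "id-loader");
    job.setJarByClass(IdLoader.class);
    job.setMapperClass(LoadMapper.class);
    job.setNumReduceTasks(0);  // map-only; HBase absorbs the writes
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}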