is left cached perpetually.
Setting, for example, keep.failed.task.files=true or a keep.task.files.pattern prevents CleanupQueue from being called, which seems to solve my issue. You get junk left in .staging, but that can be dealt with.
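For reference, the workaround described above could look like the following mapred-site.xml fragment. This is a sketch against Hadoop 1.x property names (these can also be set per-job on the JobConf rather than cluster-wide):

```xml
<!-- Keep intermediate task files for failed tasks, which (as described
     above) has the side effect that CleanupQueue is not invoked. -->
<property>
  <name>keep.failed.task.files</name>
  <value>true</value>
</property>
```

Note the trade-off the message mentions: with cleanup suppressed, leftover files accumulate under .staging and have to be removed by some external process.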
-Marcin
From: Marcin Mejran [mailto:marcin.mej...@hooklogic.com]
We've recently run into JobTracker memory issues on our new Hadoop cluster. A
heap dump shows that there are thousands of copies of DistributedFileSystem
kept in FileSystem$Cache, a bit over one for each job run on the cluster, and
their JobConf objects support this view. I believe these are crea
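A commonly cited mitigation for unbounded FileSystem$Cache growth (a different technique from the cleanup workaround above, and only an option on Hadoop versions that honor the fs.<scheme>.impl.disable.cache properties) is to disable the cache for the affected scheme, so each FileSystem.get() returns a fresh instance that the caller is then responsible for closing:

```xml
<!-- core-site.xml: bypass FileSystem$Cache for hdfs:// URIs.
     Each FileSystem.get() then creates a new instance instead of
     caching one per (scheme, authority, UGI) key, so instances are
     no longer retained for the lifetime of the JobTracker — but
     callers must close() what they get or they will leak instead. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
```

Whether this helps here depends on where the leaked instances are created; it trades cache retention for the burden of explicit close() calls.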
That won't stop a bad job (say, a fork bomb or a massive memory leak in a
streaming script) from taking out a node, which is what I believe Dhanasekaran
was asking about. He wants to physically isolate certain jobs to certain "non
critical" nodes. I don't believe this is possible, and data would be
The issue may be that the nodes are trying to use the EC2 public IP (which is
meant for external access) to reach each other, which does not work (or
doesn't work trivially). You need to use the private IPs, which are what
ifconfig reports.
EC2 gives you static IPs as long as you don't restart
Yeah, Oozie sounds like the best approach. I think "timeout" in Oozie refers to
something different (stopping a coordinator action if it hasn't started within X
minutes), but the SLA mechanism should do what's being asked for.
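To illustrate the distinction, here is a sketch of an Oozie coordinator showing both knobs side by side. Element names follow the uri:oozie:coordinator:0.2 and old uri:oozie:sla:0.1 schemas; the app name, dates, and ${workflowAppPath} are placeholders:

```xml
<coordinator-app name="daily-job" frequency="${coord:days(1)}"
                 start="2012-12-01T00:00Z" end="2013-12-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2"
                 xmlns:sla="uri:oozie:sla:0.1">
  <!-- "timeout" (minutes): how long a materialized action may wait
       for its inputs before Oozie abandons it — the mechanism the
       message says is NOT what was asked for. -->
  <controls>
    <timeout>10</timeout>
  </controls>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
    <!-- SLA: raise an alert if the action has not finished within
         30 minutes of its nominal time — the mechanism that matches
         the original question. -->
    <sla:info>
      <sla:nominal-time>${coord:nominalTime()}</sla:nominal-time>
      <sla:should-end>${30 * MINUTES}</sla:should-end>
    </sla:info>
  </action>
</coordinator-app>
```

The coordinator timeout governs waiting-to-start; the SLA should-end governs time-to-finish, which is why the two are easy to confuse.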
-Marcin
From: Ted Dunning [mailto:tdunn...@maprtech.com]
Sent: Saturday, December