RE: Jobtracker memory issues due to FileSystem$Cache

2013-04-17 Thread Marcin Mejran
is left cached perpetually. Setting, for example, keep.failed.task.files=true or keep.task.files.pattern= prevents CleanupQueue from getting called, which seems to solve my issues. You get junk left in .staging, but that can be dealt with. -Marcin
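For reference, a minimal sketch of that workaround, assuming the properties quoted above are set on the job's Configuration (the pattern value below is only a placeholder, not a recommended setting):

    import org.apache.hadoop.conf.Configuration;

    // Sketch: set the properties named in the thread so the JobTracker never
    // schedules the job's staging cleanup (CleanupQueue), at the cost of
    // leftover .staging directories that must be cleaned up out of band.
    public class KeepTaskFilesWorkaround {
        public static Configuration apply(Configuration conf) {
            conf.setBoolean("keep.failed.task.files", true);
            // Alternative knob mentioned in the thread; the pattern is a placeholder.
            // conf.set("keep.task.files.pattern", "keep_nothing_placeholder");
            return conf;
        }
    }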

Jobtracker memory issues due to FileSystem$Cache

2013-04-16 Thread Marcin Mejran
We've recently run into JobTracker memory issues on our new Hadoop cluster. A heap dump shows that there are thousands of copies of DistributedFileSystem kept in FileSystem$Cache, a bit over one for each job run on the cluster, and their JobConf objects support this view. I believe these are crea
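As background, a minimal sketch (not the JobTracker's own code) of how FileSystem$Cache behaves: FileSystem.get() hands back a cached instance keyed by scheme, authority, and the calling UGI, and an entry stays until it is explicitly closed, so a lookup made under a fresh UserGroupInformation per job keeps adding entries. Assuming fs.default.name points at an hdfs:// URI and a Hadoop version that honors fs.hdfs.impl.disable.cache:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FsCacheSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Same config, same user: both calls hit the same cache entry.
            FileSystem a = FileSystem.get(conf);
            FileSystem b = FileSystem.get(conf);
            System.out.println(a == b); // true

            // Opting out of the cache for hdfs:// so the caller owns the
            // instance's lifetime and can close it when done:
            Configuration noCache = new Configuration();
            noCache.setBoolean("fs.hdfs.impl.disable.cache", true);
            FileSystem c = FileSystem.get(noCache); // fresh, uncached instance
            c.close();
        }
    }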

Re: Hadoop efficient resource isolation

2013-02-25 Thread Marcin Mejran
That won't stop a bad job (say a fork bomb or a massive memory leak in a streaming script) from taking out a node, which is what I believe Dhanasekaran was asking about. He wants to physically isolate certain jobs to certain "non critical" nodes. I don't believe this is possible and data would be
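Short of physical isolation, the usual containment on a Hadoop 1.x cluster is per-task limits; a sketch, with property names that should be checked against your version:

    import org.apache.hadoop.conf.Configuration;

    // Sketch: per-task containment instead of physical isolation. These cap
    // a task's memory footprint; they do not cap process counts, so a true
    // fork bomb also needs OS-level limits on the TaskTracker user.
    public class TaskLimitsSketch {
        public static Configuration apply(Configuration conf) {
            conf.set("mapred.child.java.opts", "-Xmx1024m"); // heap cap for the task JVM
            conf.set("mapred.child.ulimit", "2097152");      // virtual memory cap in KB (~2 GB),
                                                             // inherited by spawned streaming processes
            return conf;
        }
    }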

RE: Namenode formatting problem

2013-02-19 Thread Marcin Mejran
The issue may be that the nodes are trying to use the EC2 public IP (which would be used for external access) to reach each other, which does not work (or doesn't work trivially). You need to use the private IPs, which are given by ifconfig. EC2 gives you static IPs as long as you don't restart
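To make that concrete, a minimal sketch with a made-up EC2-internal hostname (use whatever private address or .ec2.internal name your instances actually report): point the filesystem URI at the private address, not the public one.

    import org.apache.hadoop.conf.Configuration;

    public class PrivateAddressSketch {
        public static Configuration apply(Configuration conf) {
            // Hypothetical private DNS name; inside EC2 it resolves to the
            // 10.x address shown by ifconfig, unlike the public IP.
            conf.set("fs.default.name", "hdfs://ip-10-0-0-12.ec2.internal:8020");
            return conf;
        }
    }

The same private name is what the other daemons and clients should be configured to use.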

RE: Alerting

2012-12-23 Thread Marcin Mejran
Yeah, Oozie sounds like the best approach. I think "timeout" in Oozie refers to something different (stopping a coordinator action if it hasn't started within X minutes), but the SLA mechanism should do what's asked for. -Marcin