Varun Vasudev created YARN-4309:
-----------------------------------

             Summary: Add debug information to application logs when a container fails
                 Key: YARN-4309
                 URL: https://issues.apache.org/jira/browse/YARN-4309
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
            Reporter: Varun Vasudev
            Assignee: Varun Vasudev


Sometimes when a container fails, it can be pretty hard to figure out why it 
failed.

My proposal is that if a container fails, we collect information about the 
container local dir and dump it into the container log dir. Ideally, I'd like 
to tar up the directory entirely, but I'm not sure of the security and space 
implications of such an approach. At the very least, we can list all the files 
in the container local dir and dump the contents of launch_container.sh into 
the container log dir.
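
To make the minimal variant of the proposal concrete, here is a rough standalone sketch in Python. It is not NodeManager code and does not use any YARN API. The function name, the `directory.info` output file name, and the listing format are all assumptions made for illustration. The only name taken from the proposal is launch_container.sh.

```python
import os
import shutil
import stat
from pathlib import Path


def dump_container_debug_info(local_dir, log_dir):
    """Write a file listing of local_dir and a copy of launch_container.sh into log_dir.

    Hypothetical helper: both the name and the output layout are illustrative only.
    """
    local_dir = Path(local_dir)
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    # "directory.info" is an assumed file name, not one fixed by the proposal.
    listing_path = log_dir / "directory.info"
    with open(listing_path, "w") as out:
        # os.walk does not descend into symlinked directories by default,
        # so links are listed but their targets are not traversed.
        for root, dirs, files in os.walk(local_dir):
            dirs.sort()  # keep traversal order deterministic
            for name in sorted(dirs + files):
                path = Path(root) / name
                info = path.lstat()  # lstat: describe the link, not its target
                out.write(
                    f"{stat.filemode(info.st_mode)} {info.st_size:>10} "
                    f"{path.relative_to(local_dir)}\n"
                )

    # Copy the launch script next to the container logs, if it exists.
    script = local_dir / "launch_container.sh"
    script_copy = None
    if script.is_file():
        script_copy = log_dir / "launch_container.sh"
        shutil.copyfile(script, script_copy)

    return listing_path, script_copy
```

Only file names, sizes, and permission bits are recorded, which sidesteps the space concern raised above for the tar option. Whether the contents of launch_container.sh are safe to expose in logs is a separate question that this sketch does not address.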

When log aggregation occurs, all this information will be collected 
automatically, making such failures much easier to debug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
