Steve Loughran created YARN-3337:
------------------------------------

             Summary: Provide YARN chaos monkey
                 Key: YARN-3337
                 URL: https://issues.apache.org/jira/browse/YARN-3337
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: test
    Affects Versions: 2.7.0
            Reporter: Steve Loughran


To test failure resilience today, you either need custom scripts or have to 
implement Chaos Monkey-like logic in your application (SLIDER-202). 

The core activity here is killing AMs and containers on a schedule, each with a 
given probability. A CLI application or client library could handle this. 

# entry point with a startup delay before acting
# frequency of chaos wakeup/polling
# probability of AM failure generation (0-100)
# probability of non-AM container kill
# future: other operations
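
As a rough illustration of how the options above could fit together, here is a minimal sketch of the wakeup loop. It is not a proposed implementation: the names ChaosConfig, chaos_tick and run_chaos are hypothetical, and the kill_am and kill_container callbacks are placeholders for whatever YARN client calls a real tool would make. It also assumes the container-kill probability uses the same 0-100 scale as the AM one, which the list above does not state.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChaosConfig:
    """Hypothetical settings mirroring the options listed above."""
    startup_delay_s: float        # wait this long before the first action
    poll_interval_s: float        # time between chaos wakeups
    am_kill_percent: int          # 0-100: chance per wakeup of killing the AM
    container_kill_percent: int   # 0-100: chance per wakeup of killing a non-AM container (assumed scale)


def chaos_tick(config: ChaosConfig,
               rng: random.Random,
               kill_am: Callable[[], None],
               kill_container: Callable[[], None]) -> List[str]:
    """Run one wakeup and return the names of the actions taken."""
    actions = []
    # rng.random() is in [0, 1), so 0 never fires and 100 always fires.
    if rng.random() * 100 < config.am_kill_percent:
        kill_am()
        actions.append("kill-am")
    if rng.random() * 100 < config.container_kill_percent:
        kill_container()
        actions.append("kill-container")
    return actions


def run_chaos(config: ChaosConfig,
              kill_am: Callable[[], None],
              kill_container: Callable[[], None],
              ticks: int,
              sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> List[List[str]]:
    """Wait for the startup delay, then perform `ticks` wakeups."""
    rng = rng or random.Random()
    sleep(config.startup_delay_s)
    log = []
    for _ in range(ticks):
        log.append(chaos_tick(config, rng, kill_am, kill_container))
        sleep(config.poll_interval_s)
    return log
```

Passing the sleep function and random source as parameters keeps the loop testable without real delays; a real tool would presumably run until stopped rather than for a fixed number of ticks.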




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
