[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672084#comment-13672084 ]
Hadoop QA commented on YARN-117:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585719/YARN-117-013.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 25 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1064//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1064//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1064//console

This message is automatically generated.

Enhance YARN service model
--------------------------

Key: YARN-117
URL: https://issues.apache.org/jira/browse/YARN-117
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-013.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch

Having played with the YARN service model, I have identified some issues based on past work and initial use.

This JIRA issue is an umbrella issue covering them, with solutions pushed out to separate JIRAs.

h2. The state model prevents the stopped state being entered if you could not successfully start the service.

In the current lifecycle you cannot stop a service unless it was successfully started, but
* {{init()}} may acquire resources that need to be explicitly released
* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources.

*Fix:* make {{stop()}} a valid state transition from all states, and require implementations to be able to stop safely without requiring all fields to be non-null.

Before anyone points out that the {{stop()}} operations assume that all fields are valid, and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for this.
It is independent of the rest of the issues in this doc, but it will help make {{stop()}} executable from all states other than "stopped".

MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and uptake; this can be done with issues linked to this one.

h2. AbstractService doesn't prevent duplicate state change requests.

The {{ensureState()}} checks that verify whether a state transition is allowed from the current state are performed in the base {{AbstractService}} class, yet subclasses tend to call this *after* their own {{init()}}, {{start()}} & {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.

This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}}. This must be done before the class is considered stable, because once the lifecycle methods are declared final, all subclasses outside the source tree will need fixing by their respective developers.

h2. AbstractService state change doesn't defend against race conditions.

There are no concurrency locks on the state transitions. Whatever fix is added for wrong-state calls should also correct this and prevent re-entrancy, such as {{stop()}} being called from two threads.

h2. Static methods to choreograph lifecycle operations

Add helper methods to move things through lifecycles: init->start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception.
These could go into a class {{ServiceOps}} in the same package. They can be used by services that wrap other services, and help manage more robust shutdowns.

h2. State transition failures are something that registered service listeners may wish to be informed of.

When a state transition fails, a {{RuntimeException}} can be thrown, and the service listeners are not informed because the notification point is never reached. They may wish to know this, especially for management and diagnostics.

*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targetedState, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods), and add a no-op implementation of it to the existing implementations of the interface.

h2. Service listener failures not handled

Is this an error or not? Log and ignore may not be what is desired.

*Proposed:* during {{stop()}}, any exception raised by a listener is caught and discarded, to increase the likelihood of a clean shutdown, but no try-catch clauses are added to the other state changes.

h2. Support static listeners for all AbstractServices

Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events.

The static listeners would be invoked for all state changes except creation (the base class shouldn't be handing out references to itself at that point). These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures would be logged at warn and the rest of the listeners invoked.

h2. Add some example listeners for management/diagnostics

* events to commons log for humans.
* events for machines hooked up to the JSON logger.
* for testing: something that can be told to fail.
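The lifecycle proposals above can be combined in one base class: {{final}} {{init()}}/{{start()}}/{{stop()}} methods that check the state before any subclass code runs, transitions serialized by a lock, {{stop()}} valid from every state, a {{stateChangeFailed()}} callback, and listener exceptions discarded only during {{stop()}}. The Java below is a minimal sketch under those assumptions, not the actual {{AbstractService}}: {{SketchService}}, {{LifecycleListener}} and {{innerInit()}} are hypothetical names, {{init()}} takes no {{Configuration}}, and static listeners are left out.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener; stateChangeFailed() is the proposed extra callback. */
interface LifecycleListener {
  void stateChanged(SketchService service);

  /** No-op default, so existing listener implementations need no changes. */
  default void stateChangeFailed(SketchService service, SketchService.State target,
                                 RuntimeException cause) {
  }
}

/** Hypothetical base class: lifecycle methods are final and delegate to inner methods. */
abstract class SketchService {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private final List<LifecycleListener> listeners = new CopyOnWriteArrayList<>();
  private volatile State state = State.NOTINITED;

  public State getState() {
    return state;
  }

  public void register(LifecycleListener listener) {
    listeners.add(listener);
  }

  // synchronized: only one thread at a time can run a transition on this service.
  public final synchronized void init() {
    ensureState(State.NOTINITED);  // checked before any subclass code runs
    transition(State.INITED, this::innerInit);
  }

  public final synchronized void start() {
    ensureState(State.INITED);
    transition(State.STARTED, this::innerStart);
  }

  /** Valid from every state; calling it on a stopped service is a no-op. */
  public final synchronized void stop() {
    if (state != State.STOPPED) {
      transition(State.STOPPED, this::innerStop);
    }
  }

  protected void innerInit() {
  }

  protected void innerStart() {
  }

  /** Must cope with fields still being null because init/start never ran or failed. */
  protected void innerStop() {
  }

  private void ensureState(State expected) {
    if (state != expected) {
      throw new IllegalStateException("In state " + state + ", expected " + expected);
    }
  }

  private void transition(State target, Runnable action) {
    try {
      action.run();
    } catch (RuntimeException e) {
      // Tell listeners about the failed transition, then rethrow the original failure.
      for (LifecycleListener l : listeners) {
        try {
          l.stateChangeFailed(this, target, e);
        } catch (RuntimeException listenerFailure) {
          e.addSuppressed(listenerFailure);
        }
      }
      throw e;
    }
    state = target;
    for (LifecycleListener l : listeners) {
      try {
        l.stateChanged(this);
      } catch (RuntimeException e) {
        if (target != State.STOPPED) {
          throw e;  // only stop() discards listener failures
        }
      }
    }
  }
}
```

Because {{stop()}} is accepted in every state, a wrapping service can always call it after a failed {{start()}} to release what {{init()}} acquired. A test listener that always throws (the "told to fail" example) can make {{init()}} or {{start()}} throw, but cannot prevent {{stop()}} from completing.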
h2. Services should support signal interruptibility

The services would benefit from a way of shutting them down on a kill signal; this can be done via a runtime shutdown hook. It should not be automatic, though, as composite services will get into a very complex state during shutdown. It is better to provide a hook that lets you register/unregister services to terminate, and have the relevant {{main()}} entry points tell their root services to register themselves.
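A minimal sketch of that opt-in hook, assuming a JVM shutdown hook registered through {{Runtime.addShutdownHook()}} is an acceptable trigger. {{ServiceShutdownHook}} and {{StoppableService}} are hypothetical names, and stopping services newest-first (the reverse of registration order) is an assumed policy, not something the issue specifies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical minimal view of a service: only stop() is needed at shutdown. */
interface StoppableService {
  void stop();
}

/**
 * Hypothetical opt-in registry. Nothing is stopped unless a main() entry point
 * registers its root service. The JVM runs shutdown hooks on normal exit and
 * on SIGINT/SIGTERM, but not on SIGKILL.
 */
final class ServiceShutdownHook {
  private final Deque<StoppableService> services = new ArrayDeque<>();
  private boolean hookInstalled;

  /** Registers a service, installing the JVM shutdown hook on first use. */
  synchronized void register(StoppableService service) {
    services.push(service);  // newest at the head, so it is stopped first
    if (!hookInstalled) {
      Runtime.getRuntime().addShutdownHook(new Thread(this::stopAll, "service-shutdown"));
      hookInstalled = true;
    }
  }

  /** Returns true if the service was registered and has now been removed. */
  synchronized boolean unregister(StoppableService service) {
    return services.remove(service);
  }

  /** Stops services newest-first; a failing stop() does not block the others. */
  List<RuntimeException> stopAll() {
    List<StoppableService> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(services);  // iteration runs from the head (newest)
      services.clear();                      // so a second call stops nothing
    }
    List<RuntimeException> failures = new ArrayList<>();
    for (StoppableService service : snapshot) {
      try {
        service.stop();
      } catch (RuntimeException e) {
        failures.add(e);
      }
    }
    return failures;
  }
}
```

In this sketch a {{main()}} would call {{register(rootService)}} after starting its root service, and {{unregister(rootService)}} if it later stops that service itself. Child services of a composite stay unregistered, so only the root drives their shutdown.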