[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797804#comment-13797804 ]
Steve Loughran commented on YARN-1139:
--------------------------------------

# You don't need to convert any exceptions now, because the inner {{serviceStart()}}/{{serviceStop()}} methods are declared to throw exceptions. Just pass them up. The only reason the existing services didn't have their exception catch/wrap logic changed as part of YARN-117 is that I didn't want to add extra changes.
# AbstractService catches a failure and relays it to {{noteFailure()}}, which saves the first exception caught; {{getFailureCause()}} and {{getFailureState()}} return that exception and the state the service was in when it happened.
# When an exception is caught during a state change, it triggers a {{Service.stop()}} action, which is why {{stop()}} is required to be a best-effort operation that does its best even when stopping a partially inited or started service.
# It then calls {{ServiceStateException.convert(e);}} to convert the exception into a RuntimeException; if it already is one it is left alone, otherwise it is wrapped in a ServiceStateException.
# The composite service runs through its children, starting each one in turn. The first one that fails by throwing a runtime exception will trigger the noteFailure operation on the parent, then the composite service's stop() operation, which walks back through all inited services (but not the UNINITED ones; things failed when we tried that), stopping them in turn.

What that means is that if a child service fails, the composite should pick that up and save it as its own failure cause.

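The flow in the list above can be sketched in plain Java. This is a simplified model, not the Hadoop code: {{LifecycleSketch}} and its nested {{Service}}, {{Composite}} and {{StateException}} classes are hypothetical stand-ins for AbstractService, the composite service and ServiceStateException, written only to show how a checked exception from a child ends up, wrapped once, as the parent's failure cause.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    /** Hypothetical stand-in for ServiceStateException. */
    static class StateException extends RuntimeException {
        StateException(Throwable cause) { super(cause.toString(), cause); }

        /** Runtime exceptions pass through untouched; anything else is wrapped once. */
        static RuntimeException convert(Throwable t) {
            return (t instanceof RuntimeException) ? (RuntimeException) t : new StateException(t);
        }
    }

    /** Hypothetical miniature of the catch / record / stop / rethrow lifecycle. */
    abstract static class Service {
        private State state = State.NOTINITED;
        private Throwable failureCause;   // only the first failure is kept
        private State failureState;

        // Inner hooks may throw anything; subclasses do not need to wrap.
        protected void serviceInit() throws Exception { }
        protected void serviceStart() throws Exception { }
        protected void serviceStop() throws Exception { }

        final void init() {
            try { serviceInit(); state = State.INITED; } catch (Exception e) { throw fail(e); }
        }

        final void start() {
            try { serviceStart(); state = State.STARTED; } catch (Exception e) { throw fail(e); }
        }

        final void stop() {
            if (state == State.STOPPED) return;   // stopping twice is harmless
            state = State.STOPPED;
            try { serviceStop(); } catch (Exception e) { noteFailure(e); throw StateException.convert(e); }
        }

        /** Record the failure, make a best-effort stop, hand back a runtime exception. */
        private RuntimeException fail(Exception e) {
            noteFailure(e);
            try { stop(); } catch (RuntimeException ignored) { /* best effort */ }
            return StateException.convert(e);
        }

        private void noteFailure(Throwable t) {
            if (failureCause == null) { failureCause = t; failureState = state; }
        }

        State getState() { return state; }
        Throwable getFailureCause() { return failureCause; }
        State getFailureState() { return failureState; }
    }

    /** Starts children in order; on failure, stops every child that got as far as init. */
    static class Composite extends Service {
        private final List<Service> children = new ArrayList<>();

        void add(Service child) { children.add(child); }

        @Override protected void serviceInit() { for (Service c : children) c.init(); }
        @Override protected void serviceStart() { for (Service c : children) c.start(); }

        @Override protected void serviceStop() {
            for (int i = children.size() - 1; i >= 0; i--) {   // walk back in reverse order
                Service c = children.get(i);
                if (c.getState() == State.NOTINITED) continue;
                try { c.stop(); } catch (RuntimeException ignored) { /* keep stopping the rest */ }
            }
        }
    }

    public static void main(String[] args) {
        Composite parent = new Composite();
        Service good = new Service() { };
        parent.add(good);
        parent.add(new Service() {
            @Override protected void serviceStart() throws IOException { throw new IOException("disk full"); }
        });
        parent.init();
        try {
            parent.start();
        } catch (RuntimeException e) {
            System.out.println("caught: " + e);
            System.out.println("is parent's failure cause: " + (e == parent.getFailureCause()));
            System.out.println("first child state: " + good.getState());
        }
    }
}
```

Note that the parent never re-wraps: the child has already turned the IOException into a runtime exception, so {{convert()}} on the parent returns the same object.
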
I've actually done a couple more child-holding services for my own work, which I'd happily push back into trunk/2.3: [https://github.com/hortonworks/hoya/tree/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service]

* The [SequenceService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/SequenceService.java] runs its children in sequence, failing when one fails.
* The [CompoundService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/CompoundService.java] stops as soon as any one of its children fails, again propagating any faults up.

These both implement a [Parent interface| Parent.java] so that they can be treated uniformly, and so that other bits of the code can add children.

Alongside that:

* [EventNotifyingService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/EventNotifyingService.java]: sleeps, notifies a callback, stops.
* [ForkedProcessService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/ForkedProcessService.java]: forks off a native process, stops when the process stops, kills the process when it itself is stopped, and forwards up exceptions on a process failure.

These let me build up more complex workflows like this one [to start accumulo|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/providers/accumulo/AccumuloProviderService.java#L331], which runs a sequence of "accumulo init" (if needed), followed by, in parallel, "accumulo start" and a delayed event callback. That callback will, if accumulo start hasn't failed in the meantime, trigger the request for containers for whatever other accumulo roles have been added.

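The shape of that accumulo workflow (one sequential step, then a start step in parallel with a delayed callback that checks whether start failed) can be illustrated with plain {{java.util.concurrent}}. This is only a sketch of the control flow: it does not use the Hoya service classes, and {{WorkflowSketch}}, the step names and the 200 ms delay are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class WorkflowSketch {
    /** Runs the hypothetical workflow and returns the names of the steps that ran. */
    static List<String> run(boolean startFails) {
        List<String> steps = Collections.synchronizedList(new ArrayList<>());
        AtomicBoolean startFailed = new AtomicBoolean(false);

        // Step 1 runs on its own, before anything else (the "init" step).
        steps.add("init");

        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        try {
            // Step 2a: the "start" step, submitted immediately.
            Future<?> start = pool.submit(() -> {
                if (startFails) {
                    startFailed.set(true);
                } else {
                    steps.add("start");
                }
            });
            // Step 2b: a delayed callback, scheduled alongside the start step.
            // It only acts if the start step has not failed in the meantime.
            Future<?> callback = pool.schedule(() -> {
                if (!startFailed.get()) {
                    steps.add("request containers");
                }
            }, 200, TimeUnit.MILLISECONDS);

            start.get();
            callback.get();
        } catch (Exception e) {
            // Same idea as the services: hand failures upstream as runtime exceptions.
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("start succeeds: " + run(false));
        System.out.println("start fails:    " + run(true));
    }
}
```

In the real services, a failure is forwarded up to the parent through the exception path described earlier, where this sketch uses a shared flag.
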
Anyway, the services will catch, record, wrap and relay exceptions; the parents just need to be able to handle the fact that it will be a RuntimeException that comes back, and there is no need to catch and wrap it again if you want to pass it upstream.

> [Umbrella] Convert all RM components to Services
> ------------------------------------------------
>
>                 Key: YARN-1139
>                 URL: https://issues.apache.org/jira/browse/YARN-1139
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Karthik Kambatla
>            Assignee: Tsuyoshi OZAWA
>
> Some of the RM components - state store, scheduler etc. are not services.
> Converting them to services goes well with the "Always On" and "Active"
> service separation proposed on YARN-1098.
> Given that some of them already have start(), stop() methods, it should not
> be too hard to convert them to services.
> That would also be a cleaner way of addressing YARN-1125.

--
This message was sent by Atlassian JIRA
(v6.1#6144)