[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841672#comment-13841672 ]

Vinod Kumar Vavilapalli commented on YARN-1139:

bq. My primary rationale for this JIRA was to reduce the time to transition to Active in the long-term, or to support a warmer Standby mode. Converting all the components to services is not absolutely required for that.

Then let's not do more than what is necessary. We can keep these JIRAs around and tackle them when it's needed, in the correct way. For now, are we good with the conversion of some always-on vs active services?

bq. There could be a single SecretManagerService that handles the lifecycle of all YARN-related *SecretManagers.

If converting them to services is really necessary, that is the same approach that I was thinking about. As I see on YARN-1172, we are adding a lot of boilerplate code for not a whole lot of benefit.

[Umbrella] Convert all RM components to Services

Key: YARN-1139
URL: https://issues.apache.org/jira/browse/YARN-1139
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA

Some of the RM components - state store, scheduler etc. are not services. Converting them to services goes well with the Always On and Active service separation proposed on YARN-1098. Given that some of them already have start(), stop() methods, it should not be too hard to convert them to services. That would also be a cleaner way of addressing YARN-1125.

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841973#comment-13841973 ]

Karthik Kambatla commented on YARN-1139:

bq. For now, are we good with the conversion of some always-on vs active services?

Let us do that, and change the rest on an as-needed basis.
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840850#comment-13840850 ]

Vinod Kumar Vavilapalli commented on YARN-1139:

Is it absolutely necessary to convert all of the components to be services? Can we focus on the absolute minimum needed? And what would that minimal set be?
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840932#comment-13840932 ]

Karthik Kambatla commented on YARN-1139:

I was under the impression that the long-term goal was to use services where applicable. My primary rationale for this JIRA was to reduce the time to transition to Active in the long-term, or to support a warmer Standby mode. Converting all the components to services is not absolutely required for that.

It might be nice to make the scheduler a service, so individual schedulers can take up the task of handling start/stop etc. There could be a single SecretManagerService that handles the lifecycle of all YARN-related *SecretManagers.

[~vinodkv], does that sound like a more reasonable approach?
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799921#comment-13799921 ]

Tsuyoshi OZAWA commented on YARN-1139:

A patch for YARN-1172 is now available as well. Could you also review it and catch up on it, Steve?
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799553#comment-13799553 ]

Tsuyoshi OZAWA commented on YARN-1139:

[~ste...@apache.org], could you also check YARN-1305 and review the patch? That JIRA is a subtask of this one.
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797804#comment-13797804 ]

Steve Loughran commented on YARN-1139:

# You don't need to convert any exceptions now, because the inner {{serviceStart()}}/{{serviceStop()}} methods are declared to throw exceptions. Just pass them up. The only reason the existing services didn't have their exception catch/wrap logic changed as part of YARN-117 is that I didn't want to add extra changes.
# AbstractService catches a failure and relays it to {{noteFailure()}}, which saves away the first exception caught; {{getFailureCause()}} and {{getFailureState()}} return that exception and the state the service was in when it happened.
# When an exception is caught during a state change, it triggers a {{Service.stop()}} action, which is why {{stop()}} is required to be a best-effort operation: it must do its best even when stopping a partially inited or started service.
# It then calls {{ServiceStateException.convert(e);}} to convert the exception into a RuntimeException; if it already is one it is left alone, otherwise it is wrapped in a ServiceStateException.
# The composite service runs through its children, starting each one in turn. The first one that fails by throwing a runtime exception will trigger the {{noteFailure()}} operation on the parent, then the composite service's {{stop()}} operation, which walks back through all inited services (but not the UNINITED ones; things failed when we tried that), stopping them in turn.

What that means is that if a child service fails, the composite should pick that up and save it as its own failure cause.
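The failure handling described in the list above can be illustrated with a small, self-contained model. This is only a sketch under assumptions: {{MiniService}}, {{MiniComposite}}, {{MiniStateException}} and {{FailingChild}} are hypothetical stand-ins written for this example, not Hadoop's {{AbstractService}}, {{CompositeService}} or {{ServiceStateException}}, and locking, listeners and configuration are left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical unchecked wrapper, standing in for ServiceStateException. */
class MiniStateException extends RuntimeException {
    MiniStateException(Throwable cause) { super(cause); }

    /** RuntimeExceptions pass through untouched; anything else is wrapped. */
    static RuntimeException convert(Throwable t) {
        return (t instanceof RuntimeException) ? (RuntimeException) t : new MiniStateException(t);
    }
}

/** Hypothetical, much-simplified lifecycle base class. */
abstract class MiniService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;
    private Throwable failureCause;   // only the first failure is kept
    private State failureState;       // state the service was in at that point

    // Subclasses override these hooks and simply let exceptions escape.
    protected void serviceInit() throws Exception {}
    protected void serviceStart() throws Exception {}
    protected void serviceStop() throws Exception {}

    final void init()  { enter(State.INITED); }
    final void start() { enter(State.STARTED); }

    private void enter(State target) {
        state = target;
        try {
            if (target == State.INITED) serviceInit(); else serviceStart();
        } catch (Exception e) {
            noteFailure(e);                        // record the first failure
            stopQuietly();                         // best-effort cleanup
            throw MiniStateException.convert(e);   // relay as an unchecked exception
        }
    }

    final void stop() {
        if (state == State.STOPPED) return;
        state = State.STOPPED;
        try {
            serviceStop();
        } catch (Exception e) {
            noteFailure(e);
            throw MiniStateException.convert(e);
        }
    }

    final void stopQuietly() {
        try { stop(); } catch (RuntimeException ignored) { /* best effort */ }
    }

    private void noteFailure(Throwable t) {
        if (failureCause == null) { failureCause = t; failureState = state; }
    }

    final State getState() { return state; }
    final Throwable getFailureCause() { return failureCause; }
    final State getFailureState() { return failureState; }
}

/** Hypothetical composite: starts children in order, stops them in reverse. */
class MiniComposite extends MiniService {
    private final List<MiniService> children = new ArrayList<>();

    void add(MiniService child) { children.add(child); }

    @Override protected void serviceInit()  { for (MiniService c : children) c.init(); }
    @Override protected void serviceStart() { for (MiniService c : children) c.start(); }

    @Override protected void serviceStop() {
        for (int i = children.size() - 1; i >= 0; i--) {
            MiniService c = children.get(i);
            if (c.getState() != State.NOTINITED) c.stopQuietly();  // skip never-inited children
        }
    }
}

/** A child whose start hook always fails with a checked exception. */
class FailingChild extends MiniService {
    @Override protected void serviceStart() throws IOException {
        throw new IOException("cannot open state store");
    }
}

class ServiceFailureSketch {
    public static void main(String[] args) {
        MiniComposite parent = new MiniComposite();
        MiniService healthy = new MiniService() {};
        parent.add(healthy);
        parent.add(new FailingChild());
        parent.init();
        try {
            parent.start();
        } catch (RuntimeException e) {
            // No re-wrapping needed: the checked IOException arrives as the cause.
            System.out.println(e.getClass().getSimpleName() + " <- " + e.getCause().getMessage());
        }
        System.out.println("parent=" + parent.getState() + ", healthy child=" + healthy.getState());
    }
}
```

In this model the failing child records the original {{IOException}} as its failure cause, while the parent records the already-wrapped unchecked exception and stops the child that had started successfully.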
I've actually done a couple more child-holding services for my own work, which I'd happily push back into trunk/2.3 [https://github.com/hortonworks/hoya/tree/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service]:
* The [SequenceService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/SequenceService.java] runs its children in sequence, failing when one fails.
* The [CompoundService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/CompoundService.java] stops as soon as any one of its children fails, again propagating any faults up.

These both implement a [Parent interface| Parent.java] so that they can be treated uniformly, and so that other bits of the code can add children.

Alongside that:
* [EventNotifyingService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/EventNotifyingService.java]: sleeps, notifies a callback, stops.
* [ForkedProcessService|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/service/ForkedProcessService.java]: forks off a native process, stops when the process stops, kills the process when it itself is stopped, and forwards up exceptions on a process failure.

These let me build up more complex workflows like this one [to start accumulo|https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/main/java/org/apache/hadoop/hoya/providers/accumulo/AccumuloProviderService.java#L331], which runs a sequence of accumulo init (if needed), followed by, in parallel, accumulo start and a delayed event callback. That callback will, if accumulo start hasn't failed in the meantime, trigger the request for containers for whatever other accumulo roles have been added.

Anyway, the services will catch, record, wrap and relay exceptions; the parents just need to be able to handle the fact that it will be a RuntimeException that comes back, and there is no need to catch and wrap it again if you want to pass it upstream.
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798690#comment-13798690 ]

Tsuyoshi OZAWA commented on YARN-1139:

Thank you for sharing the knowledge, [~ste...@apache.org]! I created a patch on YARN-1172 based on the design you mentioned: overriding serviceInit()/serviceStart()/serviceStop(). I'll take the same approach on this JIRA, for the reasons you mentioned (e.g. easier error handling).
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797156#comment-13797156 ]

Zhijie Shen commented on YARN-1139:

When converting the components into services, one thing I think we need to take care of is that exceptions will be isolated by the service model. For example,
{code}
try {
  this.scheduler.reinitialize(conf, this.rmContext);
} catch (IOException ioe) {
  throw new RuntimeException("Failed to initialize scheduler", ioe);
}
{code}
If the scheduler turns into a service, the RM cannot catch the exception like that. Previously, we also hit the problem that a composite service cannot directly receive the exception thrown by its child.
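To make the call-site consequence concrete, here is a minimal sketch, assuming a scheduler whose public {{init()}} declares no checked exceptions and relays failures as unchecked ones. {{SchedulerServiceSketch}} and {{ResourceManagerSketch}} are hypothetical names for this example, and {{IllegalStateException}} merely stands in for whatever unchecked wrapper the real service framework uses.

```java
import java.io.IOException;

/** Hypothetical scheduler-as-service; init() declares no checked exceptions. */
class SchedulerServiceSketch {
    private final boolean brokenConfig;

    SchedulerServiceSketch(boolean brokenConfig) { this.brokenConfig = brokenConfig; }

    // Inner hook: free to throw checked exceptions.
    protected void serviceInit() throws IOException {
        if (brokenConfig) throw new IOException("bad queue configuration");
    }

    // Public lifecycle entry point: relays any failure as a RuntimeException.
    public final void init() {
        try {
            serviceInit();
        } catch (RuntimeException e) {
            throw e;                              // already unchecked, leave alone
        } catch (Exception e) {
            throw new IllegalStateException(e);   // stand-in wrapper (assumption)
        }
    }
}

class ResourceManagerSketch {
    /** Returns a description of how scheduler initialization went. */
    static String initScheduler(SchedulerServiceSketch scheduler) {
        try {
            scheduler.init();
            return "scheduler initialized";
        } catch (RuntimeException e) {
            // `catch (IOException ioe)` would not compile here, because init()
            // cannot throw it; the original exception is available as the cause.
            Throwable root = (e.getCause() != null) ? e.getCause() : e;
            return "Failed to initialize scheduler: " + root.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(initScheduler(new SchedulerServiceSketch(false)));
        System.out.println(initScheduler(new SchedulerServiceSketch(true)));
    }
}
```

The caller no longer names {{IOException}} in its catch clause; it handles the unchecked exception and, if it cares about the original failure, looks at {{getCause()}}.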
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797242#comment-13797242 ]

Tsuyoshi OZAWA commented on YARN-1139:

Thank you for the advice, [~zjshen]. I've checked the AbstractService code, and I understand that we need to convert all exceptions into ServiceStateException, a subclass of RuntimeException, as you described. I'll check and update the patch for YARN-1172 based on your advice.