[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904773#comment-15904773 ] Varun Saxena commented on YARN-6148: [~sunilg], sorry I had missed your comment. I will update a small doc over the weekend. The approach suggested above takes into account non exclusive label, move app, default label of queue, etc. I will detail it in the document to clarify what is meant for what. Yes, I agree we should brainstorm more and involve others to gain a consensus on the approach. Some other scenarios may also come out of it. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898807#comment-15898807 ] Sunil G commented on YARN-6148: --- I generally went through whole approach suggested here, and I still feel there are some more grey areas. Could [~varun_saxena] or [~naganarasimha...@apache.org] to put these in a small doc here. I feel this may be needed because there are small feature sets from node label / scheduler is playing impact on MR's blacklisting. Few of them are discussed here, but still its better to brainstorm. To summarize some features which has impact on blacklist are non-exclusive label, move app, default queue's label, preemption of non-exclusive label etc. In hindsight, I feel MR is going to get multiple data sets. Like per-label-node-count, node-report-on-label-change, app-queue-on-change, app-default-label-on-change etc in addition to existing MR blacklist. Its better to get a common consensus here on approach. I thought of looping [~jlowe] and [~leftnoteasy] as changes are proposed in MR and in YARN's interface end. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897404#comment-15897404 ] Naganarasimha G R commented on YARN-6148: - bq. When default label for an application changes which will typically happen when application moves across queues, we will consider only the blacklisting failures on target queue's default label to ignore blacklisting. The reason we need to consider {{"target queue's default label"}} for the *threshold* is because we have *threshold* to ensure that we do not hang the app by blacklisting all the nodes for this app, so once we have moved to the new queue then if the container fails then the new request gets submitted to the target queue's default label . so we need to consider the threshold for the target queue's label only and we can ignore blacklist for the previous label, Thoughts? > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897155#comment-15897155 ] Varun Saxena commented on YARN-6148: Me and Naga just had an offline discussion with [~sunilg]. Will update the summary. # We will report a default applicable label for application (if none is specified) which can either be the label in Application Submission context or queue default label, in both Register AM response and in Allocate Response. Latter because application can move across queues. # When the label for a node changes, we will also report this node as part of updated nodes report in Allocate Response. # We would also update Container Info in allocated containers list to include information such as label info including whether its a non exclusive label and the type of container i.e. guaranteed or opportunistic. # At a high level, MR AM will maintain a list of node to label information so as to ascertain which label should the node blacklist be accounted for. Implementation wise, this can probably be done by updating the assigned requests. # When default label for an application changes which will typically happen when application moves across queues, we will consider only the blacklisting failures on target queue's default label to ignore blacklisting. If application moves back and forth across queues, say y => x => y, we can reconstruct the applicable node count for blacklisting as we have both the blacklisted nodes and their mapping to a label. Thoughts? > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896978#comment-15896978 ] Varun Saxena commented on YARN-6148: Offline. we (me, [~Naganarasimha] and [~bibinchundatt]) had discussed some more scenarios which need to be handled. So following is the plan regarding what will be done. # We will send a label to node count map containing label to active NM count mapping for requested labels. # If labels are disabled, we will report cluster count as earlier and not send anything in label to active NM count mapping. AM can consider only the cluster node count reported earlier in such cases. This will also help us in handling the case for rolling downgrades. # Non Exclusive labels will not be reported in the label to node count unless explicitly requested by AM. # Mapreduce AM would consider both map and reduce label expression separately, if they are different, while deciding on ignoring AM blacklisting. We will check label to node count mapping to determine the applicable node counts for both map and reduce. # Queue can have a default label too. This needs to be reported to AM as well. Also needs to be reported on move. Can be handled in YARN-6209. Active NM count may depend on it and AM currently does not know anything about default label of queue. # Container info sent while reporting allocated containers would also contain label on which container was allocated on. Based on this info we will determine if the node assigned is a default label of queue, map/reduce partition or non exclusive label. We will not count non exclusive label towards ignoring blacklisting. Thoughts? > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867298#comment-15867298 ] Sunil G commented on YARN-6148: --- +1 for the current approach. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867294#comment-15867294 ] Varun Saxena commented on YARN-6148: Yes, that's the plan. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867139#comment-15867139 ] Rohith Sharma K S commented on YARN-6148: - Basically in this JIRA lets do for YARN related changes such as adding new API of node count per partition. AM need to use this API in case of partitioned cluster. MR related changes need to be handled separately in MapReduce project. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866481#comment-15866481 ] Varun Saxena commented on YARN-6148: bq. IIUC we don't track label of nodes in AM side of MR We get map and reduce labels from MR configurations mapreduce.map.node-label-expression and mapreduce.reduce.node-label-expression respectively if this is what you were asking. For using the count per label, a sibling Mapreduce JIRA will have to be opened. Will do so after making some progress on this JIRA. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866479#comment-15866479 ] Wangda Tan commented on YARN-6148: -- [~varun_saxena], I think we should add the new field to branch-2 if required. Since it is a compatible change, I think we're good now. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866475#comment-15866475 ] Varun Saxena commented on YARN-6148: [~leftnoteasy], only concern was whether we add a new field in AllocateResponse for branch-2 or not. In principle we are fine with sending count per partition. But as we specify only one partition per job, the patch sufficed our use case. So we thought make a simple fix in as of now and get this in quickly as other proposed fix may take some time. However, as we have nothing against sending node count per label and most of the guys here are ok with adding a new field so lets go with that. I will make the necessary change and update. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866440#comment-15866440 ] Wangda Tan commented on YARN-6148: -- [~Naganarasimha], [~varun_saxena], [~rohithsharma], Discussed with [~sunilg] about this, I think it might be better to add a new field to AllocateResponse which return per-partition #node, which will include all partitions with #nodes > 0. Please let me know if you have any concerns of this. (Didn't read all comments in the JIRA, apologize if it is already discussed). > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866432#comment-15866432 ] Bibin A Chundatt commented on YARN-6148: [~sunilg]/[~varun_saxena] {quote} we must send per label node count to MR (only for request partitions). <, > It can be a new parameter in the existing protocol. Also MR code to be changed as per this. {quote} IIUC we don't track label of nodes in AM side of MR so even if we send node count per label will be able to disable blacklisting for partitioned nodes.?? > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865986#comment-15865986 ] Varun Saxena commented on YARN-6148: I fully understand the use case you are pointing out. Yes, the flaws with respect to headroom won't be as problematic as it would be with blacklisting in the scenario pointed above. If extra headroom is reported, it will just lead to more reducers being ramped up(as AM will think there are enough resources for map tasks) which may impact assigning of resources to maps. But post MAPREDUCE-6302 this wont lead to forever hanging of jobs. If we send headroom for each label separately we may be able to ramp up/down reducers more effectively. However, headroom is used at multiple places and there would be other areas to look at. In principle I am fine with sending node count per label. This isn't necessarily a use case for us as of now but in general there are use cases as pointed out in discussion above as well, so we can handle it this way if there is a general agreement with adding a new field in Allocate Response. I will wait for feedback from Wangda on this and then we can proceed. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865181#comment-15865181 ] Rohith Sharma K S commented on YARN-6148: - IIUC, headroom is relax feature and blacklisting is strict feature which are totally in opposite direction. If more head room is sent to AM, then it is fine because there is no impact for AM's. Basically AMs can send more resource request to YARN. But in case of blacklisting, node count affects the AM directly and leads to hang forever if node count is not proper. It is bigger issue than headroom. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865157#comment-15865157 ] Varun Saxena commented on YARN-6148: Agree... As I said above, I am not against whatever Sunil suggested. Maybe make the fix I have done here for branch-2 (i.e. ,make it consistent with the way headroom is currently sent) and send node count per label for trunk. Also as pointed out earlier, we need to make this change for headroom too then. Some flaws can be pointed out in the way we send headroom calculation as of today as well. However, if we do make this change, what do we do with the old headroom and node count fields. Keep them for backward compatibility? Or deprecate them or remove them? But latter will be disruptive for current AMs'. So some further discussion will be required for this proposed change. And I think we can target that for trunk only. Thoughts? By the way, for 3.0.0 can we break compatibility across alpha releases? > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865050#comment-15865050 ] Naganarasimha G R commented on YARN-6148: - Thanks [~rohithsharma] for your thoughts, Agree that it will solve the issue completely but we were targeting this modification for 2.8, hence we kept it simple. we are not against fixing in that way but only thing is where should the modifications land to ? I think similar issue exists with Headroom too. In case of slow start there are probabilities that headroom gets violated if more headroom is sent. Also would like to know [~wangda]'s opinion on this ! > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865008#comment-15865008 ] Rohith Sharma K S commented on YARN-6148: - Sorry to pitch in late, went through the discussion above. I think we should consider per partition node count strictly rather than assuming the application is running with one/two partition. In case of blacklisting scenario, application would hang if application is running with multiple partition. Lets say, application is running withIf all nodes of labelB blacklisted by application, then purging will not happen at any cost. This leads to application hang forever. YARN has to send per partition node count and also AM should handle per partition count blacklisting. Then the whole problem is solved. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864967#comment-15864967 ] Varun Saxena commented on YARN-6148: By the way I need to make a check for node labels enabled in the patch. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864966#comment-15864966 ] Varun Saxena commented on YARN-6148: Or probably this improvement can be done as part of YARN-4902 > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864277#comment-15864277 ] Varun Saxena commented on YARN-6148: [~sunilg] I do see your point here. But as Naga mentioned our intention was to make behavior similar to that of headroom. Even in the case of headroom, we consider AM partition. I am slightly worried about backward compatibility though. I doubt any AM would be expecting all the nodes even in a partitioned cluster though. Thoughts? Enhancing the API i.e. sending node count per label or say even headroom per label can be done as an improvement in another JIRA if there is a use case? > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863971#comment-15863971 ] Sunil G commented on YARN-6148: --- Yes. I also thought in the same line. However this may cause issues later in cases what we discussed earlier. And this is to be exposed to external applications as well. So I am not so sure with a linear approach here. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863942#comment-15863942 ] Naganarasimha G R commented on YARN-6148: - [~sunilg], Yes you are right, its similar to calculating the headroom (YARN-3215) when partitions are specified , but we avoided to modify the API based on [~wangda] 's [comment|https://issues.apache.org/jira/browse/YARN-3215?focusedCommentId=15074330=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15074330] . So offline me and [~varun_saxena] thought it would be better to have similar logic for total cluster size too. Also mostly use case what we have currently is that whole job is run with a single partition. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863347#comment-15863347 ] Sunil G commented on YARN-6148: --- [~varun_saxena] I have some doubts over here. {{allocateResponse.setNumClusterNodes(allocation.getApplicableNMCount());}} Still we sent one integer value of node count to AM. This count is better than earlier because its the node count for all used partitions for this app (instead of all nodes in cluster). But I am wary about few use cases. Lets consider below scenario for MR: - AM container is running in LabelA(it has 10 nodes) - Maps and reducers are running in LabelB (10 nodes) 4 tasks are failed in one node of LabelB. Now that node ll be blacklisted. As per your code, node count is 20 (Am container's LabelA, Map/Reducer containers LabelB). But for task failure, we need not have to worry about LabelA (AM container node label). We should consider only LabelB here. So I think we must send per label node count to MR (only for request partitions). {{<, >}} It can be a new parameter in the existing protocol. Also MR code to be changed as per this. This may be more complete. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861842#comment-15861842 ] Varun Saxena commented on YARN-6148: Checkstyle cant be fixed and test failure is unrelated. Passes locally > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-6148.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861614#comment-15861614 ] Hadoop QA commented on YARN-6148: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 45s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 138 unchanged - 1 fixed = 139 total (was 139) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 33s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 43s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 99m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6148 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12852072/YARN-6148.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 800bdb081862 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3a0a0a4 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14882/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14882/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results |
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861554#comment-15861554 ] Hadoop QA commented on YARN-6148: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 127 unchanged - 1 fixed = 128 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 1s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 58s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6148 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12852070/YARN-6148.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 639ed3288870 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3a0a0a4 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/14881/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14881/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14881/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855816#comment-15855816 ] Varun Saxena commented on YARN-6148: Yes the intention is to make behavior consistent. The nodes available to the application depends on the partitions requested by it (if they are accessible from the queue application has been submitted to) so number of nodes reported should also honor that. Moreover, NM node count is used by MapReduce AM for instance, to ignore blacklisting. It should use the node count based on requested partitions. Because, it is possible that blacklisting will never be ignored if total number of nodes in the requested partitions(in terms of %) are less than configured % to ignore blacklisting. We can still give a different config value per application but it will be better to use node count based on requested partitions to determine when blacklisting has to be ignored. > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854629#comment-15854629 ] Naganarasimha G R commented on YARN-6148: - Currently we are giving the Headroom based on the partitions requested by the application. Similarly total nodes in the cluster also needs to be given based on the the partitions being queried. This will ensure that the behavior is in sync with other features for the partition label. IIUC [~varun_saxena] is trying to mention this to avoid blacklisting all the partition nodes for the app if we consider the total cluster nodes, and generally partition nodes are subset of it. cc [~wangda] > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6148) NM node count reported to AM in Allocate Response should consider requested node label partitions.
[ https://issues.apache.org/jira/browse/YARN-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854222#comment-15854222 ] Bibin A Chundatt commented on YARN-6148: [~varun_saxena] could you please share background and details > NM node count reported to AM in Allocate Response should consider requested > node label partitions. > -- > > Key: YARN-6148 > URL: https://issues.apache.org/jira/browse/YARN-6148 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org