[jira] [Updated] (YARN-5982) Simplifying opportunistic container parameters and metrics
[ https://issues.apache.org/jira/browse/YARN-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5982: - Attachment: YARN-5982.002.patch Thanks for the review, [~asuresh]. I addressed your comments. I put the increment resource back into the proto file, but I did not move it from the FairScheduler configuration to the general YarnConfiguration. Let's do that in a separate JIRA if it is needed. I also fixed the checkstyle issues and the related test case that was failing. > Simplifying opportunistic container parameters and metrics > -- > > Key: YARN-5982 > URL: https://issues.apache.org/jira/browse/YARN-5982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5982.001.patch, YARN-5982.002.patch > > > This JIRA removes some of the parameters that are related to opportunistic > containers (e.g., min/max memory/cpu). Instead, we will be using the > parameters already used by guaranteed containers. > The goal is to reduce the number of parameters that need to be used by the > user. > We also fix a small issue related to the container metrics (opportunistic > memory reported in GB in Web UI, although it was in MB). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733301#comment-15733301 ] Konstantinos Karanasos commented on YARN-3409: -- {{PlacementStrategy}} sounds good to me. {{PlacementConstraints}} or something similar might be even more descriptive. +1 for using the same expression for defining both (anti-)affinity and label constraints. I was wondering whether we could even use a single type of constraint to express all these different constraint types. Let me think about it a bit more and I will let you know if I find a concrete solution. > Add constraint node labels > -- > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, > YARN-3409.WIP.001.patch > > > Specifying only one label for each node (IAW, partitioning a cluster) is a way to > determine how the resources of a special set of nodes can be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > have the following characteristics: > - The cluster is divided into several disjoint sub-clusters. > - ACLs/priorities can apply to a partition (e.g., only the Market team has > priority to use the partition). > - Percentages of capacity can apply to a partition (the Market team has 40% > minimum capacity and the Dev team has 60% minimum capacity of the partition). > Constraints are orthogonal to partitions; they describe attributes of a > node's hardware/software just for affinity. Some examples of constraints: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (Windows, Linux, etc.) > With this, an application can ask for resources that satisfy (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). 
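The idea floated in the comment above, using a single constraint type for both label constraints and (anti-)affinity, can be sketched as follows. This is purely illustrative and not a YARN API: the class name, the idea of encoding anti-affinity as negation of a dynamically attached label, and the expression syntax are all assumptions for the sake of the example.

```java
/**
 * Illustrative sketch only (not YARN code): a single constraint form,
 * "boolean expression over node labels", where (anti-)affinity is encoded
 * by referencing (or negating) labels that running containers attach to a
 * node. All names here are hypothetical.
 */
public final class PlacementConstraint {
  // e.g. "Java6 && x86_64" for a node-attribute constraint,
  // or "!hbase-master" for anti-affinity to a running HBase master.
  public final String labelExpression;

  public PlacementConstraint(String labelExpression) {
    this.labelExpression = labelExpression;
  }

  /** Anti-affinity to a dynamically attached label is just negation. */
  public static PlacementConstraint antiAffinityTo(String tag) {
    return new PlacementConstraint("!" + tag);
  }

  public static void main(String[] args) {
    // Both kinds of constraints share the same representation.
    System.out.println(new PlacementConstraint("Java6 && x86_64").labelExpression);
    System.out.println(antiAffinityTo("hbase-master").labelExpression); // !hbase-master
  }
}
```

Under this (assumed) encoding, the scheduler would only ever need one evaluation path: match an expression against the set of labels currently present on a node.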
[jira] [Updated] (YARN-5982) Simplifying opportunistic container parameters and metrics
[ https://issues.apache.org/jira/browse/YARN-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5982: - Target Version/s: 2.9.0, 3.0.0-alpha2 (was: 3.0.0-alpha2) > Simplifying opportunistic container parameters and metrics > -- > > Key: YARN-5982 > URL: https://issues.apache.org/jira/browse/YARN-5982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5982.001.patch > > > This JIRA removes some of the parameters that are related to opportunistic > containers (e.g., min/max memory/cpu). Instead, we will be using the > parameters already used by guaranteed containers. > The goal is to reduce the number of parameters that need to be used by the > user. > We also fix a small issue related to the container metrics (opportunistic > memory reported in GB in Web UI, although it was in MB).
[jira] [Updated] (YARN-5982) Simplifying opportunistic container parameters and metrics
[ https://issues.apache.org/jira/browse/YARN-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5982: - Fix Version/s: 3.0.0-alpha2 2.9.0 > Simplifying opportunistic container parameters and metrics > -- > > Key: YARN-5982 > URL: https://issues.apache.org/jira/browse/YARN-5982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5982.001.patch > > > This JIRA removes some of the parameters that are related to opportunistic > containers (e.g., min/max memory/cpu). Instead, we will be using the > parameters already used by guaranteed containers. > The goal is to reduce the number of parameters that need to be used by the > user. > We also fix a small issue related to the container metrics (opportunistic > memory reported in GB in Web UI, although it was in MB).
[jira] [Updated] (YARN-5982) Simplifying opportunistic container parameters and metrics
[ https://issues.apache.org/jira/browse/YARN-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5982: - Attachment: YARN-5982.001.patch Attaching patch. > Simplifying opportunistic container parameters and metrics > -- > > Key: YARN-5982 > URL: https://issues.apache.org/jira/browse/YARN-5982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5982.001.patch > > > This JIRA removes some of the parameters that are related to opportunistic > containers (e.g., min/max memory/cpu). Instead, we will be using the > parameters already used by guaranteed containers. > The goal is to reduce the number of parameters that need to be used by the > user. > We also fix a small issue related to the container metrics (opportunistic > memory reported in GB in Web UI, although it was in MB).
[jira] [Updated] (YARN-5982) Simplifying opportunistic container parameters and metrics
[ https://issues.apache.org/jira/browse/YARN-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5982: - Summary: Simplifying opportunistic container parameters and metrics (was: Simplifying some opportunistic container parameters and metrics) > Simplifying opportunistic container parameters and metrics > -- > > Key: YARN-5982 > URL: https://issues.apache.org/jira/browse/YARN-5982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > This JIRA removes some of the parameters that are related to opportunistic > containers (e.g., min/max memory/cpu). Instead, we will be using the > parameters already used by guaranteed containers. > The goal is to reduce the number of parameters that need to be used by the > user. > We also fix a small issue related to the container metrics (opportunistic > memory reported in GB in Web UI, although it was in MB).
[jira] [Created] (YARN-5982) Simplifying some opportunistic container parameters and metrics
Konstantinos Karanasos created YARN-5982: Summary: Simplifying some opportunistic container parameters and metrics Key: YARN-5982 URL: https://issues.apache.org/jira/browse/YARN-5982 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Assignee: Konstantinos Karanasos This JIRA removes some of the parameters that are related to opportunistic containers (e.g., min/max memory/cpu). Instead, we will be using the parameters already used by guaranteed containers. The goal is to reduce the number of parameters that need to be used by the user. We also fix a small issue related to the container metrics (opportunistic memory reported in GB in Web UI, although it was in MB).
[jira] [Comment Edited] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730729#comment-15730729 ] Konstantinos Karanasos edited comment on YARN-3409 at 12/8/16 1:30 AM: --- Hey guys, apologies for the late reply. Here are my thoughts... bq. Add a new field for constraint expression, and also for affinity/anti-affinity (as suggested by Kostas). This should have minimal impact on existing features. But after this, the "nodeLabelExpression" becomes a little ambiguous; we may need to deprecate the existing nodeLabelExpression. Agreed with that, with one clarification: do you mean having an extra affinity/anti-affinity constraint expression or using the same constraint expression? Probably we will need a separate one. bq. Extend the existing NodeLabel object to support node constraints; we only need two additional fields to support node constraints. 1) isNodeConstraint 2) Value (for example, we can have a constraint named jdk-version, and the value could be 6/7/8). I followed your discussion on this and on evaluating the constraints. I also had an offline discussion with [~chris.douglas]. I suggest having an even simpler approach than the one Wangda proposed. I believe we should have a first version with just boolean expressions, that is, simply requesting whether a label exists or not (possibly including negation of boolean expressions). In other words, I suggest having neither a constraint type nor a value. Let's have a first simple version of (boolean) labels that works. In a future iteration of this, we can add attributes (i.e., with values) instead of labels. Having simple labels allows us to bypass the problem of constraint types. Like Wangda says, constraint types do not really solve the problem of comparing values, given that people will write their values in different formats. You can also take a look at YARN-4476 for an efficient boolean expression matcher. 
For example, using simple labels, one node can be annotated with the label "Java6". Then a task that requires at least Java 5 can request a node with "Java5 || Java6". I think that with our current use cases, this will be sufficient. Let me know what you think. was (Author: kkaranasos): Hey guys, apologies for the late reply. Here are my thoughts... bq. Add a new field for constraint expression, and also for affinity/anti-affinity (as suggested by Kostas). This should have minimal impact on existing features. But after this, the "nodeLabelExpression" becomes a little ambiguous; we may need to deprecate the existing nodeLabelExpression. Agreed with that, with one clarification: do you mean having an extra affinity/anti-affinity constraint expression or using the same constraint expression? Probably we will need a separate one. bq. Extend the existing NodeLabel object to support node constraints; we only need two additional fields to support node constraints. 1) isNodeConstraint 2) Value (for example, we can have a constraint named jdk-version, and the value could be 6/7/8). I followed your discussion on this and on evaluating the constraints. I also had an offline discussion with [~chris.douglas]. I suggest having an even simpler approach than the one Wangda proposed. I believe we should have a first version with just boolean expressions, that is, simply requesting whether a label exists or not (possibly including negation of boolean expressions). In other words, I suggest having neither a constraint type nor a value. Let's have a first simple version of (boolean) labels that works. In a future iteration of this, we can add attributes (i.e., with values) instead of labels. Having simple labels allows us to bypass the problem of constraint types. Like Wangda says, constraint types do not really solve the problem of comparing values, given that people will write their values in different formats. You can also take a look at YARN-44676 for an efficient boolean expression matcher. 
For example, using simple labels, one node can be annotated with the label "Java6". Then a task that requires at least Java 5 can request a node with "Java5 || Java6". I think that with our current use cases, this will be sufficient. Let me know what you think. > Add constraint node labels > -- > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, > YARN-3409.WIP.001.patch > > > Specifying only one label for each node (IAW, partitioning a cluster) is a way to > determine how the resources of a special set of nodes can be shared by a > group of entities (like teams, departments, etc.).
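The "Java5 || Java6" matching described in the comment can be sketched with a tiny evaluator for boolean label expressions. This is a minimal sketch, not YARN's actual matcher (YARN-4476 is referenced for an efficient one); it assumes expressions in disjunctive form, where "||"-separated clauses are conjunctions of possibly negated labels.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Minimal sketch (not YARN's API): evaluates a boolean label expression,
 * e.g. "Java5 || Java6" or "x86_64 && !windows", against the set of
 * existential (value-less) labels present on a node.
 */
public class LabelExpr {
  public static boolean matches(String expr, Set<String> nodeLabels) {
    // Each "||"-separated clause is a conjunction of (possibly negated) labels.
    for (String clause : expr.split("\\|\\|")) {
      boolean clauseHolds = true;
      for (String atom : clause.split("&&")) {
        String a = atom.trim();
        boolean negated = a.startsWith("!");
        String label = negated ? a.substring(1).trim() : a;
        boolean present = nodeLabels.contains(label);
        // An atom fails if a required label is absent or a negated one is present.
        if (present == negated) { clauseHolds = false; break; }
      }
      if (clauseHolds) return true; // one satisfied clause is enough
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> node = new HashSet<>(Arrays.asList("Java6", "x86_64"));
    System.out.println(matches("Java5 || Java6", node));  // true
    System.out.println(matches("Java5 && x86_64", node)); // false
  }
}
```

This also shows why existential labels sidestep the value-comparison problem: "at least Java 5" is simply expanded by the requester into "Java5 || Java6", with no constraint types or typed comparisons needed.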
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730729#comment-15730729 ] Konstantinos Karanasos commented on YARN-3409: -- Hey guys, apologies for the late reply. Here are my thoughts... bq. Add a new field for constraint expression, and also for affinity/anti-affinity (as suggested by Kostas). This should have minimal impact on existing features. But after this, the "nodeLabelExpression" becomes a little ambiguous; we may need to deprecate the existing nodeLabelExpression. Agreed with that, with one clarification: do you mean having an extra affinity/anti-affinity constraint expression or using the same constraint expression? Probably we will need a separate one. bq. Extend the existing NodeLabel object to support node constraints; we only need two additional fields to support node constraints. 1) isNodeConstraint 2) Value (for example, we can have a constraint named jdk-version, and the value could be 6/7/8). I followed your discussion on this and on evaluating the constraints. I also had an offline discussion with [~chris.douglas]. I suggest having an even simpler approach than the one Wangda proposed. I believe we should have a first version with just boolean expressions, that is, simply requesting whether a label exists or not (possibly including negation of boolean expressions). In other words, I suggest having neither a constraint type nor a value. Let's have a first simple version of (boolean) labels that works. In a future iteration of this, we can add attributes (i.e., with values) instead of labels. Having simple labels allows us to bypass the problem of constraint types. Like Wangda says, constraint types do not really solve the problem of comparing values, given that people will write their values in different formats. You can also take a look at YARN-4467 for an efficient boolean expression matcher. For example, using simple labels, one node can be annotated with the label "Java6". 
Then a task that requires at least Java 5 can request a node with "Java5 || Java6". I think that with our current use cases, this will be sufficient. Let me know what you think. > Add constraint node labels > -- > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, > YARN-3409.WIP.001.patch > > > Specifying only one label for each node (IAW, partitioning a cluster) is a way to > determine how the resources of a special set of nodes can be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > have the following characteristics: > - The cluster is divided into several disjoint sub-clusters. > - ACLs/priorities can apply to a partition (e.g., only the Market team has > priority to use the partition). > - Percentages of capacity can apply to a partition (the Market team has 40% > minimum capacity and the Dev team has 60% minimum capacity of the partition). > Constraints are orthogonal to partitions; they describe attributes of a > node's hardware/software just for affinity. Some examples of constraints: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (Windows, Linux, etc.) > With this, an application can ask for resources that satisfy (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64).
[jira] [Commented] (YARN-5646) Documentation for scheduling of OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730332#comment-15730332 ] Konstantinos Karanasos commented on YARN-5646: -- Thanks for the detailed feedback, [~templedf]! I also got some offline feedback from [~curino] yesterday. I will incorporate your changes and upload a new version. Regarding the min queue length and wait time, I will improve the description -- it is indeed not easy to understand what it does in its current form. These parameters are used to "not dequeue containers for load rebalancing purposes, if the queue length is smaller than X tasks (or seconds)". So if you have queues shorter than that, you simply don't perform any action. As per Carlo's suggestion too, I will raise a JIRA to simplify some of the properties related to opportunistic containers, including the incremental one. For instance, I don't think there will be many cases where we will want the min/max opportunistic container size to be different from the guaranteed one. > Documentation for scheduling of OPPORTUNISTIC containers > > > Key: YARN-5646 > URL: https://issues.apache.org/jira/browse/YARN-5646 > Project: Hadoop YARN > Issue Type: Task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5646.001.patch > > > This is for adding documentation regarding the scheduling of OPPORTUNISTIC > containers. > It includes both the centralized (YARN-5220) and the distributed (YARN-2877) > scheduling.
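The min queue length/wait behavior described in the comment ("don't dequeue containers for load rebalancing if the queue is shorter than X") can be sketched as a small policy. This is a hypothetical illustration, assuming made-up names and thresholds; it is not the actual NM rebalancing code or YARN's configuration keys.

```java
/**
 * Hypothetical sketch of the queue-limit check described above: during load
 * rebalancing, a node sheds queued opportunistic containers only when its
 * queue exceeds a minimum length (or a minimum expected wait time). All
 * names and defaults are illustrative, not YARN's.
 */
public class QueueShedPolicy {
  private final int minQueueLength;   // queues at or below this are never touched
  private final long minQueueWaitMs;  // same, expressed as expected wait time

  public QueueShedPolicy(int minQueueLength, long minQueueWaitMs) {
    this.minQueueLength = minQueueLength;
    this.minQueueWaitMs = minQueueWaitMs;
  }

  /** How many queued containers to shed toward a rebalancing target. */
  public int containersToShed(int queueLength, long estimatedWaitMs, int targetLength) {
    if (queueLength <= minQueueLength || estimatedWaitMs <= minQueueWaitMs) {
      return 0; // short queues: no rebalancing action at all
    }
    return Math.max(0, queueLength - targetLength);
  }

  public static void main(String[] args) {
    QueueShedPolicy p = new QueueShedPolicy(5, 1000);
    System.out.println(p.containersToShed(3, 5000, 2));  // 0: queue too short to act on
    System.out.println(p.containersToShed(10, 5000, 6)); // 4: shed down to the target
  }
}
```

The design point is the guard clause: below the thresholds the policy is a no-op, which is exactly the "if you have queues shorter than that, you simply don't perform any action" behavior the comment explains.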
[jira] [Commented] (YARN-3409) Add constraint node labels
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707465#comment-15707465 ] Konstantinos Karanasos commented on YARN-3409: -- Hi guys, thank you for driving this, for the documentation, and for all the discussion. I think it is a super-useful feature to have. We also built a prototype over the summer with the end goal of supporting placement constraints (of the form affinity/anti-affinity/cardinality, similar in spirit to YARN-1042), and had to add some initial support for similar node labels along the way. Please find some comments below. # As was mentioned above in one of [~leftnoteasy]'s comments too, I also (strongly :)) suggest using these ConstraintNodeLabels in the context of YARN-1042 for supporting (anti-)affinity constraints as well. I think it will avoid duplicate effort and greatly simplify the code. # On a similar note, can these ConstraintNodeLabels be added/removed dynamically? For example, when a container starts its execution, it might contain some attributes (to be added -- I know such attributes cannot be specified at the moment). Those attributes will then be added to the node's labels, for the time the container is running. This can be useful for (anti-)affinity constraints. For instance, a task can add the label "HBase-master", and then another resource request can have a constraint of the form "don't put me at a node with an HBase-master label". What do you think? # A few people above mentioned that the naming of ConstraintNodeLabels might not be ideal. I think they look more like attributes (as in key-value pairs), so we might consider using a name that denotes that (labels sound to me more like something that exists or not, but does not have a value). # I like that you don't take headroom into account when it comes to constraint label expressions. # +1 for Option 1. 
It might also be that the implementation of ConstraintNodeLabels will be easier in some places than that of NodeLabels/Partitions (e.g., given there is no need to support headroom). In terms of logistics, +1 for the branch too. I think we should make this an umbrella JIRA. # Can you please give an example of a cluster-level constraint? # bq. Constraints will be matched within the scope of a node partition. Making sure I understand: why do we need this restriction? I think they are orthogonal, right? Unless you mean that if the user specifies a constraint, it has to be taken into account too, which I understand. # Also adding one last thing we did in our prototype that I think is related to the locality (node/rack/any) discussion above and might be useful to consider. We assumed that the ConstraintNodeLabels follow the hierarchy of the cluster. That is, a rack inherits the ConstraintNodeLabels of all its nodes. A detail here is that we considered only existential ConstraintNodeLabels (as I mentioned above, without values), which avoids conflicts. In the more general case you are describing, it is not clear what happens if one node of the rack has Java 6 and another has Java 7 (it is not clear what the label of the rack should be). We will need to resolve conflicts in those cases. However, I think that design is quite powerful. Eventually, we could even define different logical classes of nodes and register them in the RM. For instance, group nodes that belong to the same upgrade domain (being upgraded at the same time -- we see this use case a lot in our clusters). 
> Add constraint node labels > -- > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, client >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: Constraint-Node-Labels-Requirements-Design-doc_v1.pdf > > > Specifying only one label for each node (IAW, partitioning a cluster) is a way to > determine how the resources of a special set of nodes can be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > have the following characteristics: > - The cluster is divided into several disjoint sub-clusters. > - ACLs/priorities can apply to a partition (e.g., only the Market team has > priority to use the partition). > - Percentages of capacity can apply to a partition (the Market team has 40% > minimum capacity and the Dev team has 60% minimum capacity of the partition). > Constraints are orthogonal to partitions; they describe attributes of a > node's hardware/software just for affinity. Some examples of constraints: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (Windows, Linux, etc.) > With this, an application can ask for resources that satisfy (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64). --
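The rack-level inheritance idea from the comment above (a rack inherits the existential labels of all its nodes) reduces to a simple union, which is why value-less labels avoid the Java 6 vs. Java 7 conflict. The sketch below is from the prototype's description only; the types and method names are hypothetical, not YARN code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of rack-level label inheritance with purely existential
 * (value-less) labels: the rack simply carries the union of its nodes'
 * labels, so there is nothing to conflict. Hypothetical types, not YARN's.
 */
public class RackLabels {
  /** Labels of a rack = union of the labels of its nodes. */
  public static Set<String> rackLabels(Map<String, Set<String>> nodeLabelsByHost) {
    Set<String> union = new HashSet<>();
    for (Set<String> labels : nodeLabelsByHost.values()) {
      union.addAll(labels); // the rack "has" a label if any of its nodes has it
    }
    return union;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> rack = new HashMap<>();
    rack.put("n1", new HashSet<>(Arrays.asList("Java6")));
    rack.put("n2", new HashSet<>(Arrays.asList("Java7", "x86_64")));
    // A rack-level request for "Java6 || Java7" matches this rack.
    System.out.println(rackLabels(rack));
  }
}
```

With valued attributes, by contrast, the same union would yield two conflicting values for "Java" at the rack level, which is exactly the conflict-resolution problem the comment raises.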
[jira] [Commented] (YARN-5886) Dynamically prioritize execution of opportunistic containers (NM queue reordering)
[ https://issues.apache.org/jira/browse/YARN-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706181#comment-15706181 ] Konstantinos Karanasos commented on YARN-5886: -- Thank you for the feedback, [~cxcw]. bq. Also, Microsoft published a paper in ATC talking about this feature. Here are some of my concerns. Which paper are you referring to? We had a paper in EuroSys 2016 ("Efficient Queue Management for Cluster Scheduling"), in which we investigate different queue reordering strategies, along with other techniques for efficient queue management (queue sizing, placement, etc.). We called the system Yaq (we had both a centralized and a distributed scheduling version). Is that the paper you meant? Many of the techniques we are planning to add here will originate from Yaq. bq. 1. How does the local NM ContainerScheduler coordinate with the global scheduler, since the global scheduler will try to keep fairness and guarantee shares across applications (queues)? So the way we are planning to do this is by letting the global scheduler send the tasks to the nodes. Then the reordering happens only locally at each node (only for the tasks that are queued at the moment). Note that reordering is done only for opportunistic containers (guaranteed containers are not allowed to be queued). This way we are not affecting the fairness guarantees of guaranteed containers. If we want to do fairness across opportunistic containers, we will need some additional techniques (we did this through a timeout in the EuroSys paper). Does this make sense, or did you have something else in mind? bq. 2. The NodeManager may not know (or estimate) the runtime of a queued container. A false estimation (mistaking a long-running container for a short-running one) may cause serious problems (priority inversion?). That is a good point. In the initial strategies, we are planning to not take the task duration into account (because it might not always be available or might be imprecise, like you say). 
One way is to take into account the progress of the job, in terms of tasks completed. Later, if we introduce task durations, we can have even better strategies. But we will have to make sure we are robust in case of mis-estimations. > Dynamically prioritize execution of opportunistic containers (NM queue > reordering) > -- > > Key: YARN-5886 > URL: https://issues.apache.org/jira/browse/YARN-5886 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > Currently the {{ContainerScheduler}} in the NM picks the next queued > opportunistic container to be executed in a FIFO manner. That is, we first > execute containers that arrived first at the NM. > This JIRA proposes to add pluggable queue reordering strategies at the NM > that will dynamically determine which opportunistic container will be > executed next. > For example, we can choose to prioritize containers that belong to jobs which > are closer to completion, or containers that are short-running (if such > information is available).
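One of the reordering strategies proposed in the JIRA (prioritize containers whose job is closer to completion, with FIFO as the fallback) can be sketched as a comparator over the queued containers. The {{QueuedContainer}} type and its progress field are illustrative assumptions; the real NM {{ContainerScheduler}} types differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of a pluggable reordering strategy for queued opportunistic
 * containers: highest job progress runs first. QueuedContainer is a
 * stand-in, not the NM's actual class.
 */
public class ReorderByJobProgress {
  static class QueuedContainer {
    final String id;
    final double jobProgress; // fraction of the job's tasks already completed
    QueuedContainer(String id, double jobProgress) {
      this.id = id;
      this.jobProgress = jobProgress;
    }
  }

  /**
   * Sort by descending job progress. List.sort is stable, so containers
   * of equally progressed jobs keep their original FIFO arrival order.
   */
  static void reorder(List<QueuedContainer> queue) {
    queue.sort(Comparator.comparingDouble((QueuedContainer c) -> c.jobProgress).reversed());
  }

  public static void main(String[] args) {
    List<QueuedContainer> q = new ArrayList<>(Arrays.asList(
        new QueuedContainer("c1", 0.2),
        new QueuedContainer("c2", 0.9),
        new QueuedContainer("c3", 0.5)));
    reorder(q);
    System.out.println(q.get(0).id); // c2: its job is closest to completion
  }
}
```

Because the strategy is just a comparator, it is naturally pluggable: a duration-aware strategy could later be swapped in without touching the dequeue logic, which matches the robustness-to-mis-estimation concern in the comment.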
[jira] [Commented] (YARN-5646) Documentation for scheduling of OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677692#comment-15677692 ] Konstantinos Karanasos commented on YARN-5646: -- I understand the concern about future work. I added it on purpose, so that people who read the documentation can get an idea of the open items (and even contribute to them). But if you all think it's not suitable, I can remove it. [~kasha], I also included the motivation for over-commitment through opportunistic containers, but made it clear in the text that we do not yet support it. Once over-commitment is also available, we will update the document. > Documentation for scheduling of OPPORTUNISTIC containers > > > Key: YARN-5646 > URL: https://issues.apache.org/jira/browse/YARN-5646 > Project: Hadoop YARN > Issue Type: Task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5646.001.patch > > > This is for adding documentation regarding the scheduling of OPPORTUNISTIC > containers. > It includes both the centralized (YARN-5220) and the distributed (YARN-2877) > scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5646) Documentation for scheduling of OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675677#comment-15675677 ] Konstantinos Karanasos commented on YARN-5646: -- Please let's wait for a few days before committing this. > Documentation for scheduling of OPPORTUNISTIC containers > > > Key: YARN-5646 > URL: https://issues.apache.org/jira/browse/YARN-5646 > Project: Hadoop YARN > Issue Type: Task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5646.001.patch > > > This is for adding documentation regarding the scheduling of OPPORTUNISTIC > containers. > It includes both the centralized (YARN-5220) and the distributed (YARN-2877) > scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5646) Documentation for scheduling of OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5646: - Attachment: YARN-5646.001.patch Attaching documentation. > Documentation for scheduling of OPPORTUNISTIC containers > > > Key: YARN-5646 > URL: https://issues.apache.org/jira/browse/YARN-5646 > Project: Hadoop YARN > Issue Type: Task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5646.001.patch > > > This is for adding documentation regarding the scheduling of OPPORTUNISTIC > containers. > It includes both the centralized (YARN-5220) and the distributed (YARN-2877) > scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1593) support out-of-proc AuxiliaryServices
[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668916#comment-15668916 ] Konstantinos Karanasos commented on YARN-1593: -- Thanks for starting this! As [~asuresh] and [~hrsharma] pointed out, this is closely related to the container pooling we have been thinking of, so it's great to see there is more work in this direction. Here are some first thoughts: - There seems to be a common need for containers that do not belong to an AM. I like your analysis of the pros and cons of the three approaches. Ideally, and if possible, it would be good to agree on an approach that is not hybrid, i.e., not to have some containers going through option (1) and others through option (3), but rather to have a unified approach. In container pooling we have thought of having a component in the RM that manages how many "system" containers will be running at each node, but we are willing to adopt another approach if it is more suitable. - Looking both at your document and at the comments above, it seems that no approach can properly tackle the dependencies problem. We should probably solve this in the scheduler: just like there will be support for (anti-)affinity constraints, we can add support for dependencies in the scheduler, e.g., not scheduling a container on a node before a shuffle container is running on that node. - Although I like your proposal of using a new ExecutionType for the system containers, I am not sure it is always desirable to couple system containers with the highest-priority ExecutionType. For instance, there can be system containers that are less important and can be preempted to make space if needed. Also, apart from the execution priority, I am not sure the ExecutionType should determine whether a container should be automatically relaunched.
If we end up having a component managing those containers, maybe it is its role to determine whether they get restarted upon failure (irrespective of their ExecutionType). > support out-of-proc AuxiliaryServices > - > > Key: YARN-1593 > URL: https://issues.apache.org/jira/browse/YARN-1593 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, rolling upgrade >Reporter: Ming Ma >Assignee: Varun Vasudev > Attachments: SystemContainersandSystemServices.pdf > > > AuxiliaryServices such as ShuffleHandler currently run in the same process as > NM. There are some benefits to hosting them in dedicated processes. > 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force the > ShuffleHandler to restart. If ShuffleHandler runs as a separate process, > ShuffleHandler can continue to run during NM restart. NM can reconnect to > the running ShuffleHandler after restart. > 2. Resource management. It is possible another type of AuxiliaryService will > be implemented. AuxiliaryServices are considered YARN application specific > and could consume lots of resources. Running AuxiliaryServices in separate > processes allows easier resource management. NM could potentially stop a > specific AuxiliaryService process from running if it consumes resources way > above its allocation. > Here are some high level ideas: > 1. NM provides a hosting process for each AuxiliaryService. The existing > AuxiliaryService API doesn't change. > 2. The hosting process provides an RPC server for the AuxiliaryService proxy object > inside NM to connect to. > 3. When we rolling-restart the NM, the existing AuxiliaryService processes will > continue to run. NM could reconnect to the running AuxiliaryService processes > upon restart. > 4. Policy and resource management of AuxiliaryServices. So far we don't have an > immediate need for this.
An AuxiliaryService could run inside a container and > its resource utilization could be taken into account by the RM, and the RM could > consider whether a specific type of application overutilizes cluster resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
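The dependency idea raised in the comment above — not scheduling a container on a node until a shuffle container runs there — could amount to a simple placement predicate in the scheduler. This is a hedged sketch under assumed names; none of these types exist in YARN.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a scheduler-side dependency check: a container
// that depends on a service (e.g., "shuffle") is only placeable on nodes
// where a container providing that service is already running.
class DependencySketch {
    static boolean canPlace(String node,
                            Set<String> requiredServices,
                            Map<String, Set<String>> runningServicesByNode) {
        Set<String> running = runningServicesByNode.getOrDefault(node, Set.of());
        return running.containsAll(requiredServices);
    }
}
```

In practice such a predicate would sit next to the (anti-)affinity constraint checks the comment mentions, so both kinds of constraints are evaluated in one place.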
[jira] [Created] (YARN-5887) Policies for choosing which opportunistic containers to kill
Konstantinos Karanasos created YARN-5887: Summary: Policies for choosing which opportunistic containers to kill Key: YARN-5887 URL: https://issues.apache.org/jira/browse/YARN-5887 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos When a guaranteed container arrives at an NM but there are no resources to start its execution, opportunistic containers will be killed to make space for the guaranteed container. At the moment, we kill opportunistic containers in reverse order of arrival (first the most recently started ones). This is not always the right decision. For example, we might want to minimize the number of containers killed: to start a 6GB container, we could kill one 6GB opportunistic or three 2GB ones. Another example would be to refrain from killing containers of jobs that are very close to completion (we have to pass job completion information to the NM in that case). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
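The "minimize the number of containers killed" policy from the description (one 6GB victim instead of three 2GB ones) can be roughly illustrated with a largest-first greedy pass. {{OppContainer}} is a made-up type, and this is only a sketch of one candidate policy, not the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical kill-selection policy: free at least neededMb by killing
// as few opportunistic containers as possible (largest-first greedy).
class KillPolicySketch {
    static final class OppContainer {
        final String id;
        final int memMb;

        OppContainer(String id, int memMb) {
            this.id = id;
            this.memMb = memMb;
        }
    }

    static List<OppContainer> selectToKill(List<OppContainer> running, int neededMb) {
        List<OppContainer> byMemDesc = new ArrayList<>(running);
        byMemDesc.sort(Comparator.comparingInt((OppContainer c) -> c.memMb).reversed());
        List<OppContainer> victims = new ArrayList<>();
        int freedMb = 0;
        for (OppContainer c : byMemDesc) {
            if (freedMb >= neededMb) {
                break; // enough resources freed for the guaranteed container
            }
            victims.add(c);
            freedMb += c.memMb;
        }
        return victims;
    }
}
```

A greedy pass is not always optimal, and a real policy would also weigh job progress (the second example in the description); the sketch only shows the direction.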
[jira] [Assigned] (YARN-5886) Dynamically prioritize execution of opportunistic containers (NM queue reordering)
[ https://issues.apache.org/jira/browse/YARN-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos reassigned YARN-5886: Assignee: Konstantinos Karanasos > Dynamically prioritize execution of opportunistic containers (NM queue > reordering) > -- > > Key: YARN-5886 > URL: https://issues.apache.org/jira/browse/YARN-5886 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > Currently the {{ContainerScheduler}} in the NM picks the next queued > opportunistic container to be executed in a FIFO manner. That is, we first > execute containers that arrived first at the NM. > This JIRA proposes to add pluggable queue reordering strategies at the NM > that will dynamically determine which opportunistic container will be > executed next. > For example, we can choose to prioritize containers that belong to jobs which > are closer to completion, or containers that are short-running (if such > information is available). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5886) Dynamically prioritize execution of opportunistic containers (NM queue reordering)
Konstantinos Karanasos created YARN-5886: Summary: Dynamically prioritize execution of opportunistic containers (NM queue reordering) Key: YARN-5886 URL: https://issues.apache.org/jira/browse/YARN-5886 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Currently the {{ContainerScheduler}} in the NM picks the next queued opportunistic container to be executed in a FIFO manner. That is, we first execute containers that arrived first at the NM. This JIRA proposes to add pluggable queue reordering strategies at the NM that will dynamically determine which opportunistic container will be executed next. For example, we can choose to prioritize containers that belong to jobs which are closer to completion, or containers that are short-running (if such information is available). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4972) Cleanup ContainerScheduler tests to remove long sleep times
[ https://issues.apache.org/jira/browse/YARN-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-4972: - Parent Issue: YARN-5541 (was: YARN-4742) > Cleanup ContainerScheduler tests to remove long sleep times > --- > > Key: YARN-4972 > URL: https://issues.apache.org/jira/browse/YARN-4972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4972) Cleanup ContainerScheduler tests to remove long sleep times
[ https://issues.apache.org/jira/browse/YARN-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-4972: - Summary: Cleanup ContainerScheduler tests to remove long sleep times (was: Cleanup QueuingContainerManager tests to remove long sleep times) > Cleanup ContainerScheduler tests to remove long sleep times > --- > > Key: YARN-4972 > URL: https://issues.apache.org/jira/browse/YARN-4972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2889) Limit in the number of opportunistic container requests per AM
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2889: - Parent Issue: YARN-5542 (was: YARN-4742) > Limit in the number of opportunistic container requests per AM > -- > > Key: YARN-2889 > URL: https://issues.apache.org/jira/browse/YARN-2889 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > > We introduce a way to limit the number of queueable requests that each AM can > submit to the LocalRM. > This way we can restrict the number of queueable containers handed out by the > system, as well as throttle down misbehaving AMs (asking for too many > queueable containers). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
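The throttling described in this issue boils down to a per-AM admission cap. The helper below is a hedged sketch with invented names and semantics, not the actual LocalRM logic.

```java
// Hypothetical per-AM throttle: given the AM's outstanding queueable
// (opportunistic) requests, admit new requests only up to maxPerAm total.
class AmThrottleSketch {
    static int admit(int outstanding, int requested, int maxPerAm) {
        int remaining = Math.max(0, maxPerAm - outstanding);
        return Math.min(requested, remaining); // requests beyond the cap are rejected
    }
}
```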
[jira] [Updated] (YARN-2889) Limit in the number of opportunistic container requests per AM
[ https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2889: - Summary: Limit in the number of opportunistic container requests per AM (was: Limit in the number of queueable container requests per AM) > Limit in the number of opportunistic container requests per AM > -- > > Key: YARN-2889 > URL: https://issues.apache.org/jira/browse/YARN-2889 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > > We introduce a way to limit the number of queueable requests that each AM can > submit to the LocalRM. > This way we can restrict the number of queueable containers handed out by the > system, as well as throttle down misbehaving AMs (asking for too many > queueable containers). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5216) Expose configurable preemption policy for OPPORTUNISTIC containers running on the NM
[ https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5216: - Issue Type: Sub-task (was: Bug) Parent: YARN-5541 > Expose configurable preemption policy for OPPORTUNISTIC containers running on > the NM > > > Key: YARN-5216 > URL: https://issues.apache.org/jira/browse/YARN-5216 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-scheduling >Reporter: Arun Suresh >Assignee: Hitesh Sharma > Labels: oct16-hard > Attachments: YARN5216.001.patch, yarn5216.002.patch > > > Currently, the default action taken by the QueuingContainerManager, > introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM > with OPPORTUNISTIC containers using up resources, is to KILL the running > OPPORTUNISTIC containers. > This JIRA proposes to expose a configurable hook to allow the NM to take a > different action. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5415) Add support for NodeLocal and RackLocal OPPORTUNISTIC requests
[ https://issues.apache.org/jira/browse/YARN-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5415: - Parent Issue: YARN-5542 (was: YARN-4742) > Add support for NodeLocal and RackLocal OPPORTUNISTIC requests > -- > > Key: YARN-5415 > URL: https://issues.apache.org/jira/browse/YARN-5415 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Konstantinos Karanasos > > Currently, the Distributed Scheduling framework only supports ResourceRequests > with *ANY* resource name, and additionally requires that the resource requests > have relaxLocality turned on. > This JIRA seeks to add support for Node and Rack local allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2886) Estimating waiting time in NM container queues
[ https://issues.apache.org/jira/browse/YARN-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2886: - Parent Issue: YARN-5542 (was: YARN-4742) > Estimating waiting time in NM container queues > -- > > Key: YARN-2886 > URL: https://issues.apache.org/jira/browse/YARN-2886 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > This JIRA is about estimating the waiting time of each NM queue. > Having these estimates is crucial for the distributed scheduling of container > requests, as it allows the LocalRM to decide in which NMs to queue the > queuable container requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
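One naive way to produce such a waiting-time estimate — assuming the NM knows its queue length, a mean task duration, and how many containers run concurrently (all assumptions on my part, not something this JIRA mandates):

```java
// Back-of-the-envelope queue wait estimate: with queueLength tasks ahead,
// a mean task duration, and `slots` containers running concurrently, the
// expected wait is roughly queueLength * meanTaskSeconds / slots.
class WaitEstimateSketch {
    static double estimatedWaitSeconds(int queueLength, double meanTaskSeconds, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        return queueLength * meanTaskSeconds / slots;
    }
}
```

The LocalRM would then queue a request at the NM advertising the smallest estimate; more refined estimators (per-task durations, percentiles) would replace this formula without changing that placement loop.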
[jira] [Updated] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker
[ https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5414: - Parent Issue: YARN-5542 (was: YARN-4742) > Integrate NodeQueueLoadMonitor with ClusterNodeTracker > -- > > Key: YARN-5414 > URL: https://issues.apache.org/jira/browse/YARN-5414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: container-queuing, distributed-scheduling, scheduler >Reporter: Arun Suresh >Assignee: Arun Suresh > > The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides > convenience methods like sort and filter. > The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of > maintaining its own data-structure of node information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5688) Make allocation of opportunistic containers asynchronous
[ https://issues.apache.org/jira/browse/YARN-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5688: - Issue Type: Sub-task (was: Improvement) Parent: YARN-5542 > Make allocation of opportunistic containers asynchronous > > > Key: YARN-5688 > URL: https://issues.apache.org/jira/browse/YARN-5688 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > In the current implementation of the > {{OpportunisticContainerAllocatorAMService}}, we synchronously perform the > allocation of opportunistic containers. This results in "blocking" the > service at the RM when scheduling the opportunistic containers. > The {{OpportunisticContainerAllocator}} should instead asynchronously run as > a separate thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
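The proposed change — moving allocation off the RPC handler onto its own thread — can be sketched with a plain executor. This is illustrative only; the real {{OpportunisticContainerAllocator}} wiring differs, and the string-based "allocation" below is a placeholder.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: the handler thread only enqueues work; a dedicated (daemon)
// allocator thread performs the actual opportunistic allocation.
class AsyncAllocSketch {
    private final ExecutorService allocatorThread =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "opportunistic-allocator");
                t.setDaemon(true); // do not block JVM shutdown
                return t;
            });

    Future<String> allocateAsync(String requestId) {
        // Placeholder for the real allocation logic.
        return allocatorThread.submit(() -> "allocated:" + requestId);
    }

    String allocateAndWait(String requestId) {
        try {
            return allocateAsync(requestId).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```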
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666401#comment-15666401 ] Konstantinos Karanasos commented on YARN-4597: -- Thanks for the updated patch, [~asuresh]. Looks good to me. Some final comments below... All are minor, so it is up to you whether to address them (I would only "insist" on the last one). - In the {{ContainerScheduler}}: -- In the comment for {{runningContainers}}, let's mention that these are the running containers, including the containers that are in the process of transitioning from the SCHEDULED to the RUNNING state. I think the rest are details that might be confusing. -- In {{updateQueuingLimit}}, you can add an extra check of the form {{if (this.queuingLimit.getMaxQueueLength() < queuedOpportunisticContainers.size())}} to avoid calling the shedding if the queue is not long enough. This might often be the case if the NM has imposed a small queue size. -- I was thinking that, although less likely than before, the fields of the {{OpportunisticContainersStatus}} can still be updated during {{getOpportunisticContainersStatus()}}. To avoid synchronization, we could set the fields using an event, and then in {{getOpportunisticContainersStatus()}} we would just return the object. But if you think it is too much, we can leave it as is. -- In {{onContainerCompleted}}, a container can belong either to the queued guaranteed, the queued opportunistic, or the running containers. So, you can avoid removing it from all the lists once it is found in one of them. - In {{YarnConfiguration}}, let's note in a comment that the max queue length coming from the RM is the global maximum queue length. - In {{SchedulerNode}}, I still suggest putting the {{++numContainers}} and the {{--numContainers}} inside the if statements.
If I remember correctly, these fields are used for the web UI, so there will be a disconnect between the resources used (referring only to guaranteed containers) and the number of containers (referring to both guaranteed and opportunistic at the moment). The stats for the opportunistic containers are carried by the opportunisticContainersStatus, so they are reported as well. Again, all comments are minor. +1 for the patch and thanks for all the work! > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch, YARN-4597.010.patch, YARN-4597.011.patch, > YARN-4597.012.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
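The extra check suggested in the review — skipping the shedding pass when the queue is already within the RM-imposed limit — looks roughly like this in a stripped-down form. This is a sketch over a plain deque, not the actual {{ContainerScheduler}} code.

```java
import java.util.Deque;

// Sketch: skip shedding when the queue is within the limit, otherwise
// de-queue the most recently queued opportunistic containers.
class QueueLimitGuardSketch {
    static int shedIfNeeded(Deque<String> queuedOpportunistic, int maxQueueLength) {
        int excess = queuedOpportunistic.size() - maxQueueLength;
        if (excess <= 0) {
            return 0; // queue short enough: avoid the shedding pass entirely
        }
        for (int i = 0; i < excess; i++) {
            queuedOpportunistic.removeLast();
        }
        return excess;
    }
}
```

The early return is exactly the cheap guard the review asks for; on NMs configured with small queues it makes the common case a single size comparison.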
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658402#comment-15658402 ] Konstantinos Karanasos commented on YARN-4597: -- Here are some comments on the {{ContainerScheduler}}: - {{queuedOpportunisticContainers}} will have concurrency issues. We are updating it when containers arrive, but also in {{shedQueuedOpportunisticContainers}}. - {{queuedGuaranteedContainers}} and {{queuedOpportunisticContainers}}: I think we should use queues. I don't think we retrieve the container by its key anywhere either way. - {{oppContainersMarkedForKill}}: could be a Set, right? - {{scheduledToRunContainers}} are containers that are either already running or are going to run very soon (transitioning from the SCHEDULED to the RUNNING state). The name is a bit misleading, because it sounds like they are only the ones belonging to the second category. I would rather say {{runningContainers}} and specify in a comment that they might not be running at this very moment but will be running very soon. - In {{onContainerCompleted()}}, the {{scheduledToRunContainers.remove(container.getContainerId())}} and the {{startPendingContainers()}} can go inside the if statement above. If the container was not running and no resources were freed up, we don't need to call {{startPendingContainers()}}. - Fields of the {{opportunisticContainersStatus}} are set in different places. Due to that, when we call {{getOpportunisticContainersStatus()}} we may see an inconsistent object. Let's set the fields only in {{getOpportunisticContainersStatus()}}. - line 252: indeed, we can now do extraOpportContainersToKill -> opportContainersToKill, as Karthik mentioned in a comment. - line 87: increase -> increases - {{shedQueuedOpportunisticContainers}}: -- {{numAllowed}} is the number of allowed containers in the queue. Instead, we are killing {{numAllowed}} containers.
In other words, we should not kill numAllowed, but {{queuedOpportunisticContainers.size() - numAllowed}}. -- "Container Killed to make room for Guaranteed Container." -> "Container killed to meet NM queuing limits". Instead of kill, you can also say de-queued. > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
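The off-by-one pointed out above reduces to a one-liner; this hypothetical helper makes the intended arithmetic explicit (with {{numAllowed}} being the permitted queue length, the shed count is the queue size minus {{numAllowed}}, never {{numAllowed}} itself).

```java
// numAllowed is the permitted queue length, so the number of containers
// to de-queue is queue size minus numAllowed (never numAllowed itself).
class ShedCountSketch {
    static int containersToShed(int queuedSize, int numAllowed) {
        return Math.max(0, queuedSize - numAllowed);
    }
}
```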
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655975#comment-15655975 ] Konstantinos Karanasos commented on YARN-4597: -- Thanks for working on this, [~asuresh]! I am sending some first comments. I have not yet looked at the {{ContainerScheduler}} -- I will do that tomorrow. - The {{Container}} has two new methods ({{sendLaunchEvent}} and {{sendKillEvent}}), which are public and do not follow the design of the rest of the code, which keeps such methods private and calls them through transitions in the {{ContainerImpl}}. Let's try to use the existing design if possible. - In {{RMNodeImpl}}: -- Instead of using {{launchedContainers}} for both the launched and the queued containers, we might want to split it into two: one for the launched and one for the queued containers. -- I think we should not add opportunistic containers to {{launchedContainers}}. If we do, they will be added to {{newlyLaunchedContainers}}, then to the {{nodeUpdateQueue}}, and, if I am not mistaken, they will be propagated to the schedulers for the guaranteed containers, which will create problems. I have to look at it a bit more, but my hunch is that we should avoid doing it. Even if it does not affect the resource accounting, I don't see any advantage to adding them. - In the {{OpportunisticContainerAllocatorAMService}} we are now calling {{SchedulerNode::allocate}}, and then we do not update the used resources, but we do update some other counters, which leads to inconsistencies. For example, when releasing a container, I think at the moment we are not calling the release of the {{SchedulerNode}}, which means that the container count will become inconsistent. -- Instead, I suggest adding some counters for opportunistic containers in the {{SchedulerNode}}, both for the number of containers and for the resources used. In this case, we need to make sure that those resources are released too.
- Maybe as part of a different JIRA, we should at some point extend the {{container.metrics}} in the {{ContainerImpl}} to keep track of the scheduled/queued containers. h6. Nits: - There seem to be two redundant parameters in {{YarnConfiguration}} at the moment: {{NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH}} and {{NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH}}. If I am not missing something, we should keep only one of the two. - {{yarn-default.xml}}: numbed -> number (in a comment) - {{TestNodeManagerResync}}: I think it is better to use one of the existing methods for waiting to get to the RUNNING state. - In {{Container}}/{{ContainerImpl}} and all the associated classes, I would suggest renaming {{isMarkedToKill}} to {{isMarkedForKilling}}. I know it is minor, but it is more self-explanatory. I will send more comments once I check the {{ContainerScheduler}}. Also, let's stress-test the code in a cluster before committing, to make sure everything is good. I can help with that. > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
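The suggestion to track opportunistic usage separately in the {{SchedulerNode}} could look like the following. This is purely illustrative — the field and method names are invented — but it shows the point of the comment: allocate and release must stay symmetric so the counters cannot drift, while guaranteed accounting is untouched.

```java
// Sketch: separate counters for opportunistic containers so that allocate
// and release stay symmetric and guaranteed accounting is untouched.
class OppCounterSketch {
    private int oppContainers;
    private long oppMemoryMb;

    void allocateOpportunistic(long memMb) {
        oppContainers++;
        oppMemoryMb += memMb;
    }

    void releaseOpportunistic(long memMb) {
        oppContainers--;
        oppMemoryMb -= memMb;
    }

    int getOppContainers() { return oppContainers; }

    long getOppMemoryMb() { return oppMemoryMb; }
}
```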
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652246#comment-15652246 ] Konstantinos Karanasos commented on YARN-4597: -- Hi [~jianhe]. Yes, I will check the patch today. > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651784#comment-15651784 ] Konstantinos Karanasos commented on YARN-5823: -- Thanks for the review and the commit, [~asuresh]! > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Fix For: 3.0.0-alpha2 > > Attachments: YARN-5823.001.patch, YARN-5823.002.patch, > YARN-5823.003.patch, YARN-5823.004.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5833) Add validation to ensure default ports are unique in Configuration
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651781#comment-15651781 ] Konstantinos Karanasos commented on YARN-5833: -- Thanks [~asuresh], indeed those tests were unfortunately not kicked off by Jenkins... > Add validation to ensure default ports are unique in Configuration > -- > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5833.003.addendum-2.patch, > YARN-5833.003.addendum.patch, YARN-5833.003.patch, YARN-5883.001.patch, > YARN-5883.002.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5833) Add validation to ensure default ports are unique in Configuration
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5833: - Attachment: YARN-5833.003.addendum.patch Thanks for catching this, [~liuml07]. We had compiled it with Java 8. Attaching addendum patch that fixes the problem. > Add validation to ensure default ports are unique in Configuration > -- > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5833.003.addendum.patch, YARN-5833.003.patch, > YARN-5883.001.patch, YARN-5883.002.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5833) Add validation to ensure default ports are unique in Configuration
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649277#comment-15649277 ] Konstantinos Karanasos commented on YARN-5833: -- [~liuml07], I have checked it on trunk. What error are you getting on branch-2? > Add validation to ensure default ports are unique in Configuration > -- > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5833.003.patch, YARN-5883.001.patch, > YARN-5883.002.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5833) Add validation to ensure default ports are unique in Configuration
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649096#comment-15649096 ] Konstantinos Karanasos commented on YARN-5833: -- Thanks for reviewing and committing the patch, [~subru]! > Add validation to ensure default ports are unique in Configuration > -- > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5833.003.patch, YARN-5883.001.patch, > YARN-5883.002.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Attachment: YARN-5823.004.patch Attaching the right patch. > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5823.001.patch, YARN-5823.002.patch, > YARN-5823.003.patch, YARN-5823.004.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Attachment: (was: YARN-5823.004.patch) > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5823.001.patch, YARN-5823.002.patch, > YARN-5823.003.patch, YARN-5823.004.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Attachment: YARN-5823.004.patch Rebasing against trunk and fixing the failing test cases. > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5823.001.patch, YARN-5823.002.patch, > YARN-5823.003.patch, YARN-5823.004.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Attachment: YARN-5823.003.patch Adding new version of the patch, in which I am calling {{pullNMTokens()}} only once, following [~asuresh]'s suggestion. I also included a new test in {{TestOpportunisticContainerAllocation}}, which would fail without the present patch. > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5823.001.patch, YARN-5823.002.patch, > YARN-5823.003.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5833) Change default port for AMRMProxy
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5833: - Attachment: YARN-5833.003.patch Attaching new version of the patch in which I fixed the checkstyle issue. Also, I ran the new test with the previous parameter of the AMRMProxy port, which was causing a collision, and got the following output: {noformat}java.lang.AssertionError: Parameters DEFAULT_AMRM_PROXY_PORT and DEFAULT_NM_COLLECTOR_SERVICE_PORT are using the same default value!{noformat} With the port change, the test passes successfully. > Change default port for AMRMProxy > - > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5833.003.patch, YARN-5883.001.patch, > YARN-5883.002.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5833) Change default port for AMRMProxy
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5833: - Attachment: YARN-5883.002.patch Thanks for the feedback, [~subru]. I added a new test method in {{TestConfigurationFieldsBase}} that checks for collisions of default values. Each subclass of {{TestConfigurationFieldsBase}} can specify a set of filter strings. The above method then goes over each of these filters and makes sure that there is no collision between the values of the default parameters that contain this filter in their name. The {{TestYarnConfigurationFields}} initialize method adds the "_PORT" filter to the filter set to check for default port collisions. At the moment, the method that adds the filters in {{TestYarnConfigurationFields}} is private. Let me know if you think it's better to move it to the base class and have the YARN-specific one override it. > Change default port for AMRMProxy > - > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5883.001.patch, YARN-5883.002.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
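The filter-based collision check described above can be illustrated with a minimal stand-alone sketch. This is not the actual {{TestConfigurationFieldsBase}} code; the class and field names below are hypothetical stand-ins. The idea is to reflect over the static default fields of a configuration class, keep only those whose name contains a filter string such as "_PORT", and flag any two that share the same default value.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for YarnConfiguration's default constants.
class SampleDefaults {
    public static final int DEFAULT_AMRM_PROXY_PORT = 8049;
    public static final int DEFAULT_NM_COLLECTOR_SERVICE_PORT = 8048;
    public static final int DEFAULT_RM_PORT = 8032;
}

// Hypothetical class with a deliberate collision, to show the failure mode.
class CollidingDefaults {
    public static final int DEFAULT_A_PORT = 8048;
    public static final int DEFAULT_B_PORT = 8048;
}

public class PortCollisionCheck {
    // Returns a message naming a colliding pair, or null if all static
    // fields whose name contains the filter have unique default values.
    static String findCollision(Class<?> clazz, String filter) throws Exception {
        Map<Object, String> seen = new HashMap<>();
        for (Field f : clazz.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) continue;
            if (!f.getName().contains(filter)) continue;
            Object value = f.get(null);  // static field: no instance needed
            String earlier = seen.put(value, f.getName());
            if (earlier != null) {
                return "Parameters " + earlier + " and " + f.getName()
                    + " are using the same default value!";
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // All *_PORT defaults in SampleDefaults are unique: no collision.
        System.out.println(findCollision(SampleDefaults.class, "_PORT"));
        // CollidingDefaults has two ports with the same value.
        System.out.println(findCollision(CollidingDefaults.class, "_PORT"));
    }
}
```

The second call reports a message in the spirit of the assertion output quoted in the previous comment; the real test additionally walks a class hierarchy and supports multiple filters.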
[jira] [Assigned] (YARN-5688) Make allocation of opportunistic containers asynchronous
[ https://issues.apache.org/jira/browse/YARN-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos reassigned YARN-5688: Assignee: Konstantinos Karanasos > Make allocation of opportunistic containers asynchronous > > > Key: YARN-5688 > URL: https://issues.apache.org/jira/browse/YARN-5688 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > In the current implementation of the > {{OpportunisticContainerAllocatorAMService}}, we synchronously perform the > allocation of opportunistic containers. This results in "blocking" the > service at the RM when scheduling the opportunistic containers. > The {{OpportunisticContainerAllocator}} should instead asynchronously run as > a separate thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637295#comment-15637295 ] Konstantinos Karanasos commented on YARN-5823: -- Thanks for checking the patch, [~asuresh]. I like what you propose, but the problem would be that the opportunistic allocation has to happen strictly before the guaranteed allocation for that to work. In the current patch, I am doing the guaranteed allocation first, since it is non-blocking. As you say, as part of YARN-5688 we should revisit the order of steps once both guaranteed and opportunistic allocations are asynchronous. Does that make sense? > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5823.001.patch, YARN-5823.002.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5833) Change default port for AMRMProxy
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5833: - Attachment: YARN-5883.001.patch Attaching patch. > Change default port for AMRMProxy > - > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5883.001.patch > > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Attachment: YARN-5823.002.patch Attaching new patch -- fixing findbug and checkstyle issues. > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-5823.001.patch, YARN-5823.002.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5377) TestQueuingContainerManager.testKillMultipleOpportunisticContainers fails in trunk
[ https://issues.apache.org/jira/browse/YARN-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634701#comment-15634701 ] Konstantinos Karanasos edited comment on YARN-5377 at 11/3/16 11:59 PM: The problem with the test is that the container was moving fast from the DONE to the CONTAINER_CLEANUP_AFTER_KILL state, and the DONE state was not observed by the {{waitForNMContainerState}} method of the {{BaseContainerManagerTest}}. I added a new {{waitForNMContainerState}} method that takes as input a list of final container states, instead of a single one like before. When any of the states of this list is reached, the {{waitForNMContainerState}} exits successfully. was (Author: kkaranasos): The problem with the test is that the container was moving fast from the DONE to the CONTAINER_CLEANUP_AFTER_KILL, and the DONE state was not observed by the {{waitForNMContainerState}} method of the {{BaseContainerManagerTest}}. I added a new {{waitForNMContainerState}} method that takes as input a list of final container states, instead of a single one like before. When any of the states of this list is reached, the {{waitForNMContainerState}} exits successfully. > TestQueuingContainerManager.testKillMultipleOpportunisticContainers fails in > trunk > -- > > Key: YARN-5377 > URL: https://issues.apache.org/jira/browse/YARN-5377 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Konstantinos Karanasos > Attachments: YARN-5377.001.patch > > > Test case fails jenkin build > [link|https://builds.apache.org/job/PreCommit-YARN-Build/12228/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt] > {noformat} > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 134.586 sec > <<< FAILURE! 
- in > org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager > testKillMultipleOpportunisticContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager) > Time elapsed: 32.134 sec <<< FAILURE! > java.lang.AssertionError: ContainerState is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForNMContainerState(BaseContainerManagerTest.java:363) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager.testKillMultipleOpportunisticContainers(TestQueuingContainerManager.java:470) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
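The fix described above, waiting for any one of a list of final container states instead of a single state, can be sketched outside the NM test harness. The following is a simplified, self-contained illustration (the real {{waitForNMContainerState}} in {{BaseContainerManagerTest}} polls the actual container status; the class, enum values, and polling interval here are assumptions for the sketch). Accepting a set of states avoids the race where a short-lived state such as DONE is skipped over before the poller observes it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class MultiStateWait {
    enum ContainerState { SCHEDULED, RUNNING, DONE, CONTAINER_CLEANEDUP_AFTER_KILL }

    // Polls until the container reaches ANY of the given final states,
    // or the timeout elapses. Returns whether a final state was observed.
    static boolean waitForAnyState(AtomicReference<ContainerState> current,
                                   Set<ContainerState> finalStates,
                                   int timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (finalStates.contains(current.get())) {
                return true;
            }
            Thread.sleep(10);  // polling interval (assumed)
        }
        return finalStates.contains(current.get());
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<ContainerState> state =
            new AtomicReference<>(ContainerState.RUNNING);
        // Simulate the fast DONE -> CONTAINER_CLEANEDUP_AFTER_KILL
        // transition happening on another thread.
        new Thread(() -> {
            state.set(ContainerState.DONE);
            state.set(ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL);
        }).start();
        // Waiting on both states succeeds even if DONE is never observed.
        boolean reached = waitForAnyState(state,
            new HashSet<>(Arrays.asList(ContainerState.DONE,
                ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL)), 2000);
        System.out.println(reached);  // prints "true"
    }
}
```

Waiting only on DONE here could time out exactly as in the flaky test, because the second {{set}} can land before the next poll.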
[jira] [Updated] (YARN-5377) TestQueuingContainerManager.testKillMultipleOpportunisticContainers fails in trunk
[ https://issues.apache.org/jira/browse/YARN-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5377: - Attachment: YARN-5377.001.patch The problem with the test is that the container was moving fast from the DONE to the CONTAINER_CLEANUP_AFTER_KILL, and the DONE state was not observed by the {{waitForNMContainerState}} method of the {{BaseContainerManagerTest}}. I added a new {{waitForNMContainerState}} method that takes as input a list of final container states, instead of a single one like before. When any of the states of this list is reached, the {{waitForNMContainerState}} exits successfully. > TestQueuingContainerManager.testKillMultipleOpportunisticContainers fails in > trunk > -- > > Key: YARN-5377 > URL: https://issues.apache.org/jira/browse/YARN-5377 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Konstantinos Karanasos > Attachments: YARN-5377.001.patch > > > Test case fails jenkin build > [link|https://builds.apache.org/job/PreCommit-YARN-Build/12228/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt] > {noformat} > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 134.586 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager > testKillMultipleOpportunisticContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager) > Time elapsed: 32.134 sec <<< FAILURE! 
> java.lang.AssertionError: ContainerState is not correct (timedout) > expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForNMContainerState(BaseContainerManagerTest.java:363) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager.testKillMultipleOpportunisticContainers(TestQueuingContainerManager.java:470) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5833) Change default port for AMRMProxy
Konstantinos Karanasos created YARN-5833: Summary: Change default port for AMRMProxy Key: YARN-5833 URL: https://issues.apache.org/jira/browse/YARN-5833 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos The default port for the AMRMProxy coincides with the one for the Collector Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5833) Change default port for AMRMProxy
[ https://issues.apache.org/jira/browse/YARN-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos reassigned YARN-5833: Assignee: Konstantinos Karanasos > Change default port for AMRMProxy > - > > Key: YARN-5833 > URL: https://issues.apache.org/jira/browse/YARN-5833 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > The default port for the AMRMProxy coincides with the one for the Collector > Service (port 8048). Will use a different port for the AMRMProxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634527#comment-15634527 ] Konstantinos Karanasos commented on YARN-2995: -- Regarding the remaining issues: * The checkstyle issue is about a method that takes more than 7 parameters in one of the test classes. That was already the case for that method; I just added some more parameters. * The javadoc issue, as I already explained in a comment above, is related to Java 8 complaining about using '_' as identifiers. This is used in multiple places in the Web UI classes, and should be treated in a separate JIRA. * The unit test issue regarding {{TestQueuingContainerManager}} is unrelated to the present JIRA and is tracked in YARN-5377. * There is a build issue in sls when running the corresponding tests, but it might be a Jenkins issue, since it builds fine for me locally. I just kicked off Jenkins again to see if the problem persists. > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos >Priority: Blocker > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch, YARN-2995.004.patch, all-nodes.png, all-nodes.png, > opp-container.png > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Attachment: YARN-5823.001.patch Attaching patch. > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-5823.001.patch > > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5823: - Description: At the moment, when an {{AllocateRequest}} contains only opportunistic {{ResourceRequests}}, the updated NMTokens are not properly added to the {{AllocateResponse}}. In such a case the AM does not get back the needed NMTokens that are required to start the opportunistic containers at the respective nodes. was: At the moment, when an {{AllocateRequest}} containers only opportunistic {{ResourceRequests}}, the updated NMTokens are not properly added to the {{AllocateResponse}}. In such a case the AM does not get back the needed NMTokens that are required to start the opportunistic containers at the respective nodes. > Update NMTokens in case of requests with only opportunistic containers > -- > > Key: YARN-5823 > URL: https://issues.apache.org/jira/browse/YARN-5823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > > At the moment, when an {{AllocateRequest}} contains only opportunistic > {{ResourceRequests}}, the updated NMTokens are not properly added to the > {{AllocateResponse}}. > In such a case the AM does not get back the needed NMTokens that are required > to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5823) Update NMTokens in case of requests with only opportunistic containers
Konstantinos Karanasos created YARN-5823: Summary: Update NMTokens in case of requests with only opportunistic containers Key: YARN-5823 URL: https://issues.apache.org/jira/browse/YARN-5823 Project: Hadoop YARN Issue Type: Bug Reporter: Konstantinos Karanasos Assignee: Konstantinos Karanasos At the moment, when an {{AllocateRequest}} contains only opportunistic {{ResourceRequests}}, the updated NMTokens are not properly added to the {{AllocateResponse}}. In such a case the AM does not get back the needed NMTokens that are required to start the opportunistic containers at the respective nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: YARN-2995.004.patch Adding new version of the patch. Rebased against trunk, fixed some more issues, and addressed the unit test failures. Note that there is a javadoc issue regarding the use of '_' as an identifier (related to Java 8). I did not fix that, because it is actually used in multiple classes in the Web UI, and I followed the same style as in the rest of the code. I assume this should be fixed in all places at some point. > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch, YARN-2995.004.patch, all-nodes.png, all-nodes.png, > opp-container.png > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: all-nodes.png Attaching new screenshot after some final fixes. > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch, all-nodes.png, all-nodes.png, opp-container.png > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: opp-container.png all-nodes.png > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch, all-nodes.png, opp-container.png > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: (was: all-nodes.png) > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: (was: opp-container.png) > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: opp-container.png all-nodes.png Attaching two screenshots. The first is from the nodes page, showing a snapshot of the cluster with both guaranteed and opportunistic containers running, as well as some additional containers queued at the node. The second shows the details of a specific container, where the execution type is now included ("OPPORTUNISTIC" in this case). > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch, all-nodes.png, opp-container.png > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: YARN-2995.003.patch Adding new version of the patch. Fixed some more problems, the checkstyle issues, and added the execution type information at the container's page. The unit test that was failing looks unrelated. > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch, > YARN-2995.003.patch > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623286#comment-15623286 ] Konstantinos Karanasos commented on YARN-3645: -- Thanks for the new patch, [~gliptak]. I see there is one test failing. Can you please check if that is related? Otherwise, the patch looks good to me. bq. Elements "aclSubmitApps", "aclAdministerApps", "aclAdministerReservations", "aclListReservations", "aclSubmitReservations" do not call trim() in the current code. Are these also expected to call trim()? [~kasha] If those properties should also call trim(), then we can push trim() inside the readFieldText() method to simplify the code. Other than that, and after double-checking that the test is not related, let's commit the patch. > ResourceManager can't start success if attribute value of "aclSubmitApps" is > null in fair-scheduler.xml > > > Key: YARN-3645 > URL: https://issues.apache.org/jira/browse/YARN-3645 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.0.0-alpha2 >Reporter: zhoulinlin >Assignee: Gabor Liptak > Labels: oct16-easy > Attachments: YARN-3645.1.patch, YARN-3645.2.patch, YARN-3645.3.patch, > YARN-3645.4.patch, YARN-3645.5.patch, YARN-3645.patch > > > The "aclSubmitApps" is configured in fair-scheduler.xml like below: > > > > The resourcemanager log: > {noformat} > 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: > Service ResourceManager failed in state INITED; cause: > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) 
> at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > Caused by: java.io.IOException: Failed to initialize FairScheduler > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 7 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) > ... 
9 more > 2015-05-14 12:59:48,623 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to standby state > 2015-05-14 12:59:48,623 INFO > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin > transitionToStandbyIn > 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When > stopping the service ResourceManager : java.lang.NullPointerException > java.lang.NullPointerException > at > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at >
[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613452#comment-15613452 ] Konstantinos Karanasos commented on YARN-3645: -- I just checked your patch, [~gliptak]. The checks you added seem useful, let's try to close this. Can you please rebase the patch to current trunk? Also, some additional comments: # Since we are calling {{text.trim()}} in all cases, let's move it inside the {{readFieldText()}} method, i.e., you can do {{return firstChild.getData().trim()}}. # Instead of passing the tag name to {{readFieldText()}} each time, you can use a single {{Element field}} parameter and call {{field.getTagName()}} to get the tag name inside {{readFieldText()}}. > ResourceManager can't start success if attribute value of "aclSubmitApps" is > null in fair-scheduler.xml > > > Key: YARN-3645 > URL: https://issues.apache.org/jira/browse/YARN-3645 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.2 >Reporter: zhoulinlin >Assignee: Gabor Liptak > Labels: oct16-easy > Attachments: YARN-3645.1.patch, YARN-3645.2.patch, YARN-3645.3.patch, > YARN-3645.4.patch, YARN-3645.patch > > > The "aclSubmitApps" is configured in fair-scheduler.xml like below: > > > > The resourcemanager log: > {noformat} > 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: > Service ResourceManager failed in state INITED; cause: > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > Caused by: java.io.IOException: Failed to initialize FairScheduler > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 7 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) > ... 
9 more > 2015-05-14 12:59:48,623 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to standby state > 2015-05-14 12:59:48,623 INFO > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin > transitionToStandbyIn > 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When > stopping the service ResourceManager : java.lang.NullPointerException > java.lang.NullPointerException > at > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at >
[jira] [Commented] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613433#comment-15613433 ] Konstantinos Karanasos commented on YARN-3679: -- [~xgong], let's try to close this patch... Do you think it is still applicable/useful? I guess it is not needed for the 3.0 version (since we will be using the new version of the Timeline Server), but is it needed for branch-2? > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Mit Desai >Assignee: Mit Desai > Labels: oct16-easy > Attachments: YARN-3679.patch > > > Currently the auth filter is before static user filter by default. After > YARN-3624, the filter order is no longer reversed. So the pseudo auth's > allowing anonymous config is useless with both filters loaded in the new > order, because static user will be created before presenting it to auth > filter. The user can remove static user filter from the config to get > anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613433#comment-15613433 ] Konstantinos Karanasos edited comment on YARN-3679 at 10/27/16 10:14 PM: - [~xgong], let's try to close this JIRA... Do you think it is still applicable/useful? I guess it is not needed for the 3.0 version (since we will be using the new version of the Timeline Server), but is it needed for branch-2? was (Author: kkaranasos): [~xgong], let's try to close this patch... Do you think it is still applicable/useful? I guess it is not needed for the 3.0 version (since we will be using the new version of the Timeline Server), but is it needed for branch-2? > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Mit Desai >Assignee: Mit Desai > Labels: oct16-easy > Attachments: YARN-3679.patch > > > Currently the auth filter is before static user filter by default. After > YARN-3624, the filter order is no longer reversed. So the pseudo auth's > allowing anonymous config is useless with both filters loaded in the new > order, because static user will be created before presenting it to auth > filter. The user can remove static user filter from the config to get > anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-3679: - Component/s: timelineserver > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Mit Desai >Assignee: Mit Desai > Labels: oct16-easy > Attachments: YARN-3679.patch > > > Currently the auth filter is before static user filter by default. After > YARN-3624, the filter order is no longer reversed. So the pseudo auth's > allowing anonymous config is useless with both filters loaded in the new > order, because static user will be created before presenting it to auth > filter. The user can remove static user filter from the config to get > anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-3679: - Labels: oct16-easy (was: ) > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Mit Desai >Assignee: Mit Desai > Labels: oct16-easy > Attachments: YARN-3679.patch > > > Currently the auth filter is before static user filter by default. After > YARN-3624, the filter order is no longer reversed. So the pseudo auth's > allowing anonymous config is useless with both filters loaded in the new > order, because static user will be created before presenting it to auth > filter. The user can remove static user filter from the config to get > anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3645) ResourceManager can't start success if attribute value of "aclSubmitApps" is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-3645: - Labels: oct16-easy (was: ) > ResourceManager can't start success if attribute value of "aclSubmitApps" is > null in fair-scheduler.xml > > > Key: YARN-3645 > URL: https://issues.apache.org/jira/browse/YARN-3645 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.2 >Reporter: zhoulinlin >Assignee: Gabor Liptak > Labels: oct16-easy > Attachments: YARN-3645.1.patch, YARN-3645.2.patch, YARN-3645.3.patch, > YARN-3645.4.patch, YARN-3645.patch > > > The "aclSubmitApps" is configured in fair-scheduler.xml like below: > > > > The resourcemanager log: > 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: > Service ResourceManager failed in state INITED; cause: > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > Caused by: 
java.io.IOException: Failed to initialize FairScheduler > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 7 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) > ... 9 more > 2015-05-14 12:59:48,623 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning > to standby state > 2015-05-14 12:59:48,623 INFO > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin > transitionToStandbyIn > 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When > stopping the service ResourceManager : java.lang.NullPointerException > java.lang.NullPointerException > at > com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > 
at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) > 2015-05-14 12:59:48,623 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed > to initialize FairScheduler > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) >
[jira] [Updated] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines
[ https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2965: - Labels: oct16-hard (was: ) > Enhance Node Managers to monitor and report the resource usage on machines > -- > > Key: YARN-2965 > URL: https://issues.apache.org/jira/browse/YARN-2965 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Robert Grandl >Assignee: Inigo Goiri > Labels: oct16-hard > Attachments: YARN-2965.000.patch, YARN-2965.001.patch, > YARN-2965.002.patch, ddoc_RT.docx > > > This JIRA is about augmenting Node Managers to monitor the resource usage on > the machine, aggregates these reports and exposes them to the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1743) Statically generate event diagrams across components
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-1743: - Description: We propose to statically generate the event diagrams across components. This is similar to the generation of diagrams with state transitions within a component that we already do today. The goal is to be able to visualize the interactions through events across different components. was: Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. > Statically generate event diagrams across components > > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation, oct16-hard > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743-3.patch, YARN-1743.patch > > > We propose to statically generate the event diagrams across components. > This is similar to the generation of diagrams with state transitions within a > component that we already do today. > The goal is to be able to visualize the interactions through events across > different components. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1743) Statically generate event diagrams across components
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-1743: - Summary: Statically generate event diagrams across components (was: Decorate event transitions and the event-types with their behaviour) > Statically generate event diagrams across components > > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation, oct16-hard > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743-3.patch, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612858#comment-15612858 ] Konstantinos Karanasos commented on YARN-1743: -- We had an offline discussion with [~chris.douglas] and [~vinodkv]. We agreed that the current approach in this JIRA is not very useful, in the sense that we would need to manually annotate all event types (currently only the {{ApplicationEventType}} is annotated in the patch). We would then have to manually maintain those annotations, with the risk that they become inconsistent very soon. I am cancelling the current patch and will repurpose the JIRA to statically generate a graph of the transitions for each event type, similar to the way we generate the graph for the state transitions. > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation, oct16-hard > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743-3.patch, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
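The repurposed goal, statically generating a diagram of the transitions for each event type, can be prototyped with a small Graphviz DOT emitter. The sketch below is illustrative only: the {{Transition}} record is a hypothetical stand-in for whatever metadata a real generator would extract from the registered state-machine transition tables (the same source the existing state-diagram generation walks), and the output format matches the attached {{NodeManager.gv}}-style artifacts only in spirit.

```java
import java.util.List;

class EventGraphSketch {
    // Hypothetical record of one discovered transition: which component
    // emits the event, the event type, and which component handles it.
    record Transition(String source, String event, String destination) {}

    // Emit a Graphviz digraph with one edge per transition, labelled with
    // the event type that triggers it. Rendering is then a plain
    // `dot -Tpdf` invocation, as with the state-transition diagrams.
    static String toDot(String name, List<Transition> transitions) {
        StringBuilder sb = new StringBuilder("digraph " + name + " {\n");
        for (Transition t : transitions) {
            sb.append(String.format("  \"%s\" -> \"%s\" [label=\"%s\"];%n",
                t.source(), t.destination(), t.event()));
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toDot("ApplicationEvents", List.of(
            new Transition("RMAppImpl", "APP_ACCEPTED", "Scheduler"))));
    }
}
```

Because the graph is derived from the transition tables rather than hand-written annotations, it cannot drift out of sync with the code the way manual annotations would.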
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-1743: - Issue Type: New Feature (was: Bug) > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation, oct16-hard > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743-3.patch, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-1743: - Labels: documentation oct16-hard (was: documentation) > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation, oct16-hard > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743-3.patch, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2618) Avoid over-allocation of disk resources
[ https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2618: - Labels: BB2015-05-TBR oct16-hard (was: BB2015-05-TBR) > Avoid over-allocation of disk resources > --- > > Key: YARN-2618 > URL: https://issues.apache.org/jira/browse/YARN-2618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wei Yan >Assignee: Wei Yan > Labels: BB2015-05-TBR, oct16-hard > Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, > YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch > > > Subtask of YARN-2139. > This should include > - Add API support for introducing disk I/O as the 3rd type resource. > - NM should report this information to the RM > - RM should consider this to avoid over-allocation -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2618) Avoid over-allocation of disk resources
[ https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2618: - Component/s: resourcemanager > Avoid over-allocation of disk resources > --- > > Key: YARN-2618 > URL: https://issues.apache.org/jira/browse/YARN-2618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wei Yan >Assignee: Wei Yan > Labels: BB2015-05-TBR > Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, > YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch > > > Subtask of YARN-2139. > This should include > - Add API support for introducing disk I/O as the 3rd type resource. > - NM should report this information to the RM > - RM should consider this to avoid over-allocation -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3518) default rm/am expire interval should not be smaller than default resourcemanager connect wait time
[ https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-3518: - Summary: default rm/am expire interval should not be smaller than default resourcemanager connect wait time (was: default rm/am expire interval should not less than default resourcemanager connect wait time) > default rm/am expire interval should not be smaller than default > resourcemanager connect wait time > -- > > Key: YARN-3518 > URL: https://issues.apache.org/jira/browse/YARN-3518 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Attachments: YARN-3518.001.patch, YARN-3518.002.patch, > YARN-3518.003.patch, YARN-3518.004.patch > > > Take the AM for example: if the AM can't connect to the RM, then after the AM > expires (600s) the RM relaunches it, and there will be two AMs at the same time > until the resourcemanager connect max wait time (900s) has passed. > DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS = 15 * 60 * 1000; > DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 60; > DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 60; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
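The overlap described in the report follows from simple arithmetic on the two values quoted in the description (600s AM expiry vs. 900s connect max wait). A minimal sketch, using only those quoted numbers:

```java
// With the values quoted in the report (AM expiry 600 s, RM connect
// max wait 900 s), a partitioned AM and its relaunched replacement can
// coexist for up to 900 - 600 = 300 seconds.
public class ExpiryOverlapSketch {
    static long overlapSeconds(long connectMaxWaitSec, long amExpirySec) {
        return Math.max(0, connectMaxWaitSec - amExpirySec);
    }

    public static void main(String[] args) {
        System.out.println(overlapSeconds(900, 600)); // prints 300
    }
}
```

Raising the expiry defaults to at least the connect wait time makes this overlap zero, which is the point of the retitled summary.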
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2876: - Component/s: resourcemanager fairscheduler > In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for > subqueues > > > Key: YARN-2876 > URL: https://issues.apache.org/jira/browse/YARN-2876 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Reporter: Siqi Li >Assignee: Siqi Li > Labels: oct16-easy > Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, > YARN-2876.v3.patch, YARN-2876.v4.patch, screenshot-1.png > > > If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and > Scheduler UI will display the entire cluster capacity as its maxResource > instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2876: - Labels: oct16-easy (was: ) > In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for > subqueues > > > Key: YARN-2876 > URL: https://issues.apache.org/jira/browse/YARN-2876 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Reporter: Siqi Li >Assignee: Siqi Li > Labels: oct16-easy > Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, > YARN-2876.v3.patch, YARN-2876.v4.patch, screenshot-1.png > > > If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and > Scheduler UI will display the entire cluster capacity as its maxResource > instead of its parent queue's maxResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: YARN-2995.002.patch Uploading new version of the patch. Rebasing against trunk, addressing [~asuresh]'s comments, fixing existing test cases, adding new test cases. > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch, YARN-2995.002.patch > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5688) Make allocation of opportunistic containers asynchronous
Konstantinos Karanasos created YARN-5688: Summary: Make allocation of opportunistic containers asynchronous Key: YARN-5688 URL: https://issues.apache.org/jira/browse/YARN-5688 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos In the current implementation of the {{OpportunisticContainerAllocatorAMService}}, we synchronously perform the allocation of opportunistic containers. This results in "blocking" the service at the RM when scheduling the opportunistic containers. The {{OpportunisticContainerAllocator}} should instead asynchronously run as a separate thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
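The proposed decoupling could be sketched as a producer/consumer hand-off (class and method names here are hypothetical, not the actual YARN-5688 implementation): the AM service enqueues the request and returns immediately, while a dedicated allocator thread drains the queue, so the RM-side service no longer blocks while opportunistic containers are being placed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an asynchronous opportunistic-container allocator.
public class AsyncAllocatorSketch {
    static final BlockingQueue<String> requests = new ArrayBlockingQueue<>(16);

    static boolean submitAndAwait() {
        CountDownLatch done = new CountDownLatch(1);
        Thread allocator = new Thread(() -> {
            try {
                requests.take();  // blocks only the allocator thread
                done.countDown(); // stands in for the actual allocation work
            } catch (InterruptedException ignored) { }
        });
        allocator.setDaemon(true);
        allocator.start();
        try {
            requests.put("opportunistic-container-request"); // caller returns right away
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAndAwait() ? "allocated asynchronously" : "timed out");
    }
}
```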
[jira] [Created] (YARN-5687) Refactor TestOpportunisticContainerAllocation to extend TestAMRMClient
Konstantinos Karanasos created YARN-5687: Summary: Refactor TestOpportunisticContainerAllocation to extend TestAMRMClient Key: YARN-5687 URL: https://issues.apache.org/jira/browse/YARN-5687 Project: Hadoop YARN Issue Type: Sub-task Reporter: Konstantinos Karanasos Since {{TestOpportunisticContainerAllocation}} shares a lot of code with the {{TestAMRMClient}}, we should refactor the former, making it a subclass of the latter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5486) Update OpportunisticContainerAllocatorAMService::allocate method to handle OPPORTUNISTIC container requests
[ https://issues.apache.org/jira/browse/YARN-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5486: - Attachment: YARN-5486.004.patch Adding new patch. Fixed remaining checkstyle issues and [~asuresh]'s comments. The testcase is not failing locally for me and does not seem related. [~asuresh], I will create JIRAs to track the two issues you mentioned. Good point about the opportunistic container allocation. We should make it asynchronous. > Update OpportunisticContainerAllocatorAMService::allocate method to handle > OPPORTUNISTIC container requests > --- > > Key: YARN-5486 > URL: https://issues.apache.org/jira/browse/YARN-5486 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arun Suresh >Assignee: Konstantinos Karanasos > Attachments: YARN-5486.001.patch, YARN-5486.002.patch, > YARN-5486.003.patch, YARN-5486.004.patch > > > YARN-5457 refactors the Distributed Scheduling framework to move the > container allocator to yarn-server-common. > This JIRA proposes to update the allocate method in the new AM service to use > the OpportunisticContainerAllocator to allocate opportunistic containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types
[ https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-2995: - Attachment: YARN-2995.001.patch Adding first version of the patch. > Enhance UI to show cluster resource utilization of various container types > -- > > Key: YARN-2995 > URL: https://issues.apache.org/jira/browse/YARN-2995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: YARN-2995.001.patch > > > This JIRA proposes to extend the Resource manager UI to show how cluster > resources are being used to run *guaranteed start* and *queueable* > containers. For example, a graph that shows over time, the fraction of > running containers that are *guaranteed start* and the fraction of running > containers that are *queueable*. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5486) Update OpportunisticContainerAllocatorAMService::allocate method to handle OPPORTUNISTIC container requests
[ https://issues.apache.org/jira/browse/YARN-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5486: - Attachment: YARN-5486.003.patch Uploading new version of patch, fixing compile, unit test and checkstyle issues. > Update OpportunisticContainerAllocatorAMService::allocate method to handle > OPPORTUNISTIC container requests > --- > > Key: YARN-5486 > URL: https://issues.apache.org/jira/browse/YARN-5486 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arun Suresh >Assignee: Konstantinos Karanasos > Attachments: YARN-5486.001.patch, YARN-5486.002.patch, > YARN-5486.003.patch > > > YARN-5457 refactors the Distributed Scheduling framework to move the > container allocator to yarn-server-common. > This JIRA proposes to update the allocate method in the new AM service to use > the OpportunisticContainerAllocator to allocate opportunistic containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5486) Update OpportunisticContainerAllocatorAMService::allocate method to handle OPPORTUNISTIC container requests
[ https://issues.apache.org/jira/browse/YARN-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5486: - Attachment: YARN-5486.002.patch Rebasing against trunk and adding a new patch, also including some more changes/fixes. Thanks for the feedback, [~subru]. Regarding the _LinkedHashMap_ in the {{OpportunisticContainerContext}}, we actually need it to keep the ordering of the nodes. Least loaded nodes should come first when iterating over the hashmap, as they should be preferred when placing opportunistic containers. I added checks for the ContainerTypes in {{TestOpportunisticContainersAllocation}}, as you suggested. I suggest keeping the existing sleep logic for now, if that's OK, since the suggested change does not seem to make the code much cleaner in the particular cases where it is used (also chatted with [~chris.douglas] about this). > Update OpportunisticContainerAllocatorAMService::allocate method to handle > OPPORTUNISTIC container requests > --- > > Key: YARN-5486 > URL: https://issues.apache.org/jira/browse/YARN-5486 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arun Suresh >Assignee: Konstantinos Karanasos > Attachments: YARN-5486.001.patch, YARN-5486.002.patch > > > YARN-5457 refactors the Distributed Scheduling framework to move the > container allocator to yarn-server-common. > This JIRA proposes to update the allocate method in the new AM service to use > the OpportunisticContainerAllocator to allocate opportunistic containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
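The point about the LinkedHashMap can be illustrated in isolation: once nodes are inserted in ascending order of load, iteration keeps returning the least loaded node first, which is the preference when placing opportunistic containers. Node names and load values below are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a LinkedHashMap preserves insertion order, so inserting nodes
// sorted by load gives "least loaded first" iteration for free.
public class NodeOrderSketch {
    static LinkedHashMap<String, Integer> leastLoadedFirst(Map<String, Integer> loads) {
        LinkedHashMap<String, Integer> ordered = new LinkedHashMap<>();
        loads.entrySet().stream()
             .sorted(Map.Entry.comparingByValue())
             .forEach(e -> ordered.put(e.getKey(), e.getValue()));
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Integer> loads = Map.of("node1", 7, "node2", 2, "node3", 4);
        System.out.println(leastLoadedFirst(loads).keySet()); // least loaded first
    }
}
```

A plain HashMap would not give this guarantee, and a TreeMap would order by key rather than by load.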
[jira] [Created] (YARN-5646) Documentation for scheduling of OPPORTUNISTIC containers
Konstantinos Karanasos created YARN-5646: Summary: Documentation for scheduling of OPPORTUNISTIC containers Key: YARN-5646 URL: https://issues.apache.org/jira/browse/YARN-5646 Project: Hadoop YARN Issue Type: Task Reporter: Konstantinos Karanasos Assignee: Konstantinos Karanasos This is for adding documentation regarding the scheduling of OPPORTUNISTIC containers. It includes both the centralized (YARN-5220) and the distributed (YARN-2877) scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5542: - Description: This JIRA groups all efforts related to the scheduling of opportunistic containers. It includes the scheduling of opportunistic container through the central RM (YARN-5220), through distributed scheduling (YARN-2877), as well as the scheduling of containers based on actual node utilization (YARN-1011) and the container promotion/demotion (YARN-5085). was: This JIRA groups all efforts related to the scheduling of opportunistic containers. It includes the scheduling of opportunistic container through the central RM (YARN-5220), through distributed scheduling (YARN-2877), as well as the scheduling of containers based on actual node utilization (YARN-1011). > Scheduling of opportunistic containers > -- > > Key: YARN-5542 > URL: https://issues.apache.org/jira/browse/YARN-5542 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos > > This JIRA groups all efforts related to the scheduling of opportunistic > containers. > It includes the scheduling of opportunistic container through the central RM > (YARN-5220), through distributed scheduling (YARN-2877), as well as the > scheduling of containers based on actual node utilization (YARN-1011) and the > container promotion/demotion (YARN-5085). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428894#comment-15428894 ] Konstantinos Karanasos commented on YARN-5542: -- We had some initial discussions with [~kasha] and [~elgoiri]. We will upload a design document, summarizing the whole effort. > Scheduling of opportunistic containers > -- > > Key: YARN-5542 > URL: https://issues.apache.org/jira/browse/YARN-5542 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos > > This JIRA groups all efforts related to the scheduling of opportunistic > containers. > It includes the scheduling of opportunistic container through the central RM > (YARN-5220), through distributed scheduling (YARN-2877), as well as the > scheduling of containers based on actual node utilization (YARN-1011). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5542) Scheduling of opportunistic containers
Konstantinos Karanasos created YARN-5542: Summary: Scheduling of opportunistic containers Key: YARN-5542 URL: https://issues.apache.org/jira/browse/YARN-5542 Project: Hadoop YARN Issue Type: New Feature Reporter: Konstantinos Karanasos This JIRA groups all efforts related to the scheduling of opportunistic containers. It includes the scheduling of opportunistic container through the central RM (YARN-5220), through distributed scheduling (YARN-2877), as well as the scheduling of containers based on actual node utilization (YARN-1011). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5541) Handling of opportunistic containers in the NM
Konstantinos Karanasos created YARN-5541: Summary: Handling of opportunistic containers in the NM Key: YARN-5541 URL: https://issues.apache.org/jira/browse/YARN-5541 Project: Hadoop YARN Issue Type: New Feature Reporter: Konstantinos Karanasos I am creating this JIRA in order to group all tasks related to the management of opportunistic containers in the NMs, such as the queuing of containers, the pausing of containers and the prioritization of queued containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5457) Refactor DistributedScheduling framework to pull out common functionality
[ https://issues.apache.org/jira/browse/YARN-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412841#comment-15412841 ] Konstantinos Karanasos commented on YARN-5457: -- Looks good to me too, thanks [~asuresh]. (Very minor suggestion: it might look better to rename {{OpportunisticContainersAllocatingAMService}} to {{OpportunisticContainersAllocatorAMService}} or simply {{OpportunisticContainersAMService}}). > Refactor DistributedScheduling framework to pull out common functionality > - > > Key: YARN-5457 > URL: https://issues.apache.org/jira/browse/YARN-5457 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-5457.001.patch, YARN-5457.002.patch, > YARN-5457.003.patch > > > Opening this JIRA to track the some refactoring missed in YARN-5113: -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410310#comment-15410310 ] Konstantinos Karanasos commented on YARN-4902: -- [~leftnoteasy]: bq. I can understand your proposal may look different from my guess above, we can discuss more once you have a more concrete design for that. Yes, let's discuss about service planning once we add more details in the design document -- it will be easier for other people to get involved in the discussion too. bq. I'm not care too much about if we should support cardinality via GUTS API or support anti-affinity via cardinality syntaxes. We should choose a more generic/extensible API which can support both. Sounds good, we can continue the discussion in YARN-5478. > [Umbrella] Generalized and unified scheduling-strategies in YARN > > > Key: YARN-4902 > URL: https://issues.apache.org/jira/browse/YARN-4902 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Wangda Tan > Attachments: Generalized and unified scheduling-strategies in YARN > -v0.pdf, LRA-scheduling-design.v0.pdf, YARN-5468.prototype.patch > > > Apache Hadoop YARN's ResourceRequest mechanism is the core part of the YARN's > scheduling API for applications to use. The ResourceRequest mechanism is a > powerful API for applications (specifically ApplicationMasters) to indicate > to YARN what size of containers are needed, and where in the cluster etc. > However a host of new feature requirements are making the API increasingly > more and more complex and difficult to understand by users and making it very > complicated to implement within the code-base. > This JIRA aims to generalize and unify all such scheduling-strategies in YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410270#comment-15410270 ] Konstantinos Karanasos commented on YARN-4902: -- bq. LRA planning looks like an implementation. LRA planning is much more than an implementation. Think of it as planning multiple applications at once. This is something that the scheduler cannot do, no matter what its implementation is. Please take a look at YARN-1051 for a similar use case for planning/admission control, but in a constraint-free context. I can give more details as I update the document. In any case, that does not block any of the changes that are required in the scheduler per se to support constraints. bq. For cardinality, could you share a more detailed use case for that? As you mention, an example would be to limit the number of hbase-masters in a node/rack or even the number of AMs in a node. You could do it with resource isolation, but network isolation in particular is really hard to get right, so until we reach that point, I think it would be great for applications to be able to express such constraints. bq. It seems to me that cardinality is a special case of anti-affinity. I would say that it is the other way around: affinity and anti-affinity are special cases of cardinality. If you say there is cardinality 1 for that node, it means you have anti-affinity for that node. I agree that you can currently express it with your proposal, so we are just suggesting an alternative that is more succinct and needs only a single type of constraint instead of several. 
> [Umbrella] Generalized and unified scheduling-strategies in YARN > > > Key: YARN-4902 > URL: https://issues.apache.org/jira/browse/YARN-4902 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Wangda Tan > Attachments: Generalized and unified scheduling-strategies in YARN > -v0.pdf, LRA-scheduling-design.v0.pdf, YARN-5468.prototype.patch > > > Apache Hadoop YARN's ResourceRequest mechanism is the core part of the YARN's > scheduling API for applications to use. The ResourceRequest mechanism is a > powerful API for applications (specifically ApplicationMasters) to indicate > to YARN what size of containers are needed, and where in the cluster etc. > However a host of new feature requirements are making the API increasingly > more and more complex and difficult to understand by users and making it very > complicated to implement within the code-base. > This JIRA aims to generalize and unify all such scheduling-strategies in YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
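The claim in the comment above, that (anti-)affinity reduces to a min/max cardinality constraint, can be made concrete with a small sketch. The names below are illustrative only, not a YARN API: within some scope (node or rack), the number of containers carrying a given tag must stay within [min, max].

```java
// Sketch: a single cardinality check subsumes both affinity and
// anti-affinity (illustrative helper names, not a real YARN interface).
public class CardinalitySketch {
    static boolean satisfies(int tagCount, int min, int max) {
        return tagCount >= min && tagCount <= max;
    }

    // anti-affinity on a node == at most one container with the tag there
    static boolean antiAffinityOk(int tagCountOnNode) {
        return satisfies(tagCountOnNode, 0, 1);
    }

    // affinity == at least one container with the tag already in scope
    static boolean affinityOk(int tagCountInScope) {
        return satisfies(tagCountInScope, 1, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // "no more than 5 hbase servers in a rack": 4 servers is within the max of 5
        System.out.println(satisfies(4, 0, 5)); // prints true
    }
}
```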
[jira] [Commented] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410209#comment-15410209 ] Konstantinos Karanasos commented on YARN-4902: -- Thanks for checking the design doc and the patch, and for the feedback, [~leftnoteasy]. Please find below some thoughts regarding the points you raised and some additional information. bq. From the requirement's perspective, I didn't see new things, please remind me if I missed anything Agreed that our basic requirements are similar, which is good because it means we are aligned. Some of the notions we are using might coincide with yours but have a different name (e.g., dynamic vs. allocation tags, although the scope of our dynamic tags is global and not application specific like yours), by virtue of the fact that we were designing things at the same time. We can agree on a common naming, not a problem. What I would like to stress as being different is mainly the LRA planning, some extensions to the constraints (along with a more succinct way of expressing them), as well as the ease of expressing inter-application constraints -- more details below. *Constraints* bq. The cardinality constraints is placement_set with maximum_concurrency constraint: see (4.3.3) Placement Strategy in my design doc. If I am not wrong, the maximum_concurrency in your document corresponds to a single allocation/resource-request. Our min and max cardinality applies across applications. For instance, in order to say "don't put more than 5 hbase servers (from any possible application) in a rack". In general, as we showed in our design doc, you can use max and min cardinalities to also express affinity and anti-affinity constraints. This way we can have only a single type of constraint. What do you think? bq. Will this patch support anti-affinity / affinity between apps? I uploaded my latest POC patch to YARN-1042, it supports affinity/anti-affinity for inter/intra apps. 
We can easily extend it to support intra/inter resource request within the app. Yes, this is a major use case for us. The current patch can already support it. And this is why we want to make more use of the tags and of planning, since they would allow us to specify inter-app constraints without needing to know the app ID of the other job. bq. Major logic of this patch depends on node label manager dynamic tag changes. First of all, I'm not sure if NLM works efficiently when node label changes rapidly (we could update label on node when allocate / release every container). And I'm not sure how you plan to avoid malicious application add labels. For example if a distributed shell application claims it is a "hbase master" just for fun, how to enforce cardinality logics like "only put 10 HBase masters in the rack"? Good points. For the scalability we have not seen any problems so far (we update tags at allocate/release), but we have not run very large-scale experiments -- I will update you on that. For the malicious AM, I am not sure if the application would benefit from lying. But even if it does, we can use cluster-wide constraints to limit such AMs. Still, I agree more thought has to be given on this matter -- it's good you brought it up. *Scheduling* bq. It might be better to implement complex scheduling logics like affinity-between-apps and cardinality in a global scheduling way. (YARN-5139) We will be more than happy to use any advancement in the scheduler that is available! I totally believe that global scheduling (i.e., have an application-centric rather than node-centric scheduling) is much more appropriate and will give better results. We did not use it in our first patch, as it was not available, but we are happy to try it out. *Planning* bq. I'm not sure how LRA planner will look like, should it be a separate scheduler running in parallel? I didn't see your patch uses that approach. 
The idea here is to be able to make more holistic placement decisions across applications. What if you place your HBase service in a way that does not let a subsequent Heron app be placed in the cluster at all? We envision the planner to sit outside the scheduler, similar to the reservation system (YARN-1051). Users will also be able to submit multiple applications at once and specify constraints among them. This is not in the initial version of the patch.

*Suggestions*

bq. Could you take a look at the global scheduling patch which I attached to YARN-5139 to see if it is possible to build the new features added in your patch on top of the global scheduling framework? And also please share your overall feedback on the global scheduling framework, like efficiency, extensibility, etc.

I will check the global scheduler, and as I said above, I'd be happy to use it.

bq. It will be better to design a Java API for this ticket, both of our poc patches (this one and the
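To make the single-constraint idea discussed above concrete, here is a minimal, self-contained sketch in Java (the class and method names are illustrative, not YARN's actual API) of how a single min/max cardinality constraint over a tag within a scope, counted across all applications, can subsume both affinity and anti-affinity:

```java
import java.util.Map;

public class CardinalityConstraintSketch {

    /** Within a scope (e.g., a rack), containers carrying {@code tag} must
     *  number between {@code min} and {@code max}, counted across ALL apps. */
    public record Cardinality(String tag, int min, int max) {

        /** Affinity to a tag: at least one such container must be in scope. */
        public static Cardinality affinity(String tag) {
            return new Cardinality(tag, 1, Integer.MAX_VALUE);
        }

        /** Anti-affinity to a tag: no such container may be in scope. */
        public static Cardinality antiAffinity(String tag) {
            return new Cardinality(tag, 0, 0);
        }

        /** Can one more {@code tag} container be placed in a scope whose
         *  current per-tag counts (aggregated across apps) are given? */
        public boolean allowsOneMore(Map<String, Integer> scopeCounts) {
            return scopeCounts.getOrDefault(tag, 0) + 1 <= max;
        }

        /** Does the scope currently satisfy the constraint? */
        public boolean satisfiedBy(Map<String, Integer> scopeCounts) {
            int n = scopeCounts.getOrDefault(tag, 0);
            return n >= min && n <= max;
        }
    }

    public static void main(String[] args) {
        // "Don't put more than 5 HBase servers (from any app) in a rack."
        Cardinality atMostFive = new Cardinality("hbase-server", 0, 5);
        Map<String, Integer> rack = Map.of("hbase-server", 5, "zookeeper", 1);

        System.out.println(atMostFive.allowsOneMore(rack));                             // false
        System.out.println(Cardinality.antiAffinity("hbase-server").satisfiedBy(rack)); // false
        System.out.println(Cardinality.affinity("zookeeper").satisfiedBy(rack));        // true
    }
}
```

Note how anti-affinity is just the degenerate cardinality [0, 0] and affinity is [1, unbounded], which is what lets a scheduler check all three constraint types with the same code path.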
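The tag bookkeeping and cluster-wide safeguard mentioned in the discussion above (updating tags at container allocate/release, and bounding a misbehaving AM that claims a tag "just for fun") can be sketched as follows. This is a hypothetical illustration, not the actual patch or YARN's NodeLabelsManager; all names are made up:

```java
import java.util.HashMap;
import java.util.Map;

public class AllocationTagSketch {
    // rack -> (tag -> count of running containers carrying that tag)
    private final Map<String, Map<String, Integer>> countsByRack = new HashMap<>();
    // cluster-wide cap per tag, bounding how many containers may claim it
    private final Map<String, Integer> clusterCaps;

    public AllocationTagSketch(Map<String, Integer> clusterCaps) {
        this.clusterCaps = clusterCaps;
    }

    /** Record an allocation of a {@code tag}-carrying container on {@code rack};
     *  reject it if the cluster-wide cap for the tag would be exceeded. */
    public boolean allocate(String rack, String tag) {
        int clusterTotal = countsByRack.values().stream()
            .mapToInt(m -> m.getOrDefault(tag, 0)).sum();
        if (clusterTotal + 1 > clusterCaps.getOrDefault(tag, Integer.MAX_VALUE)) {
            return false; // e.g., an AM claiming "hbase-master" beyond the cap
        }
        countsByRack.computeIfAbsent(rack, r -> new HashMap<>())
                    .merge(tag, 1, Integer::sum);
        return true;
    }

    /** Record the release of a {@code tag}-carrying container on {@code rack}. */
    public void release(String rack, String tag) {
        Map<String, Integer> tags = countsByRack.get(rack);
        if (tags != null) {
            tags.computeIfPresent(tag, (t, n) -> n > 1 ? n - 1 : null);
        }
    }

    public int count(String rack, String tag) {
        return countsByRack.getOrDefault(rack, Map.of()).getOrDefault(tag, 0);
    }

    public static void main(String[] args) {
        // At most 2 containers cluster-wide may claim the "hbase-master" tag.
        AllocationTagSketch tags = new AllocationTagSketch(Map.of("hbase-master", 2));
        System.out.println(tags.allocate("rack1", "hbase-master")); // true
        System.out.println(tags.allocate("rack2", "hbase-master")); // true
        System.out.println(tags.allocate("rack1", "hbase-master")); // false: cap hit
        tags.release("rack2", "hbase-master");
        System.out.println(tags.allocate("rack1", "hbase-master")); // true again
    }
}
```

The point of the sketch is that a cluster-wide cap enforced at allocation time limits the damage a lying AM can do, even before any per-application trust mechanism exists: the tag claim is simply rejected once the cap is reached.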
[jira] [Comment Edited] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408657#comment-15408657 ] Konstantinos Karanasos edited comment on YARN-4902 at 8/4/16 11:57 PM: ---

I am uploading a design document that describes our vision for scheduling long-running applications (LRAs). It is a very initial version, but I am sharing it so that it helps drive the discussion. There are overlapping bits with this JIRA (after all, up to a point, it targets the same problem), but there are clearly new points, especially when it comes to LRA planning.

As I had explained to [~leftnoteasy] offline during the Hadoop Summit, our focus is not on scheduling given affinity/anti-affinity constraints, but on LRA *planning*. We did a first implementation of affinity, anti-affinity and *cardinality* constraints, because it was required for us to proceed with the LRA planning and nothing was available at that time. [That said, we have already added support for cardinality, and I think we have different support for tags (but I need to take a closer look at YARN-1042) -- let's continue the discussion at that JIRA.]

Given that Wangda marked YARN-5468 as duplicate, do you believe that the LRA planning belongs to this or another existing JIRA? As far as I can tell, it does not. Let me know what you think, so that we can use the proper JIRAs and avoid duplicate effort going forward. Thanks.

was (Author: kkaranasos):
I am uploading a design document that describes our vision for scheduling long-running applications (LRAs). It is a very initial version, but I am sharing it so that it helps drive the discussion. There are overlapping bits with this JIRA (after all, up to a point, it targets the same problem), but there are clearly new points, especially when it comes to LRA planning.

As I had explained to [~leftnoteasy] offline during the Hadoop Summit, our focus is not on scheduling given affinity/anti-affinity constraints, but on LRA *planning*. We did a first implementation of affinity, anti-affinity and *cardinality* constraints, because it was required for us to proceed with the LRA planning and nothing was available at that time. [That said, we have already added support for cardinality, and I think we have different support for tags (but I need to take a closer look at YARN-1042) -- let's continue the discussion at that JIRA.]

Given that Wangda marked YARN-5048 as duplicate, do you believe that the LRA planning belongs to this or another existing JIRA? As far as I can tell, it does not. Let me know what you think, so that we can use the proper JIRAs and avoid duplicate effort going forward. Thanks.

> [Umbrella] Generalized and unified scheduling-strategies in YARN
>
> Key: YARN-4902
> URL: https://issues.apache.org/jira/browse/YARN-4902
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Wangda Tan
> Attachments: Generalized and unified scheduling-strategies in YARN -v0.pdf, LRA-scheduling-design.v0.pdf, YARN-5468.prototype.patch
>
> Apache Hadoop YARN's ResourceRequest mechanism is the core part of YARN's scheduling API for applications to use. The ResourceRequest mechanism is a powerful API for applications (specifically ApplicationMasters) to indicate to YARN what size of containers are needed, where in the cluster, etc. However, a host of new feature requirements are making the API increasingly complex and difficult for users to understand, and very complicated to implement within the code-base.
> This JIRA aims to generalize and unify all such scheduling-strategies in YARN.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-4902: - Attachment: LRA-scheduling-design.v0.pdf

I am uploading a design document that describes our vision for scheduling long-running applications (LRAs). It is a very initial version, but I am sharing it so that it helps drive the discussion. There are overlapping bits with this JIRA (after all, up to a point, it targets the same problem), but there are clearly new points, especially when it comes to LRA planning.

As I had explained to [~leftnoteasy] offline during the Hadoop Summit, our focus is not on scheduling given affinity/anti-affinity constraints, but on LRA *planning*. We did a first implementation of affinity, anti-affinity and *cardinality* constraints, because it was required for us to proceed with the LRA planning and nothing was available at that time. [That said, we have already added support for cardinality, and I think we have different support for tags (but I need to take a closer look at YARN-1042) -- let's continue the discussion at that JIRA.]

Given that Wangda marked YARN-5048 as duplicate, do you believe that the LRA planning belongs to this or another existing JIRA? As far as I can tell, it does not. Let me know what you think, so that we can use the proper JIRAs and avoid duplicate effort going forward. Thanks.
[jira] [Updated] (YARN-4902) [Umbrella] Generalized and unified scheduling-strategies in YARN
[ https://issues.apache.org/jira/browse/YARN-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-4902: - Attachment: YARN-5468.prototype.patch

We have been working on a first prototype for handling constraints in scheduling. Following [~leftnoteasy]'s recommendation, I am uploading it to this JIRA, as it seems the most related. The patch is by [~pgaref]. This version of the prototype does not include our proposal about *planning* (rather than scheduling) of applications. We plan to update the patch with our proposal.