[jira] [Created] (YARN-8964) UI2 should use clusters/{cluster name} for all ATSv2 REST APIs
Rohith Sharma K S created YARN-8964: --- Summary: UI2 should use clusters/{cluster name} for all ATSv2 REST APIs Key: YARN-8964 URL: https://issues.apache.org/jira/browse/YARN-8964 Project: Hadoop YARN Issue Type: Improvement Reporter: Rohith Sharma K S UI2 makes REST calls to TimelineReader without a cluster name. It is advised to make the REST calls with clusters/{cluster name} so that a remote TimelineReader daemon can serve different clusters. *Example*: *Current*: /ws/v2/timeline/flows/ *Change*: /ws/v2/timeline/*clusters/\{cluster name\}*/flows/ *yarn.resourcemanager.cluster-id* is configured with the cluster name, so this config could be used to get the cluster-id. cc: [~sunilg] [~akhilpb] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
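The proposed URL change can be sketched as a small helper that prefixes each TimelineReader path with the cluster id (which UI2 would read from yarn.resourcemanager.cluster-id). The class and method names below are illustrative, not the actual UI2 implementation; the cluster id is passed in directly to keep the sketch self-contained:

```java
// Sketch: build an ATSv2 REST path that includes the cluster id, as
// proposed in YARN-8964. The fallback preserves the current behavior
// when no cluster id is configured.
public class TimelinePathBuilder {
    private static final String BASE = "/ws/v2/timeline";

    public static String clusterPath(String clusterId, String resource) {
        if (clusterId == null || clusterId.isEmpty()) {
            // Old, cluster-less form: /ws/v2/timeline/flows/
            return BASE + "/" + resource;
        }
        // New form: /ws/v2/timeline/clusters/{cluster name}/flows/
        return BASE + "/clusters/" + clusterId + "/" + resource;
    }
}
```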
[jira] [Commented] (YARN-8961) [UI2] Flow Run End Time shows 'Invalid date'
[ https://issues.apache.org/jira/browse/YARN-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669677#comment-16669677 ] Akhil PB commented on YARN-8961: Hi [~charanh], Could you please attach the flow runs response? The invalid date might be caused by an empty/null date from the backend. Wanted to double check this. > [UI2] Flow Run End Time shows 'Invalid date' > > > Key: YARN-8961 > URL: https://issues.apache.org/jira/browse/YARN-8961 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Charan Hebri >Assignee: Akhil PB >Priority: Major > Attachments: Invalid_Date.png > > > End Time for Flow Runs is shown as *Invalid date* for runs that are in > progress. This should be shown as *N/A* just like it is shown for 'CPU > VCores' and 'Memory Used'. Attached relevant screenshot. > cc [~akhilpb]
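The fix discussed above (show *N/A* instead of *Invalid date* when a run has no end time yet) amounts to guarding the formatter against a missing timestamp. UI2 itself is Ember/JavaScript; the sketch below shows the same guard logic in Java with an illustrative helper name and date pattern:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch: format a flow-run end time, treating a missing (null or zero)
// timestamp as "N/A" instead of producing an invalid date string.
public class EndTimeFormatter {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    public static String format(Long endTimeMillis) {
        if (endTimeMillis == null || endTimeMillis <= 0) {
            return "N/A";  // run still in progress; backend reported no end time
        }
        return FMT.format(Instant.ofEpochMilli(endTimeMillis));
    }
}
```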
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669572#comment-16669572 ] Weiwei Yang commented on YARN-8902: --- v7 patch fixed the findbugs and checkstyle issues > Add volume manager that manages CSI volume lifecycle > > > Key: YARN-8902 > URL: https://issues.apache.org/jira/browse/YARN-8902 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8902.001.patch, YARN-8902.002.patch, > YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, > YARN-8902.006.patch, YARN-8902.007.patch > > > The CSI volume manager is a service running in RM process, that manages all > CSI volumes' lifecycle. The details about volume's lifecycle states can be > found in [CSI > spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
[jira] [Updated] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8902: -- Attachment: YARN-8902.007.patch > Add volume manager that manages CSI volume lifecycle > > > Key: YARN-8902 > URL: https://issues.apache.org/jira/browse/YARN-8902 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8902.001.patch, YARN-8902.002.patch, > YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, > YARN-8902.006.patch, YARN-8902.007.patch > > > The CSI volume manager is a service running in RM process, that manages all > CSI volumes' lifecycle. The details about volume's lifecycle states can be > found in [CSI > spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
[jira] [Updated] (YARN-8761) Service AM support for decommissioning component instances
[ https://issues.apache.org/jira/browse/YARN-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8761: - Attachment: YARN-8761.03.patch > Service AM support for decommissioning component instances > -- > > Key: YARN-8761 > URL: https://issues.apache.org/jira/browse/YARN-8761 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8761.01.patch, YARN-8761.02.patch, > YARN-8761.03.patch > > > The idea behind this feature is to have a flex down where specific component > instances are removed. Currently on a flex down, the service AM chooses for > removal the component instances with the highest IDs.
[jira] [Commented] (YARN-8838) Add security check for container user is same as websocket user
[ https://issues.apache.org/jira/browse/YARN-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669437#comment-16669437 ] Eric Yang commented on YARN-8838: - Patch 002 allows the YARN admin user to log in to a container if the yarn.acl.enable feature is enabled. > Add security check for container user is same as websocket user > --- > > Key: YARN-8838 > URL: https://issues.apache.org/jira/browse/YARN-8838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: docker > Attachments: YARN-8838.001.patch, YARN-8838.002.patch > > > When a user is authenticated via the SPNEGO entry point, the node manager must verify > that the remote user is the same as the container user before starting the web socket > session. One possible solution is to verify that the web request user matches the > yarn container local directory owner during onWebSocketConnect.
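The check proposed in the description (compare the authenticated websocket user against the owner of the container's local directory) can be sketched as below. The class and method names are illustrative, not the actual NodeManager code, and the check fails closed when the owner cannot be determined:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: verify the authenticated websocket user matches the owner of
// the container's local directory before starting the shell session.
public class ContainerUserCheck {
    public static boolean isSameUser(String remoteUser, Path containerLocalDir) {
        if (remoteUser == null) {
            return false;
        }
        try {
            String owner = Files.getOwner(containerLocalDir).getName();
            return owner.equals(remoteUser);
        } catch (IOException e) {
            // Fail closed: if the owner cannot be read, deny access.
            return false;
        }
    }
}
```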
[jira] [Updated] (YARN-8838) Add security check for container user is same as websocket user
[ https://issues.apache.org/jira/browse/YARN-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8838: Attachment: YARN-8838.002.patch > Add security check for container user is same as websocket user > --- > > Key: YARN-8838 > URL: https://issues.apache.org/jira/browse/YARN-8838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: docker > Attachments: YARN-8838.001.patch, YARN-8838.002.patch > > > When a user is authenticated via the SPNEGO entry point, the node manager must verify > that the remote user is the same as the container user before starting the web socket > session. One possible solution is to verify that the web request user matches the > yarn container local directory owner during onWebSocketConnect.
[jira] [Commented] (YARN-8957) Add Serializable interface to ComponentContainers
[ https://issues.apache.org/jira/browse/YARN-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669411#comment-16669411 ] Zhankun Tang commented on YARN-8957: [~csingh] Thanks for the review and for sharing its original JIRA! > Add Serializable interface to ComponentContainers > - > > Key: YARN-8957 > URL: https://issues.apache.org/jira/browse/YARN-8957 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Minor > Attachments: YARN-8957-trunk.001.patch > > > In the YARN service API: > public class ComponentContainers > { private static final long serialVersionUID = -1456748479118874991L; ... } > > seems like it should be > > public class ComponentContainers {color:#d04437}implements > Serializable{color} { > private static final long serialVersionUID = -1456748479118874991L; ... }
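The requested change is minimal: a serialVersionUID field only takes effect once the class actually implements java.io.Serializable. A self-contained sketch of the corrected class follows; the serialVersionUID is the one quoted in the report, while the fields and accessors are illustrative, not the real YARN service API class:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Sketch of the fix: without "implements Serializable", the
// serialVersionUID below is dead weight and Java serialization of this
// class would throw NotSerializableException.
public class ComponentContainers implements Serializable {
    private static final long serialVersionUID = -1456748479118874991L;

    private String componentName;                          // illustrative field
    private List<String> containerIds = new ArrayList<>(); // illustrative field

    public void setComponentName(String name) { this.componentName = name; }
    public String getComponentName() { return componentName; }
    public List<String> getContainerIds() { return containerIds; }
}
```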
[jira] [Comment Edited] (YARN-8839) Define a protocol exchange between websocket client and server for interactive shell
[ https://issues.apache.org/jira/browse/YARN-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669405#comment-16669405 ] Eric Yang edited comment on YARN-8839 at 10/30/18 10:43 PM: In the current implementation, there is one control message implemented. Each protocol frame starts with the number one, followed by a JSON string. For example, sending a heartbeat looks like: {code} 1{} {code} This is used to ping the server to check whether there is any output. We can use the same format to set the terminal size: {code} 1{cols:80, rows:25} {code} The rest of the data stream is treated as a byte array. was (Author: eyang): In the current implementation, there is one control implemented. The structure of the protocol is leading by number one, and follow by a JSON string, for example, sending heartbeat looks like: 1{} This is used to ping server to check if there is any output. We can use the same format to set terminal size: 1{cols:80, rows:25} The rest of the data stream are treated as byte array. > Define a protocol exchange between websocket client and server for > interactive shell > > > Key: YARN-8839 > URL: https://issues.apache.org/jira/browse/YARN-8839 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: docker > > Running an interactive shell is more than piping stdio from docker exec through > a web socket. To enable terminal-based programs to run, there are certain > functions that work outside of the stdio streams to the destination program. A > couple of known functions to improve terminal usability: > # Resize terminal columns and rows > # Set title of the window > # Upload files via zmodem protocol > # Set terminal type > # Heartbeat (poll server side for more data) > # Send keystroke payload to server side > If we want to be at parity with commonly supported ssh terminal functions, we > need to develop a set of protocols between websocket client and server. > Client and server intercept the messages to enable functions that are > normally outside of the stdio streams.
[jira] [Commented] (YARN-8839) Define a protocol exchange between websocket client and server for interactive shell
[ https://issues.apache.org/jira/browse/YARN-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669405#comment-16669405 ] Eric Yang commented on YARN-8839: - In the current implementation, there is one control message implemented. Each protocol frame starts with the number one, followed by a JSON string. For example, sending a heartbeat looks like: 1{} This is used to ping the server to check whether there is any output. We can use the same format to set the terminal size: 1{cols:80, rows:25} The rest of the data stream is treated as a byte array. > Define a protocol exchange between websocket client and server for > interactive shell > > > Key: YARN-8839 > URL: https://issues.apache.org/jira/browse/YARN-8839 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: docker > > Running an interactive shell is more than piping stdio from docker exec through > a web socket. To enable terminal-based programs to run, there are certain > functions that work outside of the stdio streams to the destination program. A > couple of known functions to improve terminal usability: > # Resize terminal columns and rows > # Set title of the window > # Upload files via zmodem protocol > # Set terminal type > # Heartbeat (poll server side for more data) > # Send keystroke payload to server side > If we want to be at parity with commonly supported ssh terminal functions, we > need to develop a set of protocols between websocket client and server. > Client and server intercept the messages to enable functions that are > normally outside of the stdio streams.
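The framing described in the comment (a leading '1' followed by a JSON control payload, everything else passed through as raw data) can be sketched as a tiny encoder/decoder. The class and method names are illustrative, the JSON handling is simplified to plain strings, and note the inherent ambiguity: raw data that happens to start with "1{" would also match this framing:

```java
// Sketch: frame format for the interactive-shell websocket protocol
// described in YARN-8839. A message starting with "1{" carries a JSON
// control payload (heartbeat, terminal resize, ...); anything else is
// treated as raw byte-array data.
public class ShellProtocol {
    private static final String CONTROL_PREFIX = "1{";

    /** Encode a control payload, e.g. a heartbeat is encodeControl("{}"). */
    public static String encodeControl(String json) {
        return "1" + json;
    }

    public static boolean isControl(String message) {
        return message != null && message.startsWith(CONTROL_PREFIX);
    }

    /** Extract the JSON payload from a control frame (strips the leading "1"). */
    public static String controlPayload(String message) {
        return message.substring(1);
    }
}
```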
[jira] [Created] (YARN-8963) Add flag to disable interactive shell
Eric Yang created YARN-8963: --- Summary: Add flag to disable interactive shell Key: YARN-8963 URL: https://issues.apache.org/jira/browse/YARN-8963 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Yang For some production jobs, the application admin might choose to disable debugging to prevent developers or system admins from accessing the containers. It would be nice to add an environment variable flag to disable the interactive shell during application submission.
[jira] [Created] (YARN-8962) Add ability to use interactive shell with normal yarn container
Eric Yang created YARN-8962: --- Summary: Add ability to use interactive shell with normal yarn container Key: YARN-8962 URL: https://issues.apache.org/jira/browse/YARN-8962 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Yang This task focuses on extending the interactive shell capability to YARN containers without Docker. This will improve some aspects of debugging MapReduce or Spark applications.
[jira] [Commented] (YARN-8867) Retrieve the status of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669395#comment-16669395 ] Eric Yang commented on YARN-8867: - [~csingh] Thank you for the update. Is this status going to show up in the Yarn Service JSON or surface to the end user through some other mechanism? The status definition may also include a state for not yet started, like PENDING. The rest looks like it is on the right track. Thanks > Retrieve the status of resource localization > > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the status of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the status on > the localization.
[jira] [Commented] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669392#comment-16669392 ] Wangda Tan commented on YARN-8714: -- [~tangzhankun] , sure, please go ahead. > [Submarine] Support files/tarballs to be localized for a training job. > -- > > Key: YARN-8714 > URL: https://issues.apache.org/jira/browse/YARN-8714 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Priority: Major > > See > https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#heading=h.vkxp9edl11m7, > {{job run --localizations ...}}
[jira] [Updated] (YARN-8867) Retrieve the status of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8867: Summary: Retrieve the status of resource localization (was: Retrieve the progress of resource localization) > Retrieve the status of resource localization > > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the progress of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the progress on > the localization.
[jira] [Commented] (YARN-8867) Retrieve the status of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669368#comment-16669368 ] Chandni Singh commented on YARN-8867: - [~eyang] I changed the title and description of the jira to avoid confusion. > Retrieve the status of resource localization > > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the status of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the status on > the localization.
[jira] [Updated] (YARN-8867) Retrieve the status of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8867: Description: Refer YARN-3854. Currently NM does not have an API to retrieve the status of localization. Unless the client can know when the localization of a resource is complete irrespective of the type of the resource, it cannot take any appropriate action. We need an API in {{ContainerManagementProtocol}} to retrieve the status on the localization. was: Refer YARN-3854. Currently NM does not have an API to retrieve the progress of localization. Unless the client can know when the localization of a resource is complete irrespective of the type of the resource, it cannot take any appropriate action. We need an API in {{ContainerManagementProtocol}} to retrieve the progress on the localization. > Retrieve the status of resource localization > > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the status of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the status on > the localization.
[jira] [Commented] (YARN-8867) Retrieve the progress of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669361#comment-16669361 ] Eric Yang commented on YARN-8867: - The JIRA title says progress; instead, what would be reported is the status of the localization. Either change the title of the JIRA to match the implementation, or change the localization status to a progress with a numeric percentage for accuracy. > Retrieve the progress of resource localization > -- > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the progress of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the progress on > the localization.
[jira] [Commented] (YARN-8867) Retrieve the progress of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669342#comment-16669342 ] Chandni Singh commented on YARN-8867: - [~eyang] Thanks for reviewing. Retrieving the progress is a good point. I just think it can be done as a next step. As you stated, this patch adds the basic protocol to query the status of localized resources. With this, the status would only say whether the resource localization is IN_PROGRESS, COMPLETED, or FAILED. Adding more information in {{LocalizationStatus}}, like progress, could be handled separately. Let me know if that is okay? > Retrieve the progress of resource localization > -- > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the progress of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the progress on > the localization.
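The per-resource states discussed in this thread can be captured in a small status record. This is a sketch, not the YARN-8867 patch: the class and field names are illustrative, and PENDING is included per the review suggestion that a "not yet started" state may be needed:

```java
// Sketch: a per-resource localization status record. The states
// IN_PROGRESS, COMPLETED and FAILED come from the discussion above;
// PENDING covers the "not yet started" state suggested by the reviewer.
public class LocalizationStatusSketch {
    public enum State { PENDING, IN_PROGRESS, COMPLETED, FAILED }

    private final String resourceKey;
    private final State state;

    public LocalizationStatusSketch(String resourceKey, State state) {
        this.resourceKey = resourceKey;
        this.state = state;
    }

    public String getResourceKey() { return resourceKey; }
    public State getState() { return state; }

    /** Terminal states need no further polling by the client. */
    public boolean isFinal() {
        return state == State.COMPLETED || state == State.FAILED;
    }
}
```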
[jira] [Updated] (YARN-8932) ResourceUtilization cpu is misused in oversubscription as a percentage
[ https://issues.apache.org/jira/browse/YARN-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8932: - Attachment: YARN-8932-YARN-1011.02.patch > ResourceUtilization cpu is misused in oversubscription as a percentage > -- > > Key: YARN-8932 > URL: https://issues.apache.org/jira/browse/YARN-8932 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8932-YARN-1011.00.patch, > YARN-8932-YARN-1011.01.patch, YARN-8932-YARN-1011.02.patch > > > The ResourceUtilization javadoc mistakenly documents the cpu as a percentage > represented by a float number in [0, 1.0f], however it is used as the # of > vcores used in reality. > See javadoc and discussion in YARN-8911. > /** > * Get CPU utilization. > * > * @return CPU utilization normalized to 1 CPU > */ > @Public > @Unstable > public abstract float getCPU();
[jira] [Commented] (YARN-8932) ResourceUtilization cpu is misused in oversubscription as a percentage
[ https://issues.apache.org/jira/browse/YARN-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669331#comment-16669331 ] Haibo Chen commented on YARN-8932: -- Thanks for the review, [~rkanter]. I have updated the patch to include a few more comments to explain what the numbers mean after the change. The cpu utilization numbers involved were, before this patch, percentage numbers. By multiplying them by the total number of vcores, they are changed to absolute cpu utilization in terms of the number of vcores used. > ResourceUtilization cpu is misused in oversubscription as a percentage > -- > > Key: YARN-8932 > URL: https://issues.apache.org/jira/browse/YARN-8932 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8932-YARN-1011.00.patch, > YARN-8932-YARN-1011.01.patch, YARN-8932-YARN-1011.02.patch > > > The ResourceUtilization javadoc mistakenly documents the cpu as a percentage > represented by a float number in [0, 1.0f], however it is used as the # of > vcores used in reality. > See javadoc and discussion in YARN-8911. > /** > * Get CPU utilization. > * > * @return CPU utilization normalized to 1 CPU > */ > @Public > @Unstable > public abstract float getCPU();
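The conversion described in the comment is simple arithmetic between the two conventions for CPU utilization. The sketch below illustrates that arithmetic with hypothetical helper names; it is not the YARN-8932 patch itself:

```java
// Sketch: convert CPU utilization between the two conventions discussed
// in YARN-8932: a percentage in [0, 1.0] of the node's capacity versus
// the absolute number of vcores used on the node.
public class CpuUtilizationConversion {
    /** Percentage in [0, 1.0] multiplied by total vcores -> absolute vcores used. */
    public static float toVcoresUsed(float percentage, int totalVcores) {
        return percentage * totalVcores;
    }

    /** Absolute vcores used divided by total vcores -> percentage in [0, 1.0]. */
    public static float toPercentage(float vcoresUsed, int totalVcores) {
        return vcoresUsed / totalVcores;
    }
}
```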
[jira] [Commented] (YARN-8867) Retrieve the progress of resource localization
[ https://issues.apache.org/jira/browse/YARN-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669325#comment-16669325 ] Eric Yang commented on YARN-8867: - [~csingh] The current wip patch looks ok for adding a new communication protocol in yarn, node manager and mapreduce client to retrieve localization status. We probably want to define what constitutes a localization status. In the CLI and UI, a progress field seems to indicate a numeric percentage. The current communication protocol returns a list of localization statuses. If we have a container image to be localized, and it is represented by one of the localization statuses, it does not translate well to a percentage value. Docker image download is also hard to represent as a percentage value because docker pull output does not show percentage output. Given those limitations from the backend, it would be in our best interests to define what the user will see as progress. If a progress percentage is still the information that is going to be displayed, we may want to spend more time on how to calculate a progress percentage from the statuses. Otherwise, localizing a single container without other resources might look odd: the UI would be stuck at 0 percent and then jump to 100 percent, because there is only one localization status. > Retrieve the progress of resource localization > -- > > Key: YARN-8867 > URL: https://issues.apache.org/jira/browse/YARN-8867 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8867.wip.patch > > > Refer YARN-3854. > Currently NM does not have an API to retrieve the progress of localization. > Unless the client can know when the localization of a resource is complete > irrespective of the type of the resource, it cannot take any appropriate > action. > We need an API in {{ContainerManagementProtocol}} to retrieve the progress on > the localization.
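One naive derivation of a progress percentage from the status list is completed-resources over total-resources. The sketch below (illustrative names, not part of any attached patch) makes the limitation raised in the comment concrete: with a single resource, the percentage jumps straight from 0 to 100:

```java
import java.util.List;

// Sketch: derive a coarse progress percentage from a list of
// per-resource localization states. With a single resource this jumps
// straight from 0 to 100, which is exactly the UI limitation discussed
// in the review comment.
public class LocalizationProgress {
    public enum State { PENDING, IN_PROGRESS, COMPLETED, FAILED }

    public static int percentComplete(List<State> states) {
        if (states.isEmpty()) {
            return 100;  // nothing to localize
        }
        long done = states.stream().filter(s -> s == State.COMPLETED).count();
        return (int) (100 * done / states.size());
    }
}
```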
[jira] [Commented] (YARN-8932) ResourceUtilization cpu is misused in oversubscription as a percentage
[ https://issues.apache.org/jira/browse/YARN-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669288#comment-16669288 ] Robert Kanter commented on YARN-8932: - Thanks for the patch [~haibochen]. A minor comment: - In the tests, you update some of the constants. It would be good to put in comments to explain them. > ResourceUtilization cpu is misused in oversubscription as a percentage > -- > > Key: YARN-8932 > URL: https://issues.apache.org/jira/browse/YARN-8932 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8932-YARN-1011.00.patch, > YARN-8932-YARN-1011.01.patch > > > The ResourceUtilization javadoc mistakenly documents the cpu as a percentage > represented by a float number in [0, 1.0f], however it is used as the # of > vcores used in reality. > See javadoc and discussion in YARN-8911. > /** > * Get CPU utilization. > * > * @return CPU utilization normalized to 1 CPU > */ > @Public > @Unstable > public abstract float getCPU();
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669269#comment-16669269 ] Hadoop QA commented on YARN-8902:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 52s | trunk passed |
| +1 | compile | 2m 41s | trunk passed |
| +1 | checkstyle | 1m 0s | trunk passed |
| +1 | mvnsite | 1m 22s | trunk passed |
| +1 | shadedclient | 14m 24s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 6s | trunk passed |
| +1 | javadoc | 0m 54s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 2m 34s | the patch passed |
| +1 | javac | 2m 34s | the patch passed |
| -0 | checkstyle | 0m 56s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 11 new + 58 unchanged - 0 fixed = 69 total (was 58) |
| +1 | mvnsite | 1m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 8s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 19s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 49s | the patch passed |
|| Other Tests ||
| +1 | unit | 2m 19s | hadoop-yarn-server-common in the patch passed. |
| -1 | unit | 105m 7s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 172m 25s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | Should org.apache.hadoop.yarn.server.resourcemanager.volume.csi.provisioner.VolumeProvisioningResults$VolumeProvisioningResult be a _static_ inner class? At VolumeProvisioningResults.java:[lines 64-79] |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8902 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946186/YARN-8902
[jira] [Commented] (YARN-8960) Can't get submarine service status using the command of "yarn app -status" under security environment
[ https://issues.apache.org/jira/browse/YARN-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669239#comment-16669239 ] Hadoop QA commented on YARN-8960:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 20m 50s | trunk passed |
| +1 | compile | 0m 25s | trunk passed |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 28s | trunk passed |
| +1 | shadedclient | 12m 38s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 33s | trunk passed |
| +1 | javadoc | 0m 20s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| -0 | checkstyle | 0m 13s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine: The patch generated 7 new + 31 unchanged - 0 fixed = 38 total (was 31) |
| +1 | mvnsite | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 17s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 37s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
|| Other Tests ||
| +1 | unit | 0m 31s | hadoop-yarn-submarine in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 52m 45s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8960 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946198/YARN-8960.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux b93bd98809b5 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 277a3d8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/22383/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22383/testReport/ |
| Max. process+thread count | 329 (vs. ulimi
[jira] [Commented] (YARN-8897) LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability
[ https://issues.apache.org/jira/browse/YARN-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669196#comment-16669196 ] Hadoop QA commented on YARN-8897:

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 21m 51s | trunk passed |
| +1 | compile | 0m 31s | trunk passed |
| +1 | checkstyle | 0m 21s | trunk passed |
| +1 | mvnsite | 0m 33s | trunk passed |
| +1 | shadedclient | 12m 49s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 54s | trunk passed |
| +1 | javadoc | 0m 25s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 31s | the patch passed |
| +1 | compile | 0m 27s | the patch passed |
| +1 | javac | 0m 27s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 14s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 59s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| Other Tests ||
| +1 | unit | 2m 18s | hadoop-yarn-server-common in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 56m 59s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8897 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946184/YARN-8897-003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 76000ac9f62a 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 277a3d8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22382/testReport/ |
| Max. process+thread count | 294 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22382/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> LoadBasedRouterPolicy throws "NPE" in case of sub cluster una
[jira] [Commented] (YARN-8932) ResourceUtilization cpu is misused in oversubscription as a percentage
[ https://issues.apache.org/jira/browse/YARN-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669151#comment-16669151 ] Haibo Chen commented on YARN-8932:

The unit test failure is unrelated: the code changes do not touch TestWorkPreservingRMRestart, and I could not reproduce it locally. TestQueueManagementDynamicEditPolicy.testEditScheduler is tracked by YARN-8494, and TestNMProxy.testNMProxyRPCRetry has been failing in other jiras.

> ResourceUtilization cpu is misused in oversubscription as a percentage
[jira] [Commented] (YARN-8955) Add a flag to use local docker image instead of getting latest from registry
[ https://issues.apache.org/jira/browse/YARN-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669119#comment-16669119 ] Chandni Singh commented on YARN-8955:

[~eyang] I would like to take this up since I am working on YARN-3854. Assigning it to myself.

> Add a flag to use local docker image instead of getting latest from registry
> ----------------------------------------------------------------------------
> Key: YARN-8955
> URL: https://issues.apache.org/jira/browse/YARN-8955
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Priority: Major
> Labels: Docker
>
> Some companies have a security policy to use local docker images instead of
> pulling the latest images from the internet. When a docker image is pulled in
> the localization phase, there are two possible outcomes: the image is the
> latest from a trusted registry, or the image is a static local copy.
> This task is to add a configuration flag that gives priority to the local
> image over the trusted registry image.
> If an image already exists locally, the node manager does not trigger a docker
> pull to get the latest image from trusted registries.
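A minimal sketch of the flag's intended semantics. The class, method, and flag names below are hypothetical illustrations, not the actual NodeManager code or configuration key, which this issue has yet to define:

```java
// Hypothetical sketch of the proposed "prefer local image" behavior; names
// are illustrative only, not part of the YARN NodeManager implementation.
public class LocalImagePolicy {
    // Proposed flag: when true, a locally present image wins and no pull happens.
    private final boolean preferLocalImage;

    public LocalImagePolicy(boolean preferLocalImage) {
        this.preferLocalImage = preferLocalImage;
    }

    // Decide whether localization should run "docker pull" for this image.
    public boolean shouldPull(boolean imageExistsLocally) {
        if (preferLocalImage && imageExistsLocally) {
            return false; // use the static local copy, skip the registry
        }
        return true; // default behavior: refresh from the trusted registry
    }
}
```

With the flag off, the behavior is unchanged: every localization pulls the latest image from the trusted registry, image present locally or not.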
[jira] [Assigned] (YARN-8955) Add a flag to use local docker image instead of getting latest from registry
[ https://issues.apache.org/jira/browse/YARN-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh reassigned YARN-8955:

Assignee: Chandni Singh

> Add a flag to use local docker image instead of getting latest from registry
[jira] [Commented] (YARN-8957) Add Serializable interface to ComponentContainers
[ https://issues.apache.org/jira/browse/YARN-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669114#comment-16669114 ] Chandni Singh commented on YARN-8957:

+1 (non-binding). It should implement {{Serializable}}. {{ComponentContainers}} was added with YARN-8542.

> Add Serializable interface to ComponentContainers
> -------------------------------------------------
> Key: YARN-8957
> URL: https://issues.apache.org/jira/browse/YARN-8957
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Minor
> Attachments: YARN-8957-trunk.001.patch
>
> In the YARN service API:
> public class ComponentContainers {
>   private static final long serialVersionUID = -1456748479118874991L; ... }
> seems it should be:
> public class ComponentContainers implements Serializable {
>   private static final long serialVersionUID = -1456748479118874991L; ... }
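The point of the fix can be shown with a serialization round trip. This is a simplified stand-in class, not the real {{ComponentContainers}} from the YARN service API: declaring a serialVersionUID has no effect unless the class also implements Serializable, and without it ObjectOutputStream throws NotSerializableException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Simplified stand-in for ComponentContainers; only the Serializable aspect
// is modeled here, the real class carries the service's container list.
public class SerializableSketch {
    static class ComponentContainers implements Serializable {
        private static final long serialVersionUID = -1456748479118874991L;
        String componentName;
    }

    // Serialize to bytes and read the object back, as RPC/state-store paths do.
    static ComponentContainers roundTrip(ComponentContainers in) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(in);
        ObjectInputStream ois =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (ComponentContainers) ois.readObject();
    }

    public static void main(String[] args) throws Exception {
        ComponentContainers cc = new ComponentContainers();
        cc.componentName = "sleeper";
        System.out.println(roundTrip(cc).componentName); // prints sleeper
    }
}
```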
[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
[ https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669111#comment-16669111 ] Tao Yang commented on YARN-8958:

Attached v2 patch to fix the UT failures:
(1) Set {{yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.MemoryRMStateStore}} for TestFairOrderingPolicy#testSchedulableEntitiesLeak so that the RM does not recover apps from state left behind by a former test case.
(2) TestCapacityScheduler#testAllocateReorder has always had a problem: it activates only one app but expects two. It passed before because app2 was added into the schedulable entities by calling CapacityScheduler#allocate explicitly in the test case (app2 was added into entitiesToReorder and then into schedulableEntities) even though app2 was never activated. This patch exposes the problem; after setting {{yarn.scheduler.capacity.maximum-am-resource-percent=1.0}} both apps are activated and the test case passes again.

> Schedulable entities leak in fair ordering policy when recovering containers
> between remove app attempt and remove app
> ----------------------------------------------------------------------------
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.1
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8958.001.patch, YARN-8958.002.patch
>
> We found an NPE in ClientRMService#getApplications when querying apps with a
> specified queue. The cause is an app that cannot be found by calling
> RMContextImpl#getRMApps (it is finished and swapped out of memory) but can
> still be queried from the fair ordering policy.
> To reproduce the schedulable entities leak in fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove app1 attempt; app1 is removed from the schedulable entities.
> (4) recover container1; the state of container1 changes to COMPLETED, app1 is
> brought back into entitiesToReorder after the container is released, and app1
> is then added back into the schedulable entities when the scheduler calls
> FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be
> affected by adding or removing an app attempt; a new entity should not be
> added into schedulableEntities by the reordering process.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   schedulableEntities.add(schedulableEntity);
> }
> {code}
> The code above can be improved as follows, so that only an existent entity can
> be re-added into schedulableEntities.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   boolean exists = schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   if (exists) {
>     schedulableEntities.add(schedulableEntity);
>   } else {
>     LOG.info("Skip reordering non-existent schedulable entity: "
>         + schedulableEntity.getId());
>   }
> }
> {code}
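The resurrection described above can be reproduced with a toy model. This is a standalone sketch whose names mirror the description, not the CapacityScheduler code: the buggy reorder blindly re-inserts, so an entity removed with its app attempt comes back; the fixed version re-inserts only if the remove actually found it.

```java
import java.util.TreeSet;

// Toy model of the YARN-8958 leak: reordering must not resurrect entities
// that were already removed with their app attempt.
public class ReorderSketch {
    static TreeSet<String> schedulableEntities = new TreeSet<>();

    // Buggy version: blindly re-inserts, resurrecting removed entities.
    static void reorderBuggy(String entity) {
        schedulableEntities.remove(entity);
        schedulableEntities.add(entity);
    }

    // Fixed version: only re-insert entities that were actually present.
    static void reorderFixed(String entity) {
        boolean exists = schedulableEntities.remove(entity);
        if (exists) {
            schedulableEntities.add(entity);
        }
    }

    public static void main(String[] args) {
        schedulableEntities.add("app1");
        schedulableEntities.remove("app1"); // app attempt removed
        reorderBuggy("app1");               // leak: app1 resurrected
        System.out.println(schedulableEntities.contains("app1")); // prints true

        schedulableEntities.clear();
        reorderFixed("app1");               // no-op for a removed entity
        System.out.println(schedulableEntities.contains("app1")); // prints false
    }
}
```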
[jira] [Updated] (YARN-6729) Clarify documentation on how to enable cgroup support
[ https://issues.apache.org/jira/browse/YARN-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-6729:

Summary: Clarify documentation on how to enable cgroup support (was: NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used)

> Clarify documentation on how to enable cgroup support
> -----------------------------------------------------
> Key: YARN-6729
> URL: https://issues.apache.org/jira/browse/YARN-6729
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Yufei Gu
> Assignee: Zhankun Tang
> Priority: Major
> Attachments: YARN-6729-trunk.001.patch
>
> NM percentage-physical-cpu-limit is not honored in DefaultLCEResourcesHandler,
> which may cause container cpu usage calculation issues; e.g. container vcore
> usage is potentially more than 100% if percentage-physical-cpu-limit is set to
> a value less than 100.
[jira] [Updated] (YARN-8854) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/YARN-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8854:

Fix Version/s: 3.3.0

> Upgrade jquery datatable version references to v1.10.19
> -------------------------------------------------------
> Key: YARN-8854
> URL: https://issues.apache.org/jira/browse/YARN-8854
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Akhil PB
> Assignee: Akhil PB
> Priority: Critical
> Fix For: 3.3.0
>
> Attachments: YARN-8854.001.patch, YARN-8854.002.patch, YARN-8854.003.patch, YARN-8854.004.patch
[jira] [Updated] (YARN-8854) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/YARN-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8854:

Summary: Upgrade jquery datatable version references to v1.10.19 (was: [Hadoop YARN Common] Update jquery datatable version references)

> Upgrade jquery datatable version references to v1.10.19
[jira] [Updated] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
[ https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8958:

Attachment: YARN-8958.002.patch

> Schedulable entities leak in fair ordering policy when recovering containers
> between remove app attempt and remove app
[jira] [Commented] (YARN-8905) [Router] Add JvmMetricsInfo and pause monitor
[ https://issues.apache.org/jira/browse/YARN-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669075#comment-16669075 ] Hadoop QA commented on YARN-8905:

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 20m 36s | trunk passed |
| +1 | compile | 0m 25s | trunk passed |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 27s | trunk passed |
| +1 | shadedclient | 12m 39s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 33s | trunk passed |
| +1 | javadoc | 0m 20s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 21s | the patch passed |
| +1 | javac | 0m 21s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 16s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 38s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
|| Other Tests ||
| +1 | unit | 1m 30s | hadoop-yarn-server-router in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 53m 27s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8905 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946185/YARN-8905-005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4310e2a7660c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 62d98ca |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22380/testReport/ |
| Max. process+thread count | 705 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22380/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> [Router] Add JvmMetricsInfo and pause monitor
> --
[jira] [Updated] (YARN-8778) Add Command Line interface to invoke interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8778: Attachment: YARN-8778.007.patch > Add Command Line interface to invoke interactive docker shell > - > > Key: YARN-8778 > URL: https://issues.apache.org/jira/browse/YARN-8778 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8778.001.patch, YARN-8778.002.patch, > YARN-8778.003.patch, YARN-8778.004.patch, YARN-8778.005.patch, > YARN-8778.006.patch, YARN-8778.007.patch > > > CLI will be the mandatory interface we are providing for a user to use > interactive docker shell feature. We will need to create a new class > “InteractiveDockerShellCLI” to read command line into the servlet and pass > all the way down to docker executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8776) Container Executor change to create stdin/stdout pipeline
[ https://issues.apache.org/jira/browse/YARN-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8776: Attachment: YARN-8776.006.patch > Container Executor change to create stdin/stdout pipeline > - > > Key: YARN-8776 > URL: https://issues.apache.org/jira/browse/YARN-8776 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8776.001.patch, YARN-8776.002.patch, > YARN-8776.003.patch, YARN-8776.004.patch, YARN-8776.005.patch, > YARN-8776.006.patch > > > The pipeline is built to connect the stdin/stdout channel from WebSocket > servlet through container-executor to docker executor. So when the WebSocket > servlet is started, we need to invoke container-executor “dockerExec” method > (which will be implemented) to create a new docker executor and use “docker > exec -it $ContainerId” command which executes an interactive bash shell on > the container.
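The stdin/stdout bridging described above can be sketched minimally as follows. This is not the container-executor implementation; `cat` stands in for `docker exec -i <containerId> bash` so the sketch runs without Docker, and the class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the pipeline shape: spawn a child process and bridge
// bytes in and out. "cat" stands in for "docker exec -i <containerId> bash".
public class PipeBridge {

    static String roundTrip(String input) throws IOException, InterruptedException {
        Process child = new ProcessBuilder("cat").start();
        try (OutputStream toChild = child.getOutputStream()) {
            // Bytes written here become the child's stdin (the
            // WebSocket -> shell direction in the real pipeline).
            toChild.write(input.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream fromChild = child.getInputStream()) {
            // The child's stdout flows back (the shell -> WebSocket direction).
            fromChild.transferTo(captured);
        }
        child.waitFor();
        return captured.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello from the pipeline")); // prints "hello from the pipeline"
    }
}
```

In the real feature the two pump loops would run concurrently on separate threads, since an interactive shell reads and writes interleaved.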
[jira] [Updated] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8914: Attachment: YARN-8914.005.patch > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch, > YARN-8914.003.patch, YARN-8914.004.patch, YARN-8914.005.patch > > > In the container listing from UI2, we can add a link to connect to docker > container using xtermjs.
[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
[ https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668954#comment-16668954 ] Hadoop QA commented on YARN-8958: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 2s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}161m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.policy.TestFairOrderingPolicy | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8958 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946177/YARN-8958.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8e70d66ca204 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7757331 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/22379/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22379/testReport/ | | Max. process+thread count | 894 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/had
[jira] [Updated] (YARN-8575) Avoid committing allocation proposal to unavailable nodes in async scheduling
[ https://issues.apache.org/jira/browse/YARN-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8575: --- Affects Version/s: (was: 3.1.2) (was: 3.2.0) > Avoid committing allocation proposal to unavailable nodes in async scheduling > - > > Key: YARN-8575 > URL: https://issues.apache.org/jira/browse/YARN-8575 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8575.001.patch, YARN-8575.002.patch > > > Recently we found a new error as follows: > {noformat} > ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: > node to unreserve doesn't exist, nodeid: host1:45454 > {noformat} > Reproduce this problem: > (1) Create a reserve proposal for app1 on node1 > (2) node1 is successfully decommissioned and removed from node tracker > (3) Try to commit this outdated reserve proposal, it will be accepted and > applied. > This error may occur after decommissioning some NMs. The application > that prints the error log will always have a reserved container on a > non-existent (decommissioned) NM and the pending request will never be satisfied. > To solve this problem, the scheduler should check node state in > FiCaSchedulerApp#accept to avoid committing outdated proposals on unusable > nodes.
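The proposed fix — rejecting a proposal whose target node has been removed between proposal creation and commit — can be sketched as a standalone illustration. The `NodeTracker` class and `accept` signature below are hypothetical stand-ins, not the actual FiCaSchedulerApp/ClusterNodeTracker code.

```java
import java.util.HashSet;
import java.util.Set;

// Standalone sketch of the proposed check: before committing an allocation
// or reserve proposal, verify the target node is still tracked.
public class ProposalNodeCheck {

    static class NodeTracker {
        private final Set<String> liveNodes = new HashSet<>();
        void addNode(String nodeId) { liveNodes.add(nodeId); }
        void removeNode(String nodeId) { liveNodes.remove(nodeId); } // decommission
        boolean isTracked(String nodeId) { return liveNodes.contains(nodeId); }
    }

    // Mirrors the check suggested for FiCaSchedulerApp#accept: an outdated
    // proposal targeting a removed (decommissioned) node is rejected.
    static boolean accept(NodeTracker tracker, String proposalNodeId) {
        return tracker.isTracked(proposalNodeId);
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        tracker.addNode("host1:45454");
        System.out.println(accept(tracker, "host1:45454")); // true: node still live
        tracker.removeNode("host1:45454");                  // node decommissioned
        System.out.println(accept(tracker, "host1:45454")); // false: proposal rejected
    }
}
```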
[jira] [Commented] (YARN-6729) NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used
[ https://issues.apache.org/jira/browse/YARN-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668663#comment-16668663 ] Shane Kumpf commented on YARN-6729: --- Thanks, [~tangzhankun]! I'm good with that approach. I'll commit this shortly unless there are other comments. {quote}One thing in my mind is that once we put "yarn.nodemanager.resource.cpu.enabled" into the document. Does it mean that these features are stable? Because you know, for the historical reason, that settings are marked "unstable". {quote} Enabling CgroupsLCEResourcesHandler follows the same code path, so I don't have much concern. We can discuss the annotations and removal of the deprecated code as part of YARN-8924. > NM percentage-physical-cpu-limit should be always 100 if > DefaultLCEResourcesHandler is used > --- > > Key: YARN-6729 > URL: https://issues.apache.org/jira/browse/YARN-6729 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Yufei Gu >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-6729-trunk.001.patch > > > NM percentage-physical-cpu-limit is not honored in > DefaultLCEResourcesHandler, which may cause container cpu usage calculation > issue. e.g. container vcore usage is potentially more than 100% if > percentage-physical-cpu-limit is set to a value less than 100.
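A rough illustration of why the calculation goes wrong: the usage math scales measured CPU up by percentage-physical-cpu-limit, assuming the container really is throttled to that fraction of each core — which DefaultLCEResourcesHandler never enforces. The method below is a hypothetical simplification, not the NodeManager's actual monitor code.

```java
// Hypothetical simplification of the container CPU usage calculation.
public class CpuLimitExample {

    static long milliVcoresUsed(float cpuPercentPerCore, int vcores, int cpuLimitPercent) {
        // Divide measured CPU% by the configured limit fraction; with no
        // actual throttling in place this exceeds 1000 milli-vcores per vcore.
        return (long) (cpuPercentPerCore * 10f * vcores / (cpuLimitPercent / 100f));
    }

    public static void main(String[] args) {
        // Limit enforced (cgroups): a fully busy core maps to 1000 milli-vcores.
        System.out.println(milliVcoresUsed(100f, 1, 100)); // 1000
        // Limit configured at 60% but not enforced: reported usage > 100%.
        System.out.println(milliVcoresUsed(100f, 1, 60));  // 1666
    }
}
```

This is why the issue title asks for the limit to be treated as 100 whenever DefaultLCEResourcesHandler is in use.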
[jira] [Assigned] (YARN-8961) [UI2] Flow Run End Time shows 'Invalid date'
[ https://issues.apache.org/jira/browse/YARN-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan reassigned YARN-8961: Assignee: Akhil PB > [UI2] Flow Run End Time shows 'Invalid date' > > > Key: YARN-8961 > URL: https://issues.apache.org/jira/browse/YARN-8961 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Charan Hebri >Assignee: Akhil PB >Priority: Major > Attachments: Invalid_Date.png > > > End Time for Flow Runs is shown as *Invalid date* for runs that are in > progress. This should be shown as *N/A* just like how it is shown for 'CPU > VCores' and 'Memory Used'. Attached relevant screenshot. > cc [~akhilpb]
[jira] [Created] (YARN-8961) [UI2] Flow Run End Time shows 'Invalid date'
Charan Hebri created YARN-8961: -- Summary: [UI2] Flow Run End Time shows 'Invalid date' Key: YARN-8961 URL: https://issues.apache.org/jira/browse/YARN-8961 Project: Hadoop YARN Issue Type: Bug Reporter: Charan Hebri Attachments: Invalid_Date.png End Time for Flow Runs is shown as *Invalid date* for runs that are in progress. This should be shown as *N/A* just like how it is shown for 'CPU VCores' and 'Memory Used'. Attached relevant screenshot. cc [~akhilpb]
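The guard the issue asks for amounts to checking for a missing or zero end time before formatting. UI2 itself is Ember/JavaScript; the sketch below shows the same logic in Java for illustration only, with a hypothetical formatter and date pattern.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch of the requested guard: treat a missing or zero end time as an
// in-progress run and show "N/A" instead of producing "Invalid date".
public class EndTimeFormatter {

    private static final DateTimeFormatter FMT = DateTimeFormatter
        .ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    static String formatEndTime(Long endTimeMillis) {
        if (endTimeMillis == null || endTimeMillis <= 0) {
            return "N/A"; // run still in progress, no end time from the backend
        }
        return FMT.format(Instant.ofEpochMilli(endTimeMillis));
    }

    public static void main(String[] args) {
        System.out.println(formatEndTime(null));           // N/A
        System.out.println(formatEndTime(0L));             // N/A
        System.out.println(formatEndTime(1540000000000L)); // a formatted timestamp
    }
}
```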
[jira] [Updated] (YARN-8960) Can't get submarine service status using the command of "yarn app -status" under security environment
[ https://issues.apache.org/jira/browse/YARN-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zac Zhou updated YARN-8960: --- Attachment: YARN-8960.002.patch > Can't get submarine service status using the command of "yarn app -status" > under security environment > - > > Key: YARN-8960 > URL: https://issues.apache.org/jira/browse/YARN-8960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zac Zhou >Priority: Major > Attachments: YARN-8960.001.patch, YARN-8960.002.patch > > > After submitting a submarine job, we tried to get service status using the > following command: > yarn app -status ${service_name} > But we got the following error: > HTTP error code : 500 > > The stack in resourcemanager log is : > ERROR org.apache.hadoop.yarn.service.webapp.ApiServer: Get service failed: {} > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748) > at > org.apache.hadoop.yarn.service.webapp.ApiServer.getServiceFromClient(ApiServer.java:800) > at > org.apache.hadoop.yarn.service.webapp.ApiServer.getService(ApiServer.java:186) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker > ._dispatch(AbstractResourceMethodDispatchProvider.java:205) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodD > ispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) > at > 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) > at > com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) > at > 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) > at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(Authentica
[jira] [Commented] (YARN-8960) Can't get submarine service status using the command of "yarn app -status" under security environment
[ https://issues.apache.org/jira/browse/YARN-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668550#comment-16668550 ] Zac Zhou commented on YARN-8960: The root cause is that the submarine service doesn't have keytab and principal configuration. We need to add parameter support for that. A patch will be attached shortly. > Can't get submarine service status using the command of "yarn app -status" > under security environment > - > > Key: YARN-8960 > URL: https://issues.apache.org/jira/browse/YARN-8960 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zac Zhou >Priority: Major > > After submitting a submarine job, we tried to get service status using the > following command: > yarn app -status ${service_name} > But we got the following error: > HTTP error code : 500 > > The stack in resourcemanager log is : > ERROR org.apache.hadoop.yarn.service.webapp.ApiServer: Get service failed: {} > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748) > at > org.apache.hadoop.yarn.service.webapp.ApiServer.getServiceFromClient(ApiServer.java:800) > at > org.apache.hadoop.yarn.service.webapp.ApiServer.getService(ApiServer.java:186) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker > ._dispatch(AbstractResourceMethodDispatchProvider.java:205) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodD > 
ispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) > at > com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) > at > 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) > at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter
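The fix direction in the comment above — passing keytab and principal through to the service — corresponds to the kerberos block the YARN service specification accepts. The fragment below is illustrative: the field names follow the YARN Services API, while the service name, principal, and keytab path are placeholders.

```json
{
  "name": "submarine-job",
  "kerberos_principal": {
    "principal_name": "submarine-user/_HOST@EXAMPLE.COM",
    "keytab": "file:///etc/security/keytabs/submarine-user.keytab"
  }
}
```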
[jira] [Created] (YARN-8960) Can't get submarine service status using the command of "yarn app -status" under security environment
Zac Zhou created YARN-8960: -- Summary: Can't get submarine service status using the command of "yarn app -status" under security environment Key: YARN-8960 URL: https://issues.apache.org/jira/browse/YARN-8960 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zac Zhou Assignee: Zac Zhou After submitting a submarine job, we tried to get service status using the following command: yarn app -status ${service_name} But we got the following error: HTTP error code : 500 The stack in resourcemanager log is : ERROR org.apache.hadoop.yarn.service.webapp.ApiServer: Get service failed: {} java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1748) at org.apache.hadoop.yarn.service.webapp.ApiServer.getServiceFromClient(ApiServer.java:800) at org.apache.hadoop.yarn.service.webapp.ApiServer.getService(ApiServer.java:186) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker ._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodD ispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:179) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:6 44) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:5 92) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1610) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty
[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668429#comment-16668429 ] Bibin A Chundatt commented on YARN-8959: for i in {1..100}; do result=`mvn test -Dtest=TestContainerResizing | grep BUILD`; echo $i$result ;cat target/surefire-reports/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.txt | grep TestContainerResizing ;mv target/surefire-reports target/surefire-reports$i; done > TestContainerResizing fails randomly > > > Key: YARN-8959 > URL: https://issues.apache.org/jira/browse/YARN-8959 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Minor > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer > {code} > testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.348 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<3072> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted > {code} > testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.445 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1024> but was:<7168> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer > {code} > testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.321 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<2048> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > {code}
[jira] [Assigned] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T reassigned YARN-8959:
---
Assignee: Bilwa S T

> TestContainerResizing fails randomly
>
> Key: YARN-8959
> URL: https://issues.apache.org/jira/browse/YARN-8959
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Minor
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer
> {code}
> testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.348 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<3072>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.junit.Assert.assertEquals(Assert.java:631)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210)
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted
> {code}
> testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.445 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<7168>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.junit.Assert.assertEquals(Assert.java:631)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729)
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer
> {code}
> testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.321 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1024> but was:<2048>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.junit.Assert.assertEquals(Assert.java:631)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8959) TestContainerResizing fails randomly
Bibin A Chundatt created YARN-8959:
--
Summary: TestContainerResizing fails randomly
Key: YARN-8959
URL: https://issues.apache.org/jira/browse/YARN-8959
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt

org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer
{code}
testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.348 s  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<3072>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210)
{code}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted
{code}
testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.445 s  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<7168>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729)
{code}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer
{code}
testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing)  Time elapsed: 0.321 s  <<< FAILURE!
java.lang.AssertionError: expected:<1024> but was:<2048>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
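The repeated mismatches (expected:<1024> but a larger used value observed) are characteristic of assertions racing with asynchronous container release and allocation in the scheduler. A common remedy for this class of flakiness is to poll until the used resource converges instead of asserting immediately; the sketch below is illustrative only (not the actual YARN patch), and `waitFor` here is a simplified stand-in for a helper such as Hadoop's GenericTestUtils.waitFor:

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    /** Poll the condition every intervalMs until it holds or timeoutMs elapses. */
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean(); // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Simulated async update: the asserted state only becomes true after ~100 ms,
        // mimicking a container-release event the test would otherwise race against.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 100, 10, 2000);
        System.out.println(ok);
    }
}
```

A test written this way tolerates scheduler-event latency up to the timeout instead of depending on event ordering.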
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668389#comment-16668389 ] Weiwei Yang commented on YARN-8902:
---
[~leftnoteasy], [~sunilg], please help to review.

> Add volume manager that manages CSI volume lifecycle
>
> Key: YARN-8902
> URL: https://issues.apache.org/jira/browse/YARN-8902
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Attachments: YARN-8902.001.patch, YARN-8902.002.patch, YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, YARN-8902.006.patch
>
> The CSI volume manager is a service running in the RM process that manages all CSI volumes' lifecycle. The details about a volume's lifecycle states can be found in the [CSI spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668387#comment-16668387 ] Weiwei Yang commented on YARN-8902:
---
v6 patch uploaded. The major change is to remove the async processing in the volume processor; let's keep it simple in the first version. We do synchronous volume processing as follows:
# AppMaster calls allocate()
# the request gets to the volume processor
# if a volume resource is found in this request, do #4, #5, #6 before entering the next processor
# submit a provisioning task to the VolumeManager and wait for its execution, or time out in 10 sec
# once the provisioning task is done (all volumes are provisioned), call the next processor's allocate()
# if the task fails (some volumes failed to be provisioned), throw an exception to notify the client
Hope this makes sense. Thanks.

> Add volume manager that manages CSI volume lifecycle
>
> Key: YARN-8902
> URL: https://issues.apache.org/jira/browse/YARN-8902
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Attachments: YARN-8902.001.patch, YARN-8902.002.patch, YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, YARN-8902.006.patch
>
> The CSI volume manager is a service running in the RM process that manages all CSI volumes' lifecycle. The details about a volume's lifecycle states can be found in the [CSI spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
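The synchronous flow described in the comment above can be sketched as a blocking wait on a provisioning Future with a 10-second timeout. This is illustrative only; the class and method names below are invented for the sketch, not the actual API of the YARN-8902 patch:

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class VolumeProvisioningSketch {

    /** Hypothetical exception surfaced to the client when provisioning fails. */
    static class VolumeProvisioningException extends Exception {
        VolumeProvisioningException(String msg) { super(msg); }
    }

    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /**
     * Provision all volumes synchronously before the next processor's allocate()
     * is entered; fail fast on a provisioning error or after 10 seconds.
     */
    void provisionAndWait(List<String> volumes) throws VolumeProvisioningException {
        Future<Boolean> task = pool.submit(() -> {
            for (String v : volumes) {
                if (v.isEmpty()) {
                    return false; // simulate a volume that fails to provision
                }
            }
            return true;
        });
        try {
            if (!task.get(10, TimeUnit.SECONDS)) {
                throw new VolumeProvisioningException("some volumes failed to provision");
            }
        } catch (TimeoutException e) {
            task.cancel(true);
            throw new VolumeProvisioningException("provisioning timed out");
        } catch (InterruptedException | ExecutionException e) {
            throw new VolumeProvisioningException("provisioning error: " + e);
        }
    }

    public static void main(String[] args) throws Exception {
        VolumeProvisioningSketch sketch = new VolumeProvisioningSketch();
        sketch.provisionAndWait(List.of("vol-1", "vol-2"));
        System.out.println("provisioned");
        try {
            sketch.provisionAndWait(List.of("vol-3", ""));
        } catch (VolumeProvisioningException e) {
            System.out.println("failed: " + e.getMessage());
        }
        sketch.pool.shutdown();
    }
}
```

Blocking the allocate() path keeps the first version simple at the cost of holding the caller for up to the timeout; the async variant was deliberately deferred.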
[jira] [Updated] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8902:
--
Attachment: YARN-8902.006.patch

> Add volume manager that manages CSI volume lifecycle
>
> Key: YARN-8902
> URL: https://issues.apache.org/jira/browse/YARN-8902
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Attachments: YARN-8902.001.patch, YARN-8902.002.patch, YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, YARN-8902.006.patch
>
> The CSI volume manager is a service running in the RM process that manages all CSI volumes' lifecycle. The details about a volume's lifecycle states can be found in the [CSI spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8905) [Router] Add JvmMetricsInfo and pause monitor
[ https://issues.apache.org/jira/browse/YARN-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-8905:
--
Attachment: YARN-8905-005.patch

> [Router] Add JvmMetricsInfo and pause monitor
>
> Key: YARN-8905
> URL: https://issues.apache.org/jira/browse/YARN-8905
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-8905-001.patch, YARN-8905-002.patch, YARN-8905-003.patch, YARN-8905-004.patch, YARN-8905-005.patch
>
> Similar to the resourcemanager and nodemanager services, we can add JvmMetricsInfo to the router service too.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8897) LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability
[ https://issues.apache.org/jira/browse/YARN-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-8897:
--
Attachment: YARN-8897-003.patch

> LoadBasedRouterPolicy throws "NPE" in case of sub cluster unavailability
>
> Key: YARN-8897
> URL: https://issues.apache.org/jira/browse/YARN-8897
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, router
> Reporter: Akshay Agarwal
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-8897-001.patch, YARN-8897-002.patch, YARN-8897-003.patch
>
> If no sub clusters are available for the *Load Based Router Policy* with *cluster weight* as *1* in a Router Based Federation setup, a *NullPointerException* is thrown.
>
> *Exception Details:*
> {code:java}
> java.lang.NullPointerException: java.lang.NullPointerException
> 	at org.apache.hadoop.yarn.server.federation.policies.router.LoadBasedRouterPolicy.getHomeSubcluster(LoadBasedRouterPolicy.java:99)
> 	at org.apache.hadoop.yarn.server.federation.policies.RouterPolicyFacade.getHomeSubcluster(RouterPolicyFacade.java:204)
> 	at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.submitApplication(FederationClientInterceptor.java:362)
> 	at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:218)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:282)
> 	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:579)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:297)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> 	at com.sun.proxy.$Proxy15.submitApplication(Unknown Source)
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:288)
> 	at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:300)
> 	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:331)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:254)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
> 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
> 	at org.apache.hadoop.exam
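Per the description, the NPE surfaces in LoadBasedRouterPolicy.getHomeSubcluster when no sub clusters are available. One general way to avoid this class of failure is to guard the selection with an explicit emptiness check and raise an actionable error instead. This sketch is purely illustrative: the class name, exception, and the "most available memory" criterion are assumptions for the example, not the actual LoadBasedRouterPolicy code or the YARN-8897 patch:

```java
import java.util.Map;

public class LoadBasedSelectionSketch {

    /** Hypothetical error raised when routing is impossible. */
    static class NoActiveSubClustersException extends Exception {
        NoActiveSubClustersException(String m) { super(m); }
    }

    /** Pick the subcluster with the most available memory; fail clearly when none exist. */
    static String getHomeSubcluster(Map<String, Long> availableMemoryBySubCluster)
            throws NoActiveSubClustersException {
        if (availableMemoryBySubCluster == null || availableMemoryBySubCluster.isEmpty()) {
            // Guard replaces the NullPointerException with an actionable message.
            throw new NoActiveSubClustersException("no active subclusters to route to");
        }
        return availableMemoryBySubCluster.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getHomeSubcluster(Map.of("sc1", 4096L, "sc2", 8192L)));
        try {
            getHomeSubcluster(Map.of());
        } catch (NoActiveSubClustersException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing with a typed exception lets the Router return a meaningful submission error to the client rather than an opaque NPE over RPC.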
[jira] [Commented] (YARN-8905) [Router] Add JvmMetricsInfo and pause monitor
[ https://issues.apache.org/jira/browse/YARN-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668352#comment-16668352 ] Bibin A Chundatt commented on YARN-8905:
---
[~BilwaST] Please fix the checkstyle, whitespace and license issues.

> [Router] Add JvmMetricsInfo and pause monitor
>
> Key: YARN-8905
> URL: https://issues.apache.org/jira/browse/YARN-8905
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-8905-001.patch, YARN-8905-002.patch, YARN-8905-003.patch, YARN-8905-004.patch
>
> Similar to the resourcemanager and nodemanager services, we can add JvmMetricsInfo to the router service too.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
[ https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8958:
---
Attachment: YARN-8958.001.patch

> Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
>
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.1
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8958.001.patch
>
> We found an NPE in ClientRMService#getApplications when querying apps with a specified queue. The cause is that there is one app which can't be found by calling RMContextImpl#getRMApps (it is finished and swapped out of memory) but can still be queried from the fair ordering policy.
> To reproduce the schedulable entities leak in fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove the app1 attempt; app1 is removed from the schedulable entities.
> (4) recover container1; the state of container1 changes to COMPLETED, app1 is brought back to entitiesToReorder after the container is released, and app1 is then added back into the schedulable entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be affected by adding or removing an app attempt; a new entity should not be added into schedulableEntities by the reordering process.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   schedulableEntities.add(schedulableEntity);
> }
> {code}
> The code above can be improved as follows to make sure only an existing entity can be re-added into schedulableEntities.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   boolean exists = schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   if (exists) {
>     schedulableEntities.add(schedulableEntity);
>   } else {
>     LOG.info("Skip reordering non-existent schedulable entity: "
>         + schedulableEntity.getId());
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
[ https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668305#comment-16668305 ] Tao Yang commented on YARN-8958:
---
Attached v1 patch for review.

> Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
>
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.1
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8958.001.patch
>
> We found an NPE in ClientRMService#getApplications when querying apps with a specified queue. The cause is that there is one app which can't be found by calling RMContextImpl#getRMApps (it is finished and swapped out of memory) but can still be queried from the fair ordering policy.
> To reproduce the schedulable entities leak in fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove the app1 attempt; app1 is removed from the schedulable entities.
> (4) recover container1; the state of container1 changes to COMPLETED, app1 is brought back to entitiesToReorder after the container is released, and app1 is then added back into the schedulable entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be affected by adding or removing an app attempt; a new entity should not be added into schedulableEntities by the reordering process.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   schedulableEntities.add(schedulableEntity);
> }
> {code}
> The code above can be improved as follows to make sure only an existing entity can be re-added into schedulableEntities.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   boolean exists = schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   if (exists) {
>     schedulableEntities.add(schedulableEntity);
>   } else {
>     LOG.info("Skip reordering non-existent schedulable entity: "
>         + schedulableEntity.getId());
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
Tao Yang created YARN-8958:
--
Summary: Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
Key: YARN-8958
URL: https://issues.apache.org/jira/browse/YARN-8958
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.2.1
Reporter: Tao Yang
Assignee: Tao Yang

We found an NPE in ClientRMService#getApplications when querying apps with a specified queue. The cause is that there is one app which can't be found by calling RMContextImpl#getRMApps (it is finished and swapped out of memory) but can still be queried from the fair ordering policy.
To reproduce the schedulable entities leak in fair ordering policy:
(1) create app1 and launch container1 on node1
(2) restart RM
(3) remove the app1 attempt; app1 is removed from the schedulable entities.
(4) recover container1; the state of container1 changes to COMPLETED, app1 is brought back to entitiesToReorder after the container is released, and app1 is then added back into the schedulable entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
(5) remove app1
To solve this problem, we should make sure schedulableEntities can only be affected by adding or removing an app attempt; a new entity should not be added into schedulableEntities by the reordering process.
{code:java}
protected void reorderSchedulableEntity(S schedulableEntity) {
  //remove, update comparable data, and reinsert to update position in order
  schedulableEntities.remove(schedulableEntity);
  updateSchedulingResourceUsage(
      schedulableEntity.getSchedulingResourceUsage());
  schedulableEntities.add(schedulableEntity);
}
{code}
The code above can be improved as follows to make sure only an existing entity can be re-added into schedulableEntities.
{code:java}
protected void reorderSchedulableEntity(S schedulableEntity) {
  //remove, update comparable data, and reinsert to update position in order
  boolean exists = schedulableEntities.remove(schedulableEntity);
  updateSchedulingResourceUsage(
      schedulableEntity.getSchedulingResourceUsage());
  if (exists) {
    schedulableEntities.add(schedulableEntity);
  } else {
    LOG.info("Skip reordering non-existent schedulable entity: "
        + schedulableEntity.getId());
  }
}
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
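The reproduction steps above boil down to an unguarded remove-then-add: the reorder path re-inserts an entity that was already removed with its app attempt. A minimal stand-alone sketch of the leak mechanics, where a plain TreeSet stands in for the policy's ordered set of schedulable entities:

```java
import java.util.TreeSet;

public class ReorderLeakSketch {
    public static void main(String[] args) {
        // Simplified stand-in for FairOrderingPolicy's schedulableEntities.
        TreeSet<String> schedulableEntities = new TreeSet<>();
        schedulableEntities.add("app1");

        // Step (3): remove the app attempt -> app1 leaves the set.
        schedulableEntities.remove("app1");

        // Step (4): a recovered container completes and triggers a reorder.
        // The unguarded remove-then-add re-inserts app1 even though it was gone:
        schedulableEntities.remove("app1"); // returns false: not present
        schedulableEntities.add("app1");    // leak: app1 is back in the set

        // With the guarded variant (re-add only when remove() returned true),
        // app1 would stay out of the set here.
        System.out.println(schedulableEntities.contains("app1"));
    }
}
```

The boolean returned by remove() is exactly the signal the guarded version uses to distinguish a live entity being repositioned from a stale one being resurrected.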