[jira] [Commented] (YARN-8539) TimelineWebService#getUser null leads to empty entities list
[ https://issues.apache.org/jira/browse/YARN-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544824#comment-16544824 ] Shen Yinjie commented on YARN-8539:
-----------------------------------

A secure cluster with ACLs enabled also has this issue.
[jira] [Resolved] (YARN-8539) TimelineWebService#getUser null leads to empty entities list
[ https://issues.apache.org/jira/browse/YARN-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S resolved YARN-8539.
-------------------------------------
Resolution: Invalid

[~shenyinjie] I believe you are hitting this issue on a non-secure cluster with ACLs enabled, so this is a problem with your cluster setup, which does not configure the filters needed to resolve user names. You should configure hadoop.http.filter.initializers=org.apache.hadoop.security.AuthenticationFilterInitializer in core-site.xml and use the REST API with the query parameter, i.e. _?user.name=shenyinjie_. This should solve your issue, so I am closing the JIRA as invalid. If you still cannot get it working, feel free to reopen and discuss.
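The suggested setup boils down to one core-site.xml property plus passing the caller's identity on the REST call. The snippet below is only a minimal sketch assuming the default pseudo/simple authentication handler and default ports; the host and user name are placeholders taken from this discussion, not fixed values.

{code:xml}
<!-- core-site.xml: install the authentication filter so HTTP callers get a resolved user -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
{code}

With that filter in place, a simple-auth REST call such as {{http://<timeline-host>:8188/ws/v1/timeline/TEZ_DAG_ID?user.name=shenyinjie}} carries a user identity, so TimelineWebServices#getUser should no longer return a null callerUGI.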
[jira] [Updated] (YARN-8539) TimelineWebService#getUser null leads to empty entities list
[ https://issues.apache.org/jira/browse/YARN-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-8539:
------------------------------
Summary: TimelineWebService#getUser null leads to empty entities list (was: TimelineWebService#getUser from HttpServletRequest may be null)
[jira] [Created] (YARN-8539) TimelineWebService#getUser from HttpServletRequest may be null
Shen Yinjie created YARN-8539:
------------------------------

Summary: TimelineWebService#getUser from HttpServletRequest may be null
Key: YARN-8539
URL: https://issues.apache.org/jira/browse/YARN-8539
Project: Hadoop YARN
Issue Type: Bug
Components: timelineservice
Reporter: Shen Yinjie

When we integrate tez-ui with the timeline server and set yarn.acl.enabled=true, tez-ui invokes the timeline REST interface (ws/v1/timeline/TEZ_DAG_ID) to get all DAGs, but tez-ui shows "no records available".

After some digging, I found that when tez-ui invokes ".../ws/v1/timeline/TEZ_DAG_ID", TimelineWebService#getUser(HttpServletRequest req) returns callerUgi = null.

In TimelineACLsManager#checkAccess():
{code:java}
  ..
  if (callerUGI != null
      && (adminAclsManager.isAdmin(callerUGI) ||
          callerUGI.getShortUserName().equals(owner) ||
          domainACL.isUserAllowed(callerUGI))) {
    return true;
  }
  return false;
}
{code}
As a result, tez-ui gets nothing because it cannot pass this checkAccess().

I also looked at the similar code in RMWebServices:
{code}
protected Boolean hasAccess(RMApp app, HttpServletRequest hsr) {
  // Check for the authorization.
  UserGroupInformation callerUGI = getCallerUserGroupInformation(hsr, true);
  ..
  if (callerUGI != null
      && !(this.rm.getApplicationACLsManager().checkAccess(callerUGI,
          ApplicationAccessType.VIEW_APP, app.getUser(),
          app.getApplicationId())
          || this.rm.getQueueACLsManager().checkAccess(callerUGI,
              QueueACL.ADMINISTER_QUEUE, app, hsr.getRemoteAddr(),
              forwardedAddresses))) {
    return false;
  }
  return true;
}
{code}
When callerUGI = null, hasAccess() returns true. So I made a similar fix for TimelineWebServices.
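The "similar fix" itself is not attached in this thread, so the following is only a hypothetical sketch of the kind of change described: treating a null callerUGI the way the quoted RMWebServices#hasAccess does. The class name and simplified signature are illustrative, not the real TimelineACLsManager method.

{code:java}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.yarn.security.AdminACLsManager;

// Hypothetical sketch only, not the actual YARN-8539 patch. It mirrors the
// RMWebServices#hasAccess behaviour quoted above: a null callerUGI (no
// authentication filter resolved a user) is allowed instead of denied.
public final class TimelineAccessSketch {
  static boolean checkAccess(UserGroupInformation callerUGI, String owner,
      AdminACLsManager adminAclsManager, AccessControlList domainACL) {
    if (callerUGI == null) {
      // Same outcome RMWebServices#hasAccess produces for an unknown caller.
      return true;
    }
    return adminAclsManager.isAdmin(callerUGI)
        || callerUGI.getShortUserName().equals(owner)
        || domainACL.isUserAllowed(callerUGI);
  }
}
{code}

Note that such a change effectively makes timeline entities readable by any unauthenticated caller, which is why the resolution above recommends configuring the authentication filter so callerUGI is actually populated, rather than relaxing the check.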
[jira] [Commented] (YARN-8500) Use hbase shaded jars
[ https://issues.apache.org/jira/browse/YARN-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544806#comment-16544806 ] Vrushali C commented on YARN-8500:
----------------------------------
Wondering why Jenkins has not picked up this patch.

> Use hbase shaded jars
> ---------------------
>
> Key: YARN-8500
> URL: https://issues.apache.org/jira/browse/YARN-8500
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Vrushali C
> Assignee: Vrushali C
> Priority: Major
> Attachments: YARN-8500.0001.patch
>
>
> Move to using HBase shaded jars in ATSv2.
> Related JIRA: YARN-7213
[jira] [Comment Edited] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544759#comment-16544759 ] Weiwei Yang edited comment on YARN-7494 at 7/16/18 3:32 AM:
------------------------------------------------------------
Hi [~sunilg]

Please see my comments below:

ApplicationSchedulingConfig
* Is "yarn.scheduler.capacity.multi-node-sorting.policy" necessary? We can use "yarn.scheduler.capacity.root.multi-node-sorting.policy" to set the global sorting policy for CS, right?

CapacitySchedulerConfiguration
* Lines 2197-2201: the check is not necessary; policyClassName cannot be null since a default value is provided.
* Line 2203: the sorting interval looks like a global setting; would it be better to make it per-policy, e.g. "yarn.scheduler.capacity.multi-node-sorting.policy..sorting-task.interval.ms"? This could be a follow-up task if you agree with the suggestion.

MultiNodeLookupPolicy
* API: Iterator getPreferredNodeIterator(Collection nodes, String partition); It looks like the purpose of adding the first argument (a collection of nodes) is to support an in-place sorting policy, but at the API level it is confusing that {{ResourceUsageBasedMultiNodeLookupPolicy}} does not use this argument at all. For consistency, should we make sure the iterator that {{getPreferredNodeIterator}} returns only iterates over a subset of the candidates collection? (A sketch of this idea follows below.)

LocalityAppPlacementAllocator
* Lines 81-85: can we add a debug message here to indicate what kind of placement policy this app placement allocator uses?

TestFifoScheduler/TestNMReconnect/TestQueueParsing/TestReservations/TestRMWebApp/TestUtils
* I commented on this earlier: these UT classes were only modified to add a mock MultiNodeSortingManager to the context. I don't think this is necessary; can we remove them?

Minor ones
* Are the changes in ActivitiesLogger/ActivitiesManager also for this JIRA? They seem to serve a different purpose; should we separate them into another ticket?
* A lot of classes have import problems, such as unused imports and * imports; please take a look and fix them.

Thanks

> Add muti node lookup support for better placement
> --------------------------------------------------
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, YARN-7494.009.patch, YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup based on partition to start with.
[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544759#comment-16544759 ] Weiwei Yang commented on YARN-7494:
-----------------------------------
Hi [~sunilg]

ApplicationSchedulingConfig
* Is "yarn.scheduler.capacity.multi-node-sorting.policy" necessary? We can use "yarn.scheduler.capacity.root.multi-node-sorting.policy" to set the global sorting policy for CS, right?

CapacitySchedulerConfiguration
* Lines 2197-2201: the check is not necessary; policyClassName cannot be null since a default value is provided.
* Line 2203: the sorting interval looks like a global setting; would it be better to make it per-policy, e.g. "yarn.scheduler.capacity.multi-node-sorting.policy..sorting-task.interval.ms"? This could be a follow-up task if you agree with the suggestion.

MultiNodeLookupPolicy
* API: Iterator getPreferredNodeIterator(Collection nodes, String partition); It looks like the purpose of adding the first argument (a collection of nodes) is to support an in-place sorting policy, but at the API level it is confusing that {{ResourceUsageBasedMultiNodeLookupPolicy}} does not use this argument at all. For consistency, should we make sure the iterator that {{getPreferredNodeIterator}} returns only iterates over a subset of the candidates collection?

LocalityAppPlacementAllocator
* Lines 81-85: can we add a debug message here to indicate what kind of placement policy this app placement allocator uses?

TestFifoScheduler/TestNMReconnect/TestQueueParsing/TestReservations/TestRMWebApp/TestUtils
* I commented on this earlier: these UT classes were only modified to add a mock MultiNodeSortingManager to the context. I don't think this is necessary; can we remove them?

Minor ones
* Are the changes in ActivitiesLogger/ActivitiesManager also for this JIRA? They seem to serve a different purpose; should we separate them into another ticket?
* A lot of classes have import problems, such as unused imports and * imports; please take a look and fix them.

Thanks
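To illustrate the consistency point raised about {{getPreferredNodeIterator}}: the sketch below is purely illustrative and not the YARN-7494 patch; the class name, generic node type, and comparator are stand-ins for whatever node abstraction and usage metric the real policy uses. A policy that honours the supplied candidate set would sort exactly that collection and return an iterator over it, so callers always iterate a subset of what they passed in.

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Illustrative only, not the YARN-7494 patch; names are hypothetical stand-ins.
public class UsageSortedLookupPolicy<N> {
  private final Comparator<N> usageComparator;

  public UsageSortedLookupPolicy(Comparator<N> usageComparator) {
    this.usageComparator = usageComparator;
  }

  /** Returns an iterator over (only) the supplied candidates, least used first. */
  public Iterator<N> getPreferredNodeIterator(Collection<N> candidates,
      String partition) {
    // "partition" is kept for signature parity with the API discussed above;
    // a real policy would also filter the candidates by it.
    List<N> sorted = new ArrayList<>(candidates);
    sorted.sort(usageComparator);
    return sorted.iterator();
  }
}
{code}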
[jira] [Comment Edited] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544756#comment-16544756 ] Eric Yang edited comment on YARN-8523 at 7/16/18 3:22 AM:
----------------------------------------------------------
[~divayjindal], thank you for volunteering. You are welcome to work on this. Hadoop is built with community contributions, and there are others who are also interested in working on this, so this JIRA would be a great place for the collaboration. Submit your proposal and patches, and interested parties can discuss the details.

> Interactive docker shell
> -------------------------
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Yang
> Priority: Major
> Labels: Docker
>
> Some applications might require interactive Unix command execution to carry out operations. Container-executor can interface with docker exec to debug or analyze docker containers while the application is running. It would be nice to support an API that invokes docker exec to run Unix commands and reports the output back to the application master. The application master can distribute and aggregate execution of the commands and record them in the application master log file.
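The exec-and-report flow described above can be illustrated with a small, self-contained sketch. This is hypothetical code, not the proposed container-executor API; {{DockerExecSketch}} and its method are made-up names. It simply runs a command inside a live container via {{docker exec}} and collects the output lines an AM could aggregate.

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of running a command inside a running container via
// "docker exec" and collecting its output; not the proposed YARN API.
public final class DockerExecSketch {
  static List<String> exec(String containerId, String... command)
      throws IOException, InterruptedException {
    List<String> argv = new ArrayList<>();
    argv.add("docker");
    argv.add("exec");
    argv.add(containerId);
    for (String c : command) {
      argv.add(c);
    }
    Process p = new ProcessBuilder(argv).redirectErrorStream(true).start();
    List<String> output = new ArrayList<>();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        output.add(line);
      }
    }
    p.waitFor();
    return output;
  }
}
{code}

For example, {{exec("<container-id>", "jps")}} would list the Java processes inside the container; the real feature would of course route such calls through container-executor with the appropriate privilege and ACL checks rather than a raw ProcessBuilder.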
[jira] [Commented] (YARN-8523) Interactive docker shell
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544756#comment-16544756 ] Eric Yang commented on YARN-8523:
----------------------------------
You are welcome to work on this. Hadoop is built with community contributions, and there are others who are also interested in working on this, so this JIRA would be a great place for the collaboration. Submit your proposal and patches, and interested parties can discuss the details.
[jira] [Comment Edited] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544742#comment-16544742 ] Chen Yufei edited comment on YARN-8513 at 7/16/18 2:39 AM:
-----------------------------------------------------------
[~yuanbo] I've uploaded the jstack and top logs from when the problem appeared yesterday. The jstack output was captured 5 times, hence the 5 log files. [^top-during-lock.log] was captured while the RM was not responding to requests. [^top-when-normal.log] was captured today while the RM was running normally.
[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544742#comment-16544742 ] Chen Yufei commented on YARN-8513:
----------------------------------
[~yuanbo] I've uploaded the jstack and top logs from when the problem appeared yesterday. The jstack output was captured 5 times, hence the 5 log files. [^top-during-lock.log] was captured while the RM was not responding to requests. [^top-when-normal.log] was captured today while the RM was running normally.
[jira] [Updated] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Yufei updated YARN-8513:
-----------------------------
Attachment: top-when-normal.log
            top-during-lock.log
            jstack-5.log
            jstack-4.log
            jstack-3.log
            jstack-2.log
            jstack-1.log

> CapacityScheduler infinite loop when queue is near fully utilized
> ------------------------------------------------------------------
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, yarn
> Affects Versions: 2.9.1
> Environment: Ubuntu 14.04.5
> YARN is configured with one label and 5 queues.
> Reporter: Chen Yufei
> Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, jstack-5.log, top-during-lock.log, top-when-normal.log
>
>
> ResourceManager sometimes does not respond to any request when a queue is nearly fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can. After an RM restart, it can recover running jobs and start accepting new ones.
>
> It seems CapacityScheduler is in an infinite loop printing out the following log messages (more than 25,000 lines in a second):
>
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used= cluster=}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_01 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource= type=NODE_LOCAL requestedPartition=}}
>
> I have encountered this problem several times after upgrading to YARN 2.9.1, while the same configuration works fine under version 2.7.3.
>
> YARN-4477 is an infinite loop bug in FairScheduler; not sure if this is a similar problem.
[jira] [Commented] (YARN-8538) Fix valgrind leak check on container executor
[ https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544709#comment-16544709 ] genericqa commented on YARN-8538: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 39m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 55s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 28s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 73m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8538 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931716/YARN-8538.1.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 2f9e8f686111 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5074ca9 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | cc | https://builds.apache.org/job/PreCommit-YARN-Build/21257/artifact/out/diff-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21257/testReport/ | | Max. process+thread count | 302 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21257/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fix valgrind leak check on container executor > - > > Key: YARN-8538 > URL: https://issues.apache.org/jira/browse/YARN-8538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8538.1.patch > > > Running valgrind --leak-check=yes ./cetest gives us this: >
[jira] [Updated] (YARN-8538) Fix valgrind leak check on container executor
[ https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8538: - Attachment: YARN-8538.1.patch > Fix valgrind leak check on container executor > - > > Key: YARN-8538 > URL: https://issues.apache.org/jira/browse/YARN-8538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8538.1.patch > > > Running valgrind --leak-check=yes ./cetest gives us this: > {noformat} > ==14094== LEAK SUMMARY: > ==14094== definitely lost: 964,351 bytes in 1,154 blocks > ==14094== indirectly lost: 75,506 bytes in 3,777 blocks > ==14094== possibly lost: 0 bytes in 0 blocks > ==14094== still reachable: 554 bytes in 22 blocks > ==14094== suppressed: 0 bytes in 0 blocks > ==14094== Reachable blocks (those to which a pointer was found) are not shown. > ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all > ==14094== > ==14094== For counts of detected and suppressed errors, rerun with: -v > ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6482) TestSLSRunner runs but doesn't executed jobs (.json parsing issue)
[ https://issues.apache.org/jira/browse/YARN-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544471#comment-16544471 ] genericqa commented on YARN-6482: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 38m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 3s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.sls.TestSLSRunner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-6482 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931645/YARN-6482.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient | | uname | Linux 043624cffcd8 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 103f2ee | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21256/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21256/testReport/ | | Max. process+thread count | 457 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21256/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestSLSRunner runs but doesn't executed jobs (.json parsing issue) > -- > > Key: YARN-6482 > URL: https://issues.apache.org/jira/browse/YARN-6482 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Yuanbo Liu >Priority: Minor > Attachments: YARN-6482.001.patch > > >
[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544466#comment-16544466 ] Hudson commented on YARN-8434: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14576 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14576/]) YARN-8434. Update federation documentation of Nodemanager (bibinchundatt: rev 4523cc5637bc3558aa5796150b358ca8471773bb) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/Federation.md > Update federation documentation of Nodemanager configurations > - > > Key: YARN-8434 > URL: https://issues.apache.org/jira/browse/YARN-8434 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-8434.001.patch, YARN-8434.002.patch, > YARN-8434.003.patch > > > FederationRMFailoverProxyProvider doesn't handle connecting to active RM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8498) Yarn NodeManager OOM Listener Fails Compilation on Ubuntu 18.04
[ https://issues.apache.org/jira/browse/YARN-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544409#comment-16544409 ] Bibin A Chundatt commented on YARN-8498: [~miklos.szeg...@cloudera.com] /[~haibo.chen] Could you help ? > Yarn NodeManager OOM Listener Fails Compilation on Ubuntu 18.04 > --- > > Key: YARN-8498 > URL: https://issues.apache.org/jira/browse/YARN-8498 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jack Bearden >Priority: Blocker > Labels: trunk > > While building this project, I ran into a few compilation errors here. The > first one was in this file: > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/oom-listener/impl/oom_listener_main.c > At the very end, during the compilation of the OOM test, it fails again: > > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/oom-listener/test/oom_listener_test_main.cc > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/oom-listener/test/oom_listener_test_main.cc:256:7: > error: ‘__WAIT_STATUS’ was not declared in this scope > __WAIT_STATUS mem_hog_status = {}; > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/oom-listener/test/oom_listener_test_main.cc:257:30: > error: ‘mem_hog_status’ was not declared in this scope > __pid_t exited0 = wait(mem_hog_status); > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/oom-listener/test/oom_listener_test_main.cc:275:21: > error: expected ‘;’ before ‘oom_listener_status’ > __WAIT_STATUS oom_listener_status = {}; > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/oom-listener/test/oom_listener_test_main.cc:276:30: > error: ‘oom_listener_status’ was not declared in this scope > __pid_t exited1 = wait(oom_listener_status); > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
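For reference, the {{__WAIT_STATUS}} errors come from newer glibc (such as the one shipped with Ubuntu 18.04) no longer defining that legacy type. The snippet below is only a sketch of the usual portable replacement, not the committed patch: a plain {{int}} exit-status variable passed by address to {{wait(2)}}.

{code}
/* Sketch only, not the committed patch: newer glibc no longer defines the legacy
 * __WAIT_STATUS type, so the portable form is a plain int exit-status variable
 * passed by address to wait(2). */
#include <sys/types.h>
#include <sys/wait.h>

static pid_t wait_for_child(int *status) {
  /* replaces: __WAIT_STATUS mem_hog_status = {}; __pid_t exited0 = wait(mem_hog_status); */
  *status = 0;
  return wait(status);
}
{code}

The same substitution would apply to both {{mem_hog_status}} and {{oom_listener_status}} in oom_listener_test_main.cc.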