[jira] [Commented] (YARN-7912) While launching Native Service app from UI, consider service owner name from user.name query parameter
[ https://issues.apache.org/jira/browse/YARN-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802687#comment-17802687 ] Shilun Fan commented on YARN-7912: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > While launching Native Service app from UI, consider service owner name from > user.name query parameter > -- > > Key: YARN-7912 > URL: https://issues.apache.org/jira/browse/YARN-7912 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sunil G >Priority: Major > > As per comments from [~eyang] in YARN-7827, > "For supporting knox, it would be good for javascript to detect the url > entering /ui2 and process [user.name|http://user.name/] property. If there > isn't one found, then proceed with ajax call to resource manager to find out > who is the current user to pass the parameter along the rest api calls." > This Jira will track to handle this. This is now pending feasibility check. > Thanks [~eyang] and [~jianhe] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
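The fallback logic quoted above — prefer a `user.name` query parameter (e.g. injected by Knox) and only ask the ResourceManager for the current user when it is absent — can be sketched as follows. This is an illustrative sketch in Java, not the actual yarn-ui-v2 (JavaScript) code; the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical sketch: extract user.name from a /ui2 URL's query string.
// If empty is returned, the UI would fall back to an ajax call to the RM
// to discover the current user, then append user.name to later REST calls.
class UserNameResolver {

    static Optional<String> userNameFromUrl(String url) {
        String query = URI.create(url).getQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            // Only accept a non-empty value for the user.name parameter.
            if (kv.length == 2 && kv[0].equals("user.name") && !kv[1].isEmpty()) {
                return Optional.of(kv[1]);
            }
        }
        return Optional.empty();
    }
}
```

The RM-roundtrip fallback is deliberately omitted here; the point of the proposal is that the parameter check happens first, so a proxy like Knox can set the user without any extra call.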
[jira] [Commented] (YARN-7884) Race condition in registering YARN service in ZooKeeper
[ https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802688#comment-17802688 ] Shilun Fan commented on YARN-7884: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Race condition in registering YARN service in ZooKeeper > --- > > Key: YARN-7884 > URL: https://issues.apache.org/jira/browse/YARN-7884 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Priority: Major > > In Kerberos enabled cluster, there seems to be a race condition for > registering YARN service. > Yarn-service znode creation seems to happen after AM started and reporting > back to update components information. For some reason, Yarnservice znode > should have access to create the znode, but reported NoAuth. > {code} > 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry > user accounts: sasl:hbase > 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default > system acls: > [1,s{'world,'anyone} > , 31,s{'sasl,'yarn} > , 31,s{'sasl,'jhs} > , 31,s{'sasl,'hdfs-demo} > , 31,s{'sasl,'rm} > , 31,s{'sasl,'hive} > ] > 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs > [31,s{'sasl,'hbase} > , 31,s{'sasl,'hbase} > ] > 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering > class org.apache.hadoop.yarn.service.component.ComponentEventType for class > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler > 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering > class > org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType > for class > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler > 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of > the thread pool size is 500 > 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster 
- Starting service > as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS) > 2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: > class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler > 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - > Starting Socket Reader #1 for port 56859 > 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding > protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to > the server > 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server > Responder: starting > 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC > Server listener on 56859: starting > 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated > ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859 > 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating > CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" > 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl > client: jaasClientEntry = Client, principal = > hbase/eyang-5.openstacklo...@example.com, keytab = > /etc/security/keytabs/hbase.service.keytab > 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to > ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032 > 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering > appattempt_1517611904996_0001_01, abc into registry > 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 > containers from previous attempt. 
> 2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not > read component paths: `/users/hbase/services/yarn-service/abc/components': No > such file or directory: KeeperErrorCode = NoNode for > /registry/users/hbase/services/yarn-service/abc/components > 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering > initial evaluation of component sleeper > 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT > sleeper]: 2 instances. > 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT > sleeper] Transitioned from INIT to FLEXING on FLEX event. > 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - > Failed to register app abc in registry > org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: > `/registry/users/hbase/services/yarn-service/abc': Not authorized to access > path; ACLs: [ > 0x01: 'world,'anyone > 0x1f: 'sasl,'yarn > 0x1f: 'sasl,'jhs > 0x1f: 'sasl,'hdfs-demo > 0x1f: 'sasl,'rm > 0x1f: 'sasl,'hive > 0x1f: 'sasl,'hbase >
[jira] [Updated] (YARN-7884) Race condition in registering YARN service in ZooKeeper
[ https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-7884: - Target Version/s: 3.5.0 (was: 3.4.0) > Race condition in registering YARN service in ZooKeeper > --- > > Key: YARN-7884 > URL: https://issues.apache.org/jira/browse/YARN-7884 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Priority: Major > > In Kerberos enabled cluster, there seems to be a race condition for > registering YARN service. > Yarn-service znode creation seems to happen after AM started and reporting > back to update components information. For some reason, Yarnservice znode > should have access to create the znode, but reported NoAuth. > {code} > 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry > user accounts: sasl:hbase > 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default > system acls: > [1,s{'world,'anyone} > , 31,s{'sasl,'yarn} > , 31,s{'sasl,'jhs} > , 31,s{'sasl,'hdfs-demo} > , 31,s{'sasl,'rm} > , 31,s{'sasl,'hive} > ] > 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs > [31,s{'sasl,'hbase} > , 31,s{'sasl,'hbase} > ] > 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering > class org.apache.hadoop.yarn.service.component.ComponentEventType for class > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler > 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering > class > org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType > for class > org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler > 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of > the thread pool size is 500 > 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service > as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS) > 2018-02-02 
22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: > class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: > class org.apache.hadoop.ipc.DefaultRpcScheduler > 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - > Starting Socket Reader #1 for port 56859 > 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding > protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to > the server > 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server > Responder: starting > 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC > Server listener on 56859: starting > 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated > ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859 > 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating > CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" > 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl > client: jaasClientEntry = Client, principal = > hbase/eyang-5.openstacklo...@example.com, keytab = > /etc/security/keytabs/hbase.service.keytab > 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to > ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032 > 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering > appattempt_1517611904996_0001_01, abc into registry > 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 > containers from previous attempt. 
> 2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not > read component paths: `/users/hbase/services/yarn-service/abc/components': No > such file or directory: KeeperErrorCode = NoNode for > /registry/users/hbase/services/yarn-service/abc/components > 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering > initial evaluation of component sleeper > 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT > sleeper]: 2 instances. > 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT > sleeper] Transitioned from INIT to FLEXING on FLEX event. > 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - > Failed to register app abc in registry > org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: > `/registry/users/hbase/services/yarn-service/abc': Not authorized to access > path; ACLs: [ > 0x01: 'world,'anyone > 0x1f: 'sasl,'yarn > 0x1f: 'sasl,'jhs > 0x1f: 'sasl,'hdfs-demo > 0x1f: 'sasl,'rm > 0x1f: 'sasl,'hive > 0x1f: 'sasl,'hbase > 0x1f: 'sasl,'hbase > ]: KeeperErrorCode = NoAuth for >
[jira] [Updated] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
[ https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-7844: - Target Version/s: 3.5.0 (was: 3.4.0) > Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX > > > Key: YARN-7844 > URL: https://issues.apache.org/jira/browse/YARN-7844 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan >Priority: Major > Attachments: YARN-7844.000.patch, YARN-7844.001.patch > > > Currently FairScheduler's FSOpDurations records some scheduler operation > metrics: nodeUpdateCall, preemptCall, etc. We may need something similar for > CapacityScheduler, and more metrics should be added there. This could help > monitor RM scheduler performance and give more insight into whether the scheduler > is under pressure.
[jira] [Updated] (YARN-7882) Server side proxy for UI2 log viewer
[ https://issues.apache.org/jira/browse/YARN-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-7882: - Target Version/s: 3.5.0 (was: 3.4.0) > Server side proxy for UI2 log viewer > > > Key: YARN-7882 > URL: https://issues.apache.org/jira/browse/YARN-7882 > Project: Hadoop YARN > Issue Type: Bug > Components: security, timelineserver, yarn-ui-v2 >Affects Versions: 3.0.0 >Reporter: Eric Yang >Priority: Major > > When viewing container logs in UI2, the log files are directly fetched > through timeline server 2. Hadoop in simple security mode does not have an > authenticator to make sure the user is authorized to view the log. The > general practice is to use Knox or another security proxy to authenticate the > user and reverse-proxy the request to the Hadoop UI to ensure the information > does not leak through an anonymous user. The current implementation of the UI2 log > viewer makes ajax calls to timeline server 2. This could prevent Knox or other > reverse proxy software from working properly with the new design. It would > be good to perform a server side proxy to prevent the browser from sidestepping the > authentication check.
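A server-side proxy of the kind proposed above would terminate the browser's request at the YARN web app (where Knox-style filters already apply) and fetch the log from the timeline service on the server. A minimal sketch, assuming a hypothetical endpoint path layout; this does not claim to match any eventual patch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch: the UI2 web app exposes its own log endpoint, maps
// it to a timeline-service URL, and streams the response back, so the
// browser never contacts the timeline service directly.
class LogProxySketch {

    // Pure mapping from UI2-facing inputs to a timeline-service URL.
    // The path segments here are assumptions for illustration only.
    static String toTimelineUrl(String timelineBase, String containerId, String fileName) {
        return timelineBase + "/logs/containers/" + containerId + "/" + fileName;
    }

    // Server-side fetch: copy the timeline service's response to the client,
    // after the web app's own auth filters have already run.
    static void streamLog(String timelineUrl, OutputStream clientOut) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(timelineUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                clientOut.write(buf, 0, n);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the fetch happens server side, a reverse proxy in front of the cluster only ever needs to protect one origin, which is exactly the property the issue asks for.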
[jira] [Updated] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8149: - Target Version/s: 3.5.0 (was: 3.4.0) > Revisit behavior of Re-Reservation in Capacity Scheduler > > > Key: YARN-8149 > URL: https://issues.apache.org/jira/browse/YARN-8149 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Priority: Major > > Frankly speaking, I'm not sure why we need the re-reservation. The formula is > not that easy to understand: > Inside: > {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}} > {code:java} > starvation = re-reservation / (#reserved-container * > (1 - min(requested-resource / max-alloc, > (max-alloc - min-alloc) / max-alloc))) > should_allocate = starvation + requiredContainers - reservedContainers > > 0{code} > I think we should be able to remove the starvation computation; just checking > requiredContainers > reservedContainers should be enough. > In a large cluster, we can easily overflow re-reservation to MAX_INT, see > YARN-7636. >
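Read literally (and assuming the second `min()` argument groups as `(max-alloc - min-alloc) / max-alloc`, which the original pseudocode leaves ambiguous), the quoted check can be written out as a small sketch alongside the simplification proposed in the issue. Names are illustrative, not the actual `RegularContainerAllocator` code:

```java
// Illustrative sketch of the re-reservation "starvation" formula quoted
// above; the argument grouping is an assumption, not verified against trunk.
class ReReservationSketch {

    // starvation = re-reservation / (#reserved * (1 - min(req/max, (max - min)/max)))
    static long starvation(long reReservations, long reservedContainers,
                           double requested, double maxAlloc, double minAlloc) {
        double factor = 1.0 - Math.min(requested / maxAlloc,
                                       (maxAlloc - minAlloc) / maxAlloc);
        return (long) (reReservations / (reservedContainers * factor));
    }

    // should_allocate = starvation + requiredContainers - reservedContainers > 0
    static boolean shouldAllocate(long starvation, long required, long reserved) {
        return starvation + required - reserved > 0;
    }

    // The simplification proposed in the issue: drop starvation entirely.
    static boolean shouldAllocateSimplified(long required, long reserved) {
        return required > reserved;
    }
}
```

Using `long` here sidesteps the MAX_INT overflow the issue mentions, but the deeper point stands: if the starvation term is dropped, the re-reservation counter (and its overflow) disappears from the decision altogether.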
[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802684#comment-17802684 ] Shilun Fan commented on YARN-8149: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Revisit behavior of Re-Reservation in Capacity Scheduler > > > Key: YARN-8149 > URL: https://issues.apache.org/jira/browse/YARN-8149 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Priority: Major > > Frankly speaking, I'm not sure why we need the re-reservation. The formula is > not that easy to understand: > Inside: > {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}} > {code:java} > starvation = re-reservation / (#reserved-container * > (1 - min(requested-resource / max-alloc, > (max-alloc - min-alloc) / max-alloc))) > should_allocate = starvation + requiredContainers - reservedContainers > > 0{code} > I think we should be able to remove the starvation computation; just checking > requiredContainers > reservedContainers should be enough. > In a large cluster, we can easily overflow re-reservation to MAX_INT, see > YARN-7636. >
[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup
[ https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802686#comment-17802686 ] Shilun Fan commented on YARN-8012: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Support Unmanaged Container Cleanup > --- > > Key: YARN-8012 > URL: https://issues.apache.org/jira/browse/YARN-8012 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Yuqi Wang >Assignee: Yuqi Wang >Priority: Major > Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, > YARN-8012-branch-2.7.1.001.patch > > > An *unmanaged container / leaked container* is a container which is no longer > managed by NM, and thus can no longer be managed by YARN either, i.e. it is leaked. > *There are many cases in which a YARN-managed container can become unmanaged, such as:* > * NM service is disabled or removed on the node. > * NM is unable to start up again on the node, for example because required > configuration or resources cannot be made ready. > * NM local leveldb store is corrupted or lost, such as from bad disk sectors. > * NM has bugs, such as wrongly marking a live container as complete. > Note, these cases are caused, or made worse, when work-preserving NM restart is > enabled, see YARN-1336. > *Bad impacts of an unmanaged container include:* > # Resources cannot be managed for YARN on the node: > ** Causes a YARN resource leak on the node. > ** The container cannot be killed to release its YARN resources and free up > resources for other urgent computations on the node. > # Container and App killing is not eventually consistent for the App user: > ** An App which has bugs can still produce bad impacts to the outside long after the > App is killed.
[jira] [Updated] (YARN-8074) Support placement policy composite constraints in YARN Service
[ https://issues.apache.org/jira/browse/YARN-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8074: - Target Version/s: 3.5.0 (was: 3.4.0) > Support placement policy composite constraints in YARN Service > -- > > Key: YARN-8074 > URL: https://issues.apache.org/jira/browse/YARN-8074 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Gour Saha >Priority: Major > > This is a follow up of YARN-7142 where we support more advanced placement > policy features like creating composite constraints by exposing expressions > in YARN Service specification.
[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup
[ https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8012: - Target Version/s: 3.5.0 (was: 3.4.0) > Support Unmanaged Container Cleanup > --- > > Key: YARN-8012 > URL: https://issues.apache.org/jira/browse/YARN-8012 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Yuqi Wang >Assignee: Yuqi Wang >Priority: Major > Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, > YARN-8012-branch-2.7.1.001.patch > > > An *unmanaged container / leaked container* is a container which is no longer > managed by NM, and thus can no longer be managed by YARN either, i.e. it is leaked. > *There are many cases in which a YARN-managed container can become unmanaged, such as:* > * NM service is disabled or removed on the node. > * NM is unable to start up again on the node, for example because required > configuration or resources cannot be made ready. > * NM local leveldb store is corrupted or lost, such as from bad disk sectors. > * NM has bugs, such as wrongly marking a live container as complete. > Note, these cases are caused, or made worse, when work-preserving NM restart is > enabled, see YARN-1336. > *Bad impacts of an unmanaged container include:* > # Resources cannot be managed for YARN on the node: > ** Causes a YARN resource leak on the node. > ** The container cannot be killed to release its YARN resources and free up > resources for other urgent computations on the node. > # Container and App killing is not eventually consistent for the App user: > ** An App which has bugs can still produce bad impacts to the outside long after the > App is killed.
[jira] [Updated] (YARN-7912) While launching Native Service app from UI, consider service owner name from user.name query parameter
[ https://issues.apache.org/jira/browse/YARN-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-7912: - Target Version/s: 3.5.0 (was: 3.4.0) > While launching Native Service app from UI, consider service owner name from > user.name query parameter > -- > > Key: YARN-7912 > URL: https://issues.apache.org/jira/browse/YARN-7912 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Sunil G >Priority: Major > > As per comments from [~eyang] in YARN-7827, > "For supporting knox, it would be good for javascript to detect the url > entering /ui2 and process [user.name|http://user.name/] property. If there > isn't one found, then proceed with ajax call to resource manager to find out > who is the current user to pass the parameter along the rest api calls." > This Jira will track to handle this. This is now pending feasibility check. > Thanks [~eyang] and [~jianhe]
[jira] [Updated] (YARN-8161) ServiceState FLEX should be removed
[ https://issues.apache.org/jira/browse/YARN-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8161: - Target Version/s: 3.5.0 (was: 3.4.0) > ServiceState FLEX should be removed > --- > > Key: YARN-8161 > URL: https://issues.apache.org/jira/browse/YARN-8161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Gour Saha >Priority: Major > > ServiceState FLEX is not required to trigger flex up/down of containers and > should be removed
[jira] [Commented] (YARN-8192) Introduce container readiness check type
[ https://issues.apache.org/jira/browse/YARN-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802682#comment-17802682 ] Shilun Fan commented on YARN-8192: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Introduce container readiness check type > > > Key: YARN-8192 > URL: https://issues.apache.org/jira/browse/YARN-8192 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8192.1.patch, YARN-8192.2.patch > > > In some cases, the AM may not be able to perform a readiness check for a > container. For example, if a docker container is using a custom network type, > its IP may not be reachable from the AM. In this case, the AM could request a > new container to perform a readiness command, and use the exit status of the > container to determine whether the readiness check succeeded or not.
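The mechanism described above — launching a separate probe container and treating its exit status as the readiness verdict — might look like the following sketch. The types and the probe abstraction are hypothetical, not the YARN-8192 patch:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of an exit-status-based readiness check: instead of
// the AM probing the container's IP directly (which may be unreachable on
// a custom Docker network), it runs a probe container, modeled here as a
// Callable returning that container's exit status, and maps 0 to "ready".
class ExitStatusReadiness {

    static boolean isReady(Callable<Integer> probeContainer) {
        try {
            return probeContainer.call() == 0;   // shell convention: 0 == success
        } catch (Exception e) {
            return false;                        // probe failed to launch or run
        }
    }
}
```

A failed launch is deliberately treated the same as a non-zero exit: from the AM's point of view, either way the component instance has not proven itself ready.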
[jira] [Commented] (YARN-8256) Pluggable provider for node membership management
[ https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802681#comment-17802681 ] Shilun Fan commented on YARN-8256: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Pluggable provider for node membership management > - > > Key: YARN-8256 > URL: https://issues.apache.org/jira/browse/YARN-8256 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.8.3, 3.0.2 >Reporter: Dagang Wei >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > h1. Background > HDFS-7541 introduced a pluggable provider framework for node membership > management, which gives HDFS the flexibility to have different ways to manage > node membership for different needs. > [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java] > is the class which provides the abstraction. Currently, there are 2 > implementations in the HDFS codebase: > 1) > [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java] > which uses 2 config files which are defined by the properties dfs.hosts and > dfs.hosts.exclude. > 2) > [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java] > which uses a single JSON file defined by the property dfs.hosts. > dfs.namenode.hosts.provider.classname is the property determining which > implementation is used. > h1. 
Problem > YARN should be consistent with HDFS in terms of a pluggable provider for node > membership management. The absence of such a provider makes it impossible for YARN to support other > config sources, e.g., ZooKeeper, databases, or other config file formats. > h1. Proposed solution > [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java] > is the class for managing YARN node membership today. It uses > [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java] > to read config files specified by the properties > yarn.resourcemanager.nodes.include-path for nodes to include and > yarn.resourcemanager.nodes.exclude-path for nodes to exclude. > The proposed solution is to: > 1) introduce a new interface {color:#008000}HostsConfigManager{color} which > provides the abstraction for node membership management. Update > {color:#008000}NodeListManager{color} to depend on > {color:#008000}HostsConfigManager{color} instead of > {color:#008000}HostsFileReader{color}. Then create a wrapper class for > {color:#008000}HostsFileReader{color} which implements the interface. > 2) introduce a new config property > {color:#008000}yarn.resourcemanager.hosts-config.manager.class{color} for > specifying the implementation class. Set the default value to the wrapper > class of {color:#008000}HostsFileReader{color} for backward compatibility > between new code and old config.
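The proposed abstraction could look like the following sketch: an interface, a wrapper preserving today's include/exclude-file behavior, and the membership test the NodesListManager would delegate to. The interface name comes from the proposal above; the method signatures and helper classes are assumptions for illustration.

```java
import java.util.Collections;
import java.util.Set;

// Sketch of the proposed pluggable node-membership provider.
interface HostsConfigManager {
    Set<String> getIncludedHosts();
    Set<String> getExcludedHosts();
}

// Default implementation wrapping the existing include/exclude host lists,
// keeping old configs working (the proposal's backward-compatibility point).
class FileBasedHostsConfigManager implements HostsConfigManager {
    private final Set<String> includes;
    private final Set<String> excludes;

    FileBasedHostsConfigManager(Set<String> includes, Set<String> excludes) {
        this.includes = includes;
        this.excludes = excludes;
    }

    public Set<String> getIncludedHosts() { return Collections.unmodifiableSet(includes); }
    public Set<String> getExcludedHosts() { return Collections.unmodifiableSet(excludes); }
}

// NodesListManager would then consult the configured manager instead of
// reading files directly. Mirroring today's semantics: an empty include
// list admits everyone, and the exclude list always wins.
class MembershipCheck {
    static boolean isAllowed(HostsConfigManager mgr, String host) {
        boolean included = mgr.getIncludedHosts().isEmpty()
                || mgr.getIncludedHosts().contains(host);
        return included && !mgr.getExcludedHosts().contains(host);
    }
}
```

A ZooKeeper- or database-backed provider would then be one more `HostsConfigManager` implementation selected via the proposed `yarn.resourcemanager.hosts-config.manager.class` property, with no change to the membership logic itself.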
[jira] [Updated] (YARN-8192) Introduce container readiness check type
[ https://issues.apache.org/jira/browse/YARN-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8192: - Target Version/s: 3.5.0 (was: 3.4.0) > Introduce container readiness check type > > > Key: YARN-8192 > URL: https://issues.apache.org/jira/browse/YARN-8192 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8192.1.patch, YARN-8192.2.patch > > > In some cases, the AM may not be able to perform a readiness check for a > container. For example, if a docker container is using a custom network type, > its IP may not be reachable from the AM. In this case, the AM could request a > new container to perform a readiness command, and use the exit status of the > container to determine whether the readiness check succeeded or not.
[jira] [Updated] (YARN-8256) Pluggable provider for node membership management
[ https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8256: - Target Version/s: 3.5.0 (was: 3.4.0) > Pluggable provider for node membership management > - > > Key: YARN-8256 > URL: https://issues.apache.org/jira/browse/YARN-8256 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.8.3, 3.0.2 >Reporter: Dagang Wei >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > h1. Background > HDFS-7541 introduced a pluggable provider framework for node membership > management, which gives HDFS the flexibility to have different ways to manage > node membership for different needs. > [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java] > is the class which provides the abstraction. Currently, there are 2 > implementations in the HDFS codebase: > 1) > [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java] > which uses 2 config files which are defined by the properties dfs.hosts and > dfs.hosts.exclude. > 2) > [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java] > which uses a single JSON file defined by the property dfs.hosts. > dfs.namenode.hosts.provider.classname is the property determining which > implementation is used > h1. Problem > YARN should be consistent with HDFS in terms of pluggable provider for node > membership management. 
The absence of such a provider makes it impossible for YARN to support other > config sources, e.g., ZooKeeper, databases, or other config file formats. > h1. Proposed solution > [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java] > is the class for managing YARN node membership today. It uses > [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java] > to read config files specified by the properties > yarn.resourcemanager.nodes.include-path for nodes to include and > yarn.resourcemanager.nodes.exclude-path for nodes to exclude. > The proposed solution is to: > 1) introduce a new interface {color:#008000}HostsConfigManager{color} which > provides the abstraction for node membership management. Update > {color:#008000}NodeListManager{color} to depend on > {color:#008000}HostsConfigManager{color} instead of > {color:#008000}HostsFileReader{color}. Then create a wrapper class for > {color:#008000}HostsFileReader{color} which implements the interface. > 2) introduce a new config property > {color:#008000}yarn.resourcemanager.hosts-config.manager.class{color} for > specifying the implementation class. Set the default value to the wrapper > class of {color:#008000}HostsFileReader{color} for backward compatibility > between new code and old config.
[jira] [Updated] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8258: - Target Version/s: 3.5.0 (was: 3.4.0) > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil G >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch, YARN-8258.009.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally all filters from default context has to be inherited to UI2 context > as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802680#comment-17802680 ] Shilun Fan commented on YARN-8258: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > YARN webappcontext for UI2 should inherit all filters from default context > -- > > Key: YARN-8258 > URL: https://issues.apache.org/jira/browse/YARN-8258 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sumana Sathish >Assignee: Sunil G >Priority: Major > Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, > YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, > YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, > YARN-8258.007.patch, YARN-8258.008.patch, YARN-8258.009.patch > > > Thanks [~ssath...@hortonworks.com] for finding this. > Ideally all filters from default context has to be inherited to UI2 context > as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8340) Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more resources enabled.
[ https://issues.apache.org/jira/browse/YARN-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8340: - Target Version/s: 3.5.0 (was: 3.4.0) > Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more > resources enabled. > - > > Key: YARN-8340 > URL: https://issues.apache.org/jira/browse/YARN-8340 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Priority: Critical > > Refer to comment from [~eepayne] and discussion below that: > https://issues.apache.org/jira/browse/YARN-8292?focusedCommentId=16482689=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482689 > for details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8340) Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more resources enabled.
[ https://issues.apache.org/jira/browse/YARN-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802679#comment-17802679 ] Shilun Fan commented on YARN-8340: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more > resources enabled. > - > > Key: YARN-8340 > URL: https://issues.apache.org/jira/browse/YARN-8340 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Priority: Critical > > Refer to comment from [~eepayne] and discussion below that: > https://issues.apache.org/jira/browse/YARN-8292?focusedCommentId=16482689=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482689 > for details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8509: - Target Version/s: 3.5.0 (was: 3.4.0) > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
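The capping behaviour described in the YARN-8509 report can be sketched numerically as follows. This is a simplified illustration, not the actual `LeafQueue` code: plain longs stand in for `Resource` objects, and the per-user headroom is a single number.

```java
// Sketch of the behaviour described above: each user's contribution to queue
// pending is capped at min(user-limit headroom, that user's actual pending),
// so the queue total cannot exceed the sum of user limits.
public class PendingWithUserLimit {
    static long totalPendingCapped(long[] userPending, long userLimitHeadroom) {
        long total = 0;
        for (long pending : userPending) {
            total += Math.min(pending, userLimitHeadroom); // current behaviour
        }
        return total;
    }

    static long totalPendingUncapped(long[] userPending) {
        long total = 0;
        for (long pending : userPending) {
            total += pending; // proposed: let queue pending exceed user limit
        }
        return total;
    }

    public static void main(String[] args) {
        long[] pending = {10, 40};                            // two users
        System.out.println(totalPendingCapped(pending, 20));  // 10 + 20 = 30
        System.out.println(totalPendingUncapped(pending));    // 10 + 40 = 50
    }
}
```

The gap between the two totals (30 vs. 50 here) is exactly the pending resource the queue cannot claim toward queue balance under the current calculation.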
[jira] [Updated] (YARN-8366) Expose debug log information when user intend to enable GPU without setting nvidia-smi path
[ https://issues.apache.org/jira/browse/YARN-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8366: - Target Version/s: 3.5.0 (was: 3.4.0) > Expose debug log information when user intend to enable GPU without setting > nvidia-smi path > --- > > Key: YARN-8366 > URL: https://issues.apache.org/jira/browse/YARN-8366 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > > Expose Debug information help user found the root cause of failure when user > don't make these two settings manually before enabling GPU on YARN > 1. yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables in > yarn-site.xml > 2. environment variable LD_LIBRARY_PATH -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8733) Readiness check for remote component
[ https://issues.apache.org/jira/browse/YARN-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8733: - Target Version/s: 3.5.0 (was: 3.4.0) > Readiness check for remote component > > > Key: YARN-8733 > URL: https://issues.apache.org/jira/browse/YARN-8733 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Billie Rinaldi >Priority: Major > > When a service is deploying, there can be remote component dependency between > services. For example, Hive server 2 can depend on Hive metastore, which > depends on a remote MySQL database. It would be great to have ability to > check the remote server and port to make sure MySQL is available before > deploying Hive LLAP service. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
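A remote readiness check of the kind YARN-8733 asks for (is the MySQL host/port reachable before deploying the dependent service?) boils down to a bounded TCP connect probe. The sketch below shows the mechanism only; the class and method names are illustrative, not the eventual YARN readiness-check API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hedged sketch of a host/port readiness probe; names are illustrative.
public class PortReadinessCheck {
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Bounded connect: fail fast instead of hanging on a dead host.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host: not ready yet.
            return false;
        }
    }

    public static void main(String[] args) {
        // Port 1 on the loopback address is almost certainly closed, so the
        // probe is expected to print false.
        System.out.println(isReachable("127.0.0.1", 1, 500));
    }
}
```

A component-level readiness check would run a probe like this in a retry loop and only report the component READY once it succeeds.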
[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent
[ https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802677#comment-17802677 ] Shilun Fan commented on YARN-8509: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Total pending resource calculation in preemption should use user-limit factor > instead of minimum-user-limit-percent > --- > > Key: YARN-8509 > URL: https://issues.apache.org/jira/browse/YARN-8509 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: capacityscheduler > Attachments: YARN-8509.001.patch, YARN-8509.002.patch, > YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch > > > In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total > pending resource based on user-limit percent and user-limit factor which will > cap pending resource for each user to the minimum of user-limit pending and > actual pending. This will prevent queue from taking more pending resource to > achieve queue balance after all queue satisfied with its ideal allocation. > > We need to change the logic to let queue pending can go beyond userlimit. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code
[ https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802674#comment-17802674 ] Shilun Fan commented on YARN-8779: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Fix few discrepancies between YARN Service swagger spec and code > > > Key: YARN-8779 > URL: https://issues.apache.org/jira/browse/YARN-8779 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Gour Saha >Priority: Major > > Following issues were identified in YARN Service swagger definition during an > effort to integrate with a running service by generating Java and Go > client-side stubs from the spec - > > 1. > *restartPolicy* is wrong and should be *restart_policy* > > 2. > A DELETE request to a non-existing service (or a previously existing but > deleted service) throws an ApiException instead of something like > NotFoundException (the equivalent of 404). Note, DELETE of an existing > service behaves fine. > > 3. > The response code of DELETE request is 200. The spec says 204. Since the > response has a payload, the spec should be updated to 200 instead of 204. > > 4. > _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method > does not return a Service object. Swagger definition has the below bug in GET > response of */app/v1/services/\{service_name}* - > {code:java} > type: object > items: > $ref: '#/definitions/Service' > {code} > It should be - > {code:java} > $ref: '#/definitions/Service' > {code} > > 5. > Serialization issues were seen in all enum classes - ServiceState.java, > ContainerState.java, ComponentState.java, PlacementType.java and > PlacementScope.java. 
> Java client threw the below exception for ServiceState - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Cannot construct instance of > `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one > Creator exists): no String-argument constructor/factory method to deserialize > from String value ('ACCEPTED') > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 121] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["state”]) > {code} > For Golang we saw this for ContainerState - > {code:java} > ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot > unmarshal string into Go struct field Container.state of type > yarnmodel.ContainerState > {code} > > 6. > *launch_time* actually returns an integer but swagger definition says date. > Hence, the following exception is seen on the client side - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or > string. > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 477] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”]) > {code} > > 8. > *user.name* query param with a valid value is required for all API calls to > an unsecure cluster. This is not defined in the spec. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
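Items 1, 4, and 6 above suggest spec-side fixes along the following lines. This is a hedged sketch of what the corrected swagger fragments might look like, not the actual committed spec; surrounding keys are elided.

```yaml
# Item 4: GET /app/v1/services/{service_name} should return a Service object,
# not an object with Service items.
responses:
  200:
    description: Service information
    schema:
      $ref: '#/definitions/Service'

# Item 1: component property renamed to snake_case.
restart_policy:
  type: string

# Item 6: launch_time declared as an integer timestamp rather than a date.
launch_time:
  type: integer
  format: int64
```

Item 5 (enum deserialization) would be addressed on the code-generation side rather than in the spec, and item 8 by declaring `user.name` as an optional query parameter on each operation.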
[jira] [Commented] (YARN-8583) Inconsistency in YARN status command
[ https://issues.apache.org/jira/browse/YARN-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802676#comment-17802676 ] Shilun Fan commented on YARN-8583: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Inconsistency in YARN status command > > > Key: YARN-8583 > URL: https://issues.apache.org/jira/browse/YARN-8583 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Yang >Priority: Major > > YARN app -status command can report base on application ID or application > name with some usability limitation. Application ID is globally unique, and > it allows any user to query application status of any application. > Application name is not globally unique, and it will only work for querying > user's own application. This is somewhat restrictive for application > administrator, but allowing other user to query any other user's application > could consider a security hole as well. There are two possible options to > reduce the inconsistency: > Option 1. Block other user from query application status. This may improve > security in some sense, but it is an incompatible change. This is a simpler > change by matching the owner of the application, and decide to report or not > report. > Option 2. Add --user parameter to allow administrator to query application > name ran by other user. This is a bigger change because application metadata > is stored in user's own hdfs directory. There are security restriction that > need to be defined. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8583) Inconsistency in YARN status command
[ https://issues.apache.org/jira/browse/YARN-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8583: - Target Version/s: 3.5.0 (was: 3.4.0) > Inconsistency in YARN status command > > > Key: YARN-8583 > URL: https://issues.apache.org/jira/browse/YARN-8583 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Yang >Priority: Major > > YARN app -status command can report base on application ID or application > name with some usability limitation. Application ID is globally unique, and > it allows any user to query application status of any application. > Application name is not globally unique, and it will only work for querying > user's own application. This is somewhat restrictive for application > administrator, but allowing other user to query any other user's application > could consider a security hole as well. There are two possible options to > reduce the inconsistency: > Option 1. Block other user from query application status. This may improve > security in some sense, but it is an incompatible change. This is a simpler > change by matching the owner of the application, and decide to report or not > report. > Option 2. Add --user parameter to allow administrator to query application > name ran by other user. This is a bigger change because application metadata > is stored in user's own hdfs directory. There are security restriction that > need to be defined. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9415) Document FS placement rule changes from YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802670#comment-17802670 ] Shilun Fan commented on YARN-9415: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Document FS placement rule changes from YARN-8967 > - > > Key: YARN-9415 > URL: https://issues.apache.org/jira/browse/YARN-9415 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > With the changes introduced by YARN-8967 we now allow parent rules on all > existing rules. This should be documented. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code
[ https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8779: - Target Version/s: 3.5.0 (was: 3.4.0) > Fix few discrepancies between YARN Service swagger spec and code > > > Key: YARN-8779 > URL: https://issues.apache.org/jira/browse/YARN-8779 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Gour Saha >Priority: Major > > Following issues were identified in YARN Service swagger definition during an > effort to integrate with a running service by generating Java and Go > client-side stubs from the spec - > > 1. > *restartPolicy* is wrong and should be *restart_policy* > > 2. > A DELETE request to a non-existing service (or a previously existing but > deleted service) throws an ApiException instead of something like > NotFoundException (the equivalent of 404). Note, DELETE of an existing > service behaves fine. > > 3. > The response code of DELETE request is 200. The spec says 204. Since the > response has a payload, the spec should be updated to 200 instead of 204. > > 4. > _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method > does not return a Service object. Swagger definition has the below bug in GET > response of */app/v1/services/\{service_name}* - > {code:java} > type: object > items: > $ref: '#/definitions/Service' > {code} > It should be - > {code:java} > $ref: '#/definitions/Service' > {code} > > 5. > Serialization issues were seen in all enum classes - ServiceState.java, > ContainerState.java, ComponentState.java, PlacementType.java and > PlacementScope.java. 
> Java client threw the below exception for ServiceState - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Cannot construct instance of > `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one > Creator exists): no String-argument constructor/factory method to deserialize > from String value ('ACCEPTED') > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 121] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["state”]) > {code} > For Golang we saw this for ContainerState - > {code:java} > ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot > unmarshal string into Go struct field Container.state of type > yarnmodel.ContainerState > {code} > > 6. > *launch_time* actually returns an integer but swagger definition says date. > Hence, the following exception is seen on the client side - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or > string. > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 477] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”]) > {code} > > 8. > *user.name* query param with a valid value is required for all API calls to > an unsecure cluster. This is not defined in the spec. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9256) Make ATSv2 compilation default with hbase.profile=2.0
[ https://issues.apache.org/jira/browse/YARN-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9256: - Target Version/s: 3.5.0 (was: 3.4.0) > Make ATSv2 compilation default with hbase.profile=2.0 > - > > Key: YARN-9256 > URL: https://issues.apache.org/jira/browse/YARN-9256 > Project: Hadoop YARN > Issue Type: Task >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9256.01.patch, YARN-9256.02.patch, > YARN-9256.03.patch > > > By default Hadoop compiles with hbase.profile one which corresponds to > hbase.version=1.4 for ATSv2. Change compilation to hbase.profile=2.0 by > default in trunk. > This JIRA is to discuss for any concerns. > cc:/ [~vrushalic] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8940) [CSI] Add volume as a top-level attribute in service spec
[ https://issues.apache.org/jira/browse/YARN-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802672#comment-17802672 ] Shilun Fan commented on YARN-8940: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > [CSI] Add volume as a top-level attribute in service spec > -- > > Key: YARN-8940 > URL: https://issues.apache.org/jira/browse/YARN-8940 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: CSI > > Initial thought: > {noformat} > { > "name": "volume example", > "version": "1.0.0", > "description": "a volume simple example", > "components" : > [ > { > "name": "", > "number_of_containers": 1, > "artifact": { > "id": "docker.io/centos:latest", > "type": "DOCKER" > }, > "launch_command": "sleep,120", > "configuration": { > "env": { > "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true" > } > }, > "resource": { > "cpus": 1, > "memory": "256", > }, > "volumes": [ > { > "volume" : { > "type": "s3_csi", > "id": "5504d4a8-b246-11e8-94c2-026b17aa1190", > "capability" : { > "min": "5Gi", > "max": "100Gi" > }, > "source_path": "s3://my_bucket/my", # optional for object stores > "mount_path": "/mnt/data", # required, the mount point in > docker container > "access_mode": "SINGLE_READ", # how the volume can be accessed > } > } > ] > } > } > ] > } > {noformat} > Open for discussion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8928: - Target Version/s: 3.5.0 (was: 3.4.0) > TestRMAdminService is failing > - > > Key: YARN-8928 > URL: https://issues.apache.org/jira/browse/YARN-8928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Darrell Lowe >Assignee: David Mollitor >Priority: Major > Attachments: YARN-8928.1.patch, YARN-8928.2.patch, YARN-8928.3.patch > > > After HADOOP-15836 TestRMAdminService has started failing consistently. > Sample stacktraces to follow. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8940) [CSI] Add volume as a top-level attribute in service spec
[ https://issues.apache.org/jira/browse/YARN-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8940: - Target Version/s: 3.5.0 (was: 3.4.0) > [CSI] Add volume as a top-level attribute in service spec > -- > > Key: YARN-8940 > URL: https://issues.apache.org/jira/browse/YARN-8940 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: CSI > > Initial thought: > {noformat} > { > "name": "volume example", > "version": "1.0.0", > "description": "a volume simple example", > "components" : > [ > { > "name": "", > "number_of_containers": 1, > "artifact": { > "id": "docker.io/centos:latest", > "type": "DOCKER" > }, > "launch_command": "sleep,120", > "configuration": { > "env": { > "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true" > } > }, > "resource": { > "cpus": 1, > "memory": "256", > }, > "volumes": [ > { > "volume" : { > "type": "s3_csi", > "id": "5504d4a8-b246-11e8-94c2-026b17aa1190", > "capability" : { > "min": "5Gi", > "max": "100Gi" > }, > "source_path": "s3://my_bucket/my", # optional for object stores > "mount_path": "/mnt/data", # required, the mount point in > docker container > "access_mode": "SINGLE_READ", # how the volume can be accessed > } > } > ] > } > } > ] > } > {noformat} > Open for discussion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9256) Make ATSv2 compilation default with hbase.profile=2.0
[ https://issues.apache.org/jira/browse/YARN-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802671#comment-17802671 ] Shilun Fan commented on YARN-9256: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Make ATSv2 compilation default with hbase.profile=2.0 > - > > Key: YARN-9256 > URL: https://issues.apache.org/jira/browse/YARN-9256 > Project: Hadoop YARN > Issue Type: Task >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9256.01.patch, YARN-9256.02.patch, > YARN-9256.03.patch > > > By default Hadoop compiles with hbase.profile one which corresponds to > hbase.version=1.4 for ATSv2. Change compilation to hbase.profile=2.0 by > default in trunk. > This JIRA is to discuss for any concerns. > cc:/ [~vrushalic] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9637) Make SLS wrapper class name configurable
[ https://issues.apache.org/jira/browse/YARN-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802668#comment-17802668 ] Shilun Fan commented on YARN-9637: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Make SLS wrapper class name configurable > > > Key: YARN-9637 > URL: https://issues.apache.org/jira/browse/YARN-9637 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.3.0 >Reporter: Erkin Alp Güney >Assignee: Adam Antal >Priority: Major > Labels: configuration-addition > Attachments: YARN-9637.001.patch > > > SLS currently has hardcoded lookup on which scheduler wrapper to load based > on scheduler, and it only knows about Fair and Capacity schedulers. Making it > configurable will accelerate development of new pluggable YARN schedulers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9415) Document FS placement rule changes from YARN-8967
[ https://issues.apache.org/jira/browse/YARN-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9415: - Target Version/s: 3.5.0 (was: 3.4.0) > Document FS placement rule changes from YARN-8967 > - > > Key: YARN-9415 > URL: https://issues.apache.org/jira/browse/YARN-9415 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation, fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > With the changes introduced by YARN-8967 we now allow parent rules on all > existing rules. This should be documented. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9490) applicationresourceusagereport return wrong number of reserved containers
[ https://issues.apache.org/jira/browse/YARN-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9490: - Target Version/s: 3.5.0 (was: 3.4.0) > applicationresourceusagereport return wrong number of reserved containers > - > > Key: YARN-9490 > URL: https://issues.apache.org/jira/browse/YARN-9490 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: yanbing zhang >Assignee: yanbing zhang >Priority: Minor > Attachments: YARN-9490.002.patch, YARN-9490.patch, > YARN-9490.patch1.patch > > > when getting an ApplicationResourceUsageReport instance from the class of > SchedulerApplicationAttempt, I found the input constructor > parameter(reservedContainers.size()) is wrong. because the type of this > variable is Map>, so > "reservedContainer.size()" is not the number of containers, but the number of > SchedulerRequestKey. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
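The counting bug described in YARN-9490 is easy to demonstrate in isolation: `reservedContainers` is a map keyed by `SchedulerRequestKey`, so `size()` counts keys, while the correct count sums the per-key container collections. The sketch below uses plain strings as stand-ins for the real YARN types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Map<SchedulerRequestKey, List<RMContainer>>; strings replace
// the real YARN classes for illustration.
public class ReservedContainerCount {
    static int countReservedContainers(Map<String, List<String>> reservedContainers) {
        int total = 0;
        for (List<String> perKey : reservedContainers.values()) {
            total += perKey.size();   // count containers, not request keys
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> reserved = new HashMap<>();
        reserved.put("request-key-1", new ArrayList<>(List.of("c1", "c2")));
        reserved.put("request-key-2", new ArrayList<>(List.of("c3")));
        // map.size() reports 2 request keys; the real container count is 3.
        System.out.println(reserved.size());
        System.out.println(countReservedContainers(reserved));
    }
}
```

The fix in the patch amounts to replacing the `size()` call with a sum of this form when building the `ApplicationResourceUsageReport`.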
[jira] [Updated] (YARN-9637) Make SLS wrapper class name configurable
[ https://issues.apache.org/jira/browse/YARN-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9637: - Target Version/s: 3.5.0 (was: 3.4.0) > Make SLS wrapper class name configurable > > > Key: YARN-9637 > URL: https://issues.apache.org/jira/browse/YARN-9637 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.3.0 >Reporter: Erkin Alp Güney >Assignee: Adam Antal >Priority: Major > Labels: configuration-addition > Attachments: YARN-9637.001.patch > > > SLS currently has hardcoded lookup on which scheduler wrapper to load based > on scheduler, and it only knows about Fair and Capacity schedulers. Making it > configurable will accelerate development of new pluggable YARN schedulers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
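Replacing SLS's hardcoded scheduler-to-wrapper lookup with a configurable class name boils down to a reflective load with a sensible default. The property name below is invented purely for illustration; it is not necessarily the one the patch introduces.

```java
// Hedged sketch of a configurable, reflective wrapper load for SLS.
public class WrapperLoader {
    static Object loadWrapper(String className) throws Exception {
        // Class.forName plus the no-arg constructor, as SLS could do for any
        // pluggable scheduler wrapper named in configuration.
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The default would fall back to today's hardcoded wrapper; here a
        // JDK class is loaded just to demonstrate the mechanism.
        String configured = System.getProperty(
                "yarn.sls.scheduler.wrapper.class",   // hypothetical property
                "java.util.ArrayList");
        Object wrapper = loadWrapper(configured);
        System.out.println(wrapper.getClass().getName());
    }
}
```

With this in place, a new pluggable scheduler only needs to ship a wrapper class and name it in configuration, rather than patching the lookup table in SLS.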
[jira] [Updated] (YARN-9652) Convert SchedulerQueueManager from a protocol-only type to a basic hierarchical queue implementation
[ https://issues.apache.org/jira/browse/YARN-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9652: - Target Version/s: 3.5.0 (was: 3.4.0) > Convert SchedulerQueueManager from a protocol-only type to a basic > hierarchical queue implementation > > > Key: YARN-9652 > URL: https://issues.apache.org/jira/browse/YARN-9652 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system, scheduler >Affects Versions: 3.3.0 >Reporter: Erkin Alp Güney >Priority: Major > > SchedulerQueueManager is currently an interface, i.e. a protocol-only type. As > seen in the codebase, each scheduler implements queue configuration and > management over and over. If we convert it into a concrete base class with a > simple implementation of a hierarchical queue system (as in the Fair and > Capacity schedulers), pluggable schedulers could be developed more easily. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
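A hedged sketch of what such a shared base class might look like (class and method names below are illustrative, not the actual SchedulerQueueManager interface): the common piece every scheduler re-implements is parent/child queue bookkeeping and path resolution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- not the actual SchedulerQueueManager API.
// A minimal hierarchical queue store that a pluggable scheduler could
// extend instead of re-implementing parent/child queue bookkeeping.
public class BaseQueueManager {
    private final Map<String, List<String>> children = new HashMap<>();

    // Register a child queue under a parent, e.g. addQueue("root", "a").
    public void addQueue(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Check that a dotted path such as "root.a.b" follows existing
    // parent->child edges in the hierarchy.
    public boolean exists(String path) {
        String[] parts = path.split("\\.");
        String current = parts[0];
        for (int i = 1; i < parts.length; i++) {
            List<String> kids = children.get(current);
            if (kids == null || !kids.contains(parts[i])) {
                return false;
            }
            current = parts[i];
        }
        return true;
    }
}
```

A concrete scheduler would then only override policy-specific behavior (capacity math, ordering) rather than the tree structure itself.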
[jira] [Commented] (YARN-9675) Expose log aggregation diagnostic messages through RM API
[ https://issues.apache.org/jira/browse/YARN-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802665#comment-17802665 ] Shilun Fan commented on YARN-9675: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Expose log aggregation diagnostic messages through RM API > - > > Key: YARN-9675 > URL: https://issues.apache.org/jira/browse/YARN-9675 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, log-aggregation, resourcemanager >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > The ResourceManager collects the log aggregation status reports from the > NodeManagers. Currently these reports are collected, but when the app info API or > a similar high-level REST endpoint is called, only an overall status is displayed > (RUNNING, RUNNING_WITH_FAILURES, FAILED, etc.). > The diagnostic messages are only available through the old RM web UI, so our > internal tool currently crawls that page and extracts the log aggregation > diagnostic and error messages from the raw HTML. This is not a good practice, > and a more elegant API call would be preferable. It may be useful for others as > well, since log aggregation related failures are usually hard to debug given > the lack of trace/debug messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9741) [JDK11] TestAHSWebServices.testAbout fails
[ https://issues.apache.org/jira/browse/YARN-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9741: - Target Version/s: 3.5.0 (was: 3.4.0) > [JDK11] TestAHSWebServices.testAbout fails > -- > > Key: YARN-9741 > URL: https://issues.apache.org/jira/browse/YARN-9741 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineservice >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Major > > On openjdk-11.0.2 TestAHSWebServices.testAbout[0] fails consistently with the > following stack trace: > {noformat} > [ERROR] Tests run: 40, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 7.9 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices > [ERROR] > testAbout[0](org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices) > Time elapsed: 0.241 s <<< FAILURE! > org.junit.ComparisonFailure: expected: but > was: > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testAbout(TestAHSWebServices.java:333) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9675) Expose log aggregation diagnostic messages through RM API
[ https://issues.apache.org/jira/browse/YARN-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9675: - Target Version/s: 3.5.0 (was: 3.4.0) > Expose log aggregation diagnostic messages through RM API > - > > Key: YARN-9675 > URL: https://issues.apache.org/jira/browse/YARN-9675 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, log-aggregation, resourcemanager >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > The ResourceManager collects the log aggregation status reports from the > NodeManagers. Currently these reports are collected, but when the app info API or > a similar high-level REST endpoint is called, only an overall status is displayed > (RUNNING, RUNNING_WITH_FAILURES, FAILED, etc.). > The diagnostic messages are only available through the old RM web UI, so our > internal tool currently crawls that page and extracts the log aggregation > diagnostic and error messages from the raw HTML. This is not a good practice, > and a more elegant API call would be preferable. It may be useful for others as > well, since log aggregation related failures are usually hard to debug given > the lack of trace/debug messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9741) [JDK11] TestAHSWebServices.testAbout fails
[ https://issues.apache.org/jira/browse/YARN-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802664#comment-17802664 ] Shilun Fan commented on YARN-9741: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > [JDK11] TestAHSWebServices.testAbout fails > -- > > Key: YARN-9741 > URL: https://issues.apache.org/jira/browse/YARN-9741 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineservice >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Major > > On openjdk-11.0.2 TestAHSWebServices.testAbout[0] fails consistently with the > following stack trace: > {noformat} > [ERROR] Tests run: 40, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 7.9 > s <<< FAILURE! - in > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices > [ERROR] > testAbout[0](org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices) > Time elapsed: 0.241 s <<< FAILURE! > org.junit.ComparisonFailure: expected: but > was: > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testAbout(TestAHSWebServices.java:333) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9807) ContainerAllocator re-creates RMContainer instance when allocate for ReservedContainer
[ https://issues.apache.org/jira/browse/YARN-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9807: - Target Version/s: 3.5.0 (was: 3.4.0) > ContainerAllocator re-creates RMContainer instance when allocate for > ReservedContainer > -- > > Key: YARN-9807 > URL: https://issues.apache.org/jira/browse/YARN-9807 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Major > Attachments: YARN-9807.01.patch, YARN-9807.branch-2.01.patch > > > The ContainerAllocator re-creates the RMContainer instance when it is > allocated to the ReservedContainer. This will cause the RMContainer to lose > information from NEW to RESERVED. > {panel:title=RM Log} > 2019-08-28 18:42:30,320 [10645451] - INFO [SchedulerEventDispatcher:Event > Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 > Container Transitioned from NEW to RESERVED > 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event > Processor:AbstractContainerAllocator@126] - assignedContainer application > attempt=appattempt_1566978597856_2831_01 > container=container_e47_1566978597856_2831_01_07 > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@69f543b5 > clusterResource= type=NODE_LOCAL > requestedPartition=label_ndir_2 > 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event > Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 > Container Transitioned from NEW to ALLOCATED > {panel} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9708: - Target Version/s: 3.5.0 (was: 3.4.0) > Yarn Router Support DelegationToken > --- > > Key: YARN-9708 > URL: https://issues.apache.org/jira/browse/YARN-9708 > Project: Hadoop YARN > Issue Type: New Feature > Components: router >Affects Versions: 3.4.0 >Reporter: Xie YiFan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, > RMDelegationTokenSecretManager_storeNewMasterKey.svg, > RouterDelegationTokenSecretManager_storeNewMasterKey.svg > > > 1. We use the router as a proxy to manage multiple clusters that are independent of > each other, in order to present a unified client. Thus, we implement our > customized AMRMProxyPolicy that doesn't broadcast ResourceRequest to other > clusters. > 2. Our production environment needs Kerberos, but the router doesn't support > SecureLogin for now. > https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we > improved it. > 3. Some frameworks like Oozie get a token via yarnclient#getDelegationToken, > which the router doesn't support. Our solution is to add homeCluster to > ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is > submitted with a specified clusterid so that the router knows which cluster to > submit it to. The router gets a token from one RM according to the specified > clusterid when the client calls getDelegationToken, and applies a mechanism to > keep this token in memory. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9807) ContainerAllocator re-creates RMContainer instance when allocate for ReservedContainer
[ https://issues.apache.org/jira/browse/YARN-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802663#comment-17802663 ] Shilun Fan commented on YARN-9807: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > ContainerAllocator re-creates RMContainer instance when allocate for > ReservedContainer > -- > > Key: YARN-9807 > URL: https://issues.apache.org/jira/browse/YARN-9807 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Major > Attachments: YARN-9807.01.patch, YARN-9807.branch-2.01.patch > > > The ContainerAllocator re-creates the RMContainer instance when it is > allocated to the ReservedContainer. This will cause the RMContainer to lose > information from NEW to RESERVED. > {panel:title=RM Log} > 2019-08-28 18:42:30,320 [10645451] - INFO [SchedulerEventDispatcher:Event > Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 > Container Transitioned from NEW to RESERVED > 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event > Processor:AbstractContainerAllocator@126] - assignedContainer application > attempt=appattempt_1566978597856_2831_01 > container=container_e47_1566978597856_2831_01_07 > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@69f543b5 > clusterResource= type=NODE_LOCAL > requestedPartition=label_ndir_2 > 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event > Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 > Container Transitioned from NEW to ALLOCATED > {panel} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9852) Allow multiple MiniYarnCluster to be used
[ https://issues.apache.org/jira/browse/YARN-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9852: - Target Version/s: 3.5.0 (was: 3.4.0) > Allow multiple MiniYarnCluster to be used > - > > Key: YARN-9852 > URL: https://issues.apache.org/jira/browse/YARN-9852 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 3.2.1 >Reporter: Adam Antal >Priority: Major > > While implementing new HBase replication tests, we observed problems in the > communication between multiple MiniYarnClusters in one test > suite. I haven't seen any testcase in the Hadoop repository that uses > multiple clusters in one test, but it seems like a logical request to allow > this. > If this jira does not involve any code change (it's mainly a > configuration issue), then I suggest adding a testcase that demonstrates > a suitable configuration. > Thanks to [~bszabolcs] for the consultation on this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9856) Remove log-aggregation related duplicate function
[ https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9856: - Target Version/s: 3.5.0 (was: 3.4.0) > Remove log-aggregation related duplicate function > - > > Key: YARN-9856 > URL: https://issues.apache.org/jira/browse/YARN-9856 > Project: Hadoop YARN > Issue Type: Task > Components: log-aggregation, yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Szilard Nemeth >Priority: Trivial > Attachments: YARN-9856.001.patch, YARN-9856.002.patch > > > [~snemeth] has noticed a duplication in two of the log-aggregation related > functions. > {quote}I noticed duplicated code in > org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, > duplicated in > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs. > [...] > {quote} > We should remove the duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9856) Remove log-aggregation related duplicate function
[ https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802661#comment-17802661 ] Shilun Fan commented on YARN-9856: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Remove log-aggregation related duplicate function > - > > Key: YARN-9856 > URL: https://issues.apache.org/jira/browse/YARN-9856 > Project: Hadoop YARN > Issue Type: Task > Components: log-aggregation, yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Szilard Nemeth >Priority: Trivial > Attachments: YARN-9856.001.patch, YARN-9856.002.patch > > > [~snemeth] has noticed a duplication in two of the log-aggregation related > functions. > {quote}I noticed duplicated code in > org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, > duplicated in > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs. > [...] > {quote} > We should remove the duplication. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9852) Allow multiple MiniYarnCluster to be used
[ https://issues.apache.org/jira/browse/YARN-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802662#comment-17802662 ] Shilun Fan commented on YARN-9852: -- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Allow multiple MiniYarnCluster to be used > - > > Key: YARN-9852 > URL: https://issues.apache.org/jira/browse/YARN-9852 > Project: Hadoop YARN > Issue Type: New Feature > Components: test >Affects Versions: 3.2.1 >Reporter: Adam Antal >Priority: Major > > While implementing new HBase replication tests, we observed problems in the > communication between multiple MiniYarnClusters in one test > suite. I haven't seen any testcase in the Hadoop repository that uses > multiple clusters in one test, but it seems like a logical request to allow > this. > If this jira does not involve any code change (it's mainly a > configuration issue), then I suggest adding a testcase that demonstrates > a suitable configuration. > Thanks to [~bszabolcs] for the consultation on this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10032) Implement regex querying of logs
[ https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10032: -- Target Version/s: 3.5.0 (was: 3.4.0) > Implement regex querying of logs > > > Key: YARN-10032 > URL: https://issues.apache.org/jira/browse/YARN-10032 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > After YARN-10031, we have query parameters on the log servlet's GET endpoint. > To demonstrate the new capabilities of the log servlet and how easy it will > be to add functionality to all log servlets at the same time, let's add the > ability to search the aggregated logs with a given regex. > A conceptual use case: > Users run several MR jobs daily, but some of them fail to localize a > particular resource at first. We want to search the logs of these Yarn > applications and extract some data from them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
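Conceptually, the requested capability is a server-side regex filter over aggregated log lines, sketched below (the class and method names are made up for illustration; the actual servlet query parameter is defined by the patch, not here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LogGrep {
    // Return only the log lines matching the supplied regex, mimicking
    // what a regex search parameter on the log servlet might do.
    public static List<String> grep(List<String> lines, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) {
                out.add(line);
            }
        }
        return out;
    }
}
```

For the localization use case above, a query like grep(lines, "failed to localize") would surface only the relevant failures instead of forcing the user to download whole aggregated logs.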
[jira] [Updated] (YARN-10025) Various improvements in YARN log servlets
[ https://issues.apache.org/jira/browse/YARN-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10025: -- Target Version/s: 3.5.0 (was: 3.4.0) > Various improvements in YARN log servlets > - > > Key: YARN-10025 > URL: https://issues.apache.org/jira/browse/YARN-10025 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-10025 document.pdf > > > There are multiple ways how we can enhance the current log servlets in YARN. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10032) Implement regex querying of logs
[ https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802658#comment-17802658 ] Shilun Fan commented on YARN-10032: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Implement regex querying of logs > > > Key: YARN-10032 > URL: https://issues.apache.org/jira/browse/YARN-10032 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > After YARN-10031, we have query parameters on the log servlet's GET endpoint. > To demonstrate the new capabilities of the log servlet and how easy it will > be to add functionality to all log servlets at the same time, let's add the > ability to search the aggregated logs with a given regex. > A conceptual use case: > Users run several MR jobs daily, but some of them fail to localize a > particular resource at first. We want to search the logs of these Yarn > applications and extract some data from them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10406) YARN log processor
[ https://issues.apache.org/jira/browse/YARN-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802653#comment-17802653 ] Shilun Fan commented on YARN-10406: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > YARN log processor > -- > > Key: YARN-10406 > URL: https://issues.apache.org/jira/browse/YARN-10406 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Adam Antal >Assignee: Hudáky Márton Gyula >Priority: Critical > > YARN currently does not have any utility that would enable cluster > administrators to display previous actions in a Hadoop YARN cluster in an > offline fashion. > HDFS has the > [OIV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html]/ > > [OEV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html] > which does not require a running cluster to inspect and modify the filesystem. > A corresponding tool would be very helpful in the context of YARN. > Since ATS is not widespread (it is not available for older clusters) and there > isn't a single file or entity that collects all the > application/container etc. related information, we thought our best option was to > parse and process the YARN daemon log files and reconstruct the > history of the cluster from that. We designed and implemented a CLI based > solution that after parsing the log files enables users to query app/container > related information (listing, filtering by certain properties) and search for > common errors like CE failures/error codes, AM preemption or stack traces. > The tool can be integrated into the YARN project as a sub-project. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation
[ https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802657#comment-17802657 ] Shilun Fan commented on YARN-10050: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > NodeManagerCGroupsMemory.md does not show up in the official documentation > -- > > Key: YARN-10050 > URL: https://issues.apache.org/jira/browse/YARN-10050 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Miklos Szegedi >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: YARN-10050.001.patch > > > I looked at this doc: > [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md] > It does not show up here: > [https://hadoop.apache.org/docs/stable/] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10065) Support Placement Constraints for AM container allocations
[ https://issues.apache.org/jira/browse/YARN-10065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10065: -- Target Version/s: 3.5.0 (was: 3.4.0) > Support Placement Constraints for AM container allocations > -- > > Key: YARN-10065 > URL: https://issues.apache.org/jira/browse/YARN-10065 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: Daniel Velasquez >Priority: Major > > Currently ApplicationSubmissionContext API supports specifying a node label > expression for the AM resource request. It would be beneficial to have the > ability to specify Placement Constraints as well for the AM resource request. > We have a requirement to constrain AM containers on certain nodes e.g. AM > containers not on preemptible/spot cloud instances. It looks like node > attributes would fit our use case well. However, we currently don't have the > ability to specify this in the API for AM resource requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10065) Support Placement Constraints for AM container allocations
[ https://issues.apache.org/jira/browse/YARN-10065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802655#comment-17802655 ] Shilun Fan commented on YARN-10065: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Support Placement Constraints for AM container allocations > -- > > Key: YARN-10065 > URL: https://issues.apache.org/jira/browse/YARN-10065 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: Daniel Velasquez >Priority: Major > > Currently ApplicationSubmissionContext API supports specifying a node label > expression for the AM resource request. It would be beneficial to have the > ability to specify Placement Constraints as well for the AM resource request. > We have a requirement to constrain AM containers on certain nodes e.g. AM > containers not on preemptible/spot cloud instances. It looks like node > attributes would fit our use case well. However, we currently don't have the > ability to specify this in the API for AM resource requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10243) Rack-only localization constraint for MR AM is broken for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10243: -- Target Version/s: 3.5.0 (was: 3.4.0) > Rack-only localization constraint for MR AM is broken for CapacityScheduler > --- > > Key: YARN-10243 > URL: https://issues.apache.org/jira/browse/YARN-10243 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Bilwa S T >Priority: Major > > Reproduction: Start a MR sleep job with strict-locality configured for AM > ({{-Dmapreduce.job.am.strict-locality=/rack1}} for instance). If > CapacityScheduler is used, the job will hang (stuck in SCHEDULED state). > Root cause: if there are no other resources requested (like node locality or > other constraint), the scheduling opportunities counter will not be > incremented and the following piece of code always returns false (so we > always skip this constraint) resulting in an infinite loop: > {code:java} > // If we are here, we do need containers on this rack for RACK_LOCAL req > if (type == NodeType.RACK_LOCAL) { > // 'Delay' rack-local just a little bit... > long missedOpportunities = > application.getSchedulingOpportunities(schedulerKey); > return getActualNodeLocalityDelay() < missedOpportunities; > } > {code} > Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero to > enforce this rule to be processed immediately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10243) Rack-only localization constraint for MR AM is broken for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802654#comment-17802654 ] Shilun Fan commented on YARN-10243: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Rack-only localization constraint for MR AM is broken for CapacityScheduler > --- > > Key: YARN-10243 > URL: https://issues.apache.org/jira/browse/YARN-10243 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Bilwa S T >Priority: Major > > Reproduction: Start a MR sleep job with strict-locality configured for AM > ({{-Dmapreduce.job.am.strict-locality=/rack1}} for instance). If > CapacityScheduler is used, the job will hang (stuck in SCHEDULED state). > Root cause: if there are no other resources requested (like node locality or > other constraint), the scheduling opportunities counter will not be > incremented and the following piece of code always returns false (so we > always skip this constraint) resulting in an infinite loop: > {code:java} > // If we are here, we do need containers on this rack for RACK_LOCAL req > if (type == NodeType.RACK_LOCAL) { > // 'Delay' rack-local just a little bit... > long missedOpportunities = > application.getSchedulingOpportunities(schedulerKey); > return getActualNodeLocalityDelay() < missedOpportunities; > } > {code} > Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero to > enforce this rule to be processed immediately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
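[Editor's note] The workaround quoted above is a single scheduler property; a minimal sketch of the corresponding capacity-scheduler.xml fragment (property name and value taken from the issue text, everything else illustrative):

```xml
<!-- YARN-10243 workaround: a zero delay makes the rack-local check pass
     immediately instead of waiting on scheduling opportunities that are
     never incremented for rack-only AM requests. -->
<property>
  <name>yarn.scheduler.capacity.node-locality-delay</name>
  <value>0</value>
</property>
```

Note this disables the rack-local scheduling delay cluster-wide, which may reduce locality for other applications.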
[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation
[ https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10050: -- Target Version/s: 3.5.0 (was: 3.4.0) > NodeManagerCGroupsMemory.md does not show up in the official documentation > -- > > Key: YARN-10050 > URL: https://issues.apache.org/jira/browse/YARN-10050 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Miklos Szegedi >Assignee: Masatake Iwasaki >Priority: Minor > Attachments: YARN-10050.001.patch > > > I looked at this doc: > [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md] > It does not show up here: > [https://hadoop.apache.org/docs/stable/] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10059) Final states of failed-to-localize containers are not recorded in NM state store
[ https://issues.apache.org/jira/browse/YARN-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10059: -- Target Version/s: 3.5.0 (was: 3.4.0) > Final states of failed-to-localize containers are not recorded in NM state > store > > > Key: YARN-10059 > URL: https://issues.apache.org/jira/browse/YARN-10059 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-10059.001.patch > > > We found an issue where many localizers for completed containers were > launched after NM restart and exhausted that machine's memory/CPU. These > containers had all failed and completed while localizing on a non-existent > local directory (caused by another problem), but their final states > weren't recorded in the NM state store. > The process flow of a failed-to-localize container is as follows: > {noformat} > ResourceLocalizationService$LocalizerRunner#run > -> ContainerImpl$ResourceFailedTransition#transition handle LOCALIZING -> > LOCALIZATION_FAILED upon RESOURCE_FAILED > dispatch LocalizationEventType.CLEANUP_CONTAINER_RESOURCES > -> ResourceLocalizationService#handleCleanupContainerResources handle > CLEANUP_CONTAINER_RESOURCES > dispatch ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP > -> ContainerImpl$LocalizationFailedToDoneTransition#transition > handle LOCALIZATION_FAILED -> DONE upon CONTAINER_RESOURCES_CLEANEDUP > {noformat} > There's currently no state-store update in this flow, which is required to > avoid unnecessary localizations after NM restarts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10406) YARN log processor
[ https://issues.apache.org/jira/browse/YARN-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10406: -- Target Version/s: 3.5.0 (was: 3.4.0) > YARN log processor > -- > > Key: YARN-10406 > URL: https://issues.apache.org/jira/browse/YARN-10406 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Adam Antal >Assignee: Hudáky Márton Gyula >Priority: Critical > > YARN currently does not have any utility that would enable cluster > administrators to display previous actions in a Hadoop YARN cluster in an > offline fashion. > HDFS has the > [OIV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html]/ > > [OEV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html] > which do not require a running cluster to inspect and modify the filesystem. > A corresponding tool would be very helpful in the context of YARN. > Since ATS is not widespread (it is not available for older clusters) and there > isn't a single file or entity that would collect all the > application/container etc. related information, we thought our best option was to > parse and process the output of the YARN daemon log files and reconstruct the > history of the cluster from that. We designed and implemented a CLI-based > solution that, after parsing the log files, enables users to query app/container > related information (listing, filtering by certain properties) and search for > common errors like CE failures/error codes, AM preemption or stack traces. > The tool can be integrated into the YARN project as a sub-project. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10474) [JDK 12] TestAsyncDispatcher fails
[ https://issues.apache.org/jira/browse/YARN-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802652#comment-17802652 ] Shilun Fan commented on YARN-10474: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > [JDK 12] TestAsyncDispatcher fails > -- > > Key: YARN-10474 > URL: https://issues.apache.org/jira/browse/YARN-10474 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Priority: Major > > Similar to HDFS-15580. Updating a final variable via reflection is not > allowed in Java 12+. > {noformat} > [INFO] Running org.apache.hadoop.yarn.event.TestAsyncDispatcher > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.953 > s <<< FAILURE! - in org.apache.hadoop.yarn.event.TestAsyncDispatcher > [ERROR] > testPrintDispatcherEventDetails(org.apache.hadoop.yarn.event.TestAsyncDispatcher) > Time elapsed: 0.114 s <<< ERROR! > java.lang.NoSuchFieldException: modifiers > at java.base/java.lang.Class.getDeclaredField(Class.java:2569) > at > org.apache.hadoop.yarn.event.TestAsyncDispatcher.testPrintDispatcherEventDetails(TestAsyncDispatcher.java:152) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:832) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10474) [JDK 12] TestAsyncDispatcher fails
[ https://issues.apache.org/jira/browse/YARN-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10474: -- Target Version/s: 3.5.0 (was: 3.4.0) > [JDK 12] TestAsyncDispatcher fails > -- > > Key: YARN-10474 > URL: https://issues.apache.org/jira/browse/YARN-10474 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Priority: Major > > Similar to HDFS-15580. Updating a final variable via reflection is not > allowed in Java 12+. > {noformat} > [INFO] Running org.apache.hadoop.yarn.event.TestAsyncDispatcher > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.953 > s <<< FAILURE! - in org.apache.hadoop.yarn.event.TestAsyncDispatcher > [ERROR] > testPrintDispatcherEventDetails(org.apache.hadoop.yarn.event.TestAsyncDispatcher) > Time elapsed: 0.114 s <<< ERROR! > java.lang.NoSuchFieldException: modifiers > at java.base/java.lang.Class.getDeclaredField(Class.java:2569) > at > org.apache.hadoop.yarn.event.TestAsyncDispatcher.testPrintDispatcherEventDetails(TestAsyncDispatcher.java:152) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:832) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
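[Editor's note] For context on the failure above: from JDK 12 onward, the `modifiers` field of `java.lang.reflect.Field` is filtered out of reflection, so `getDeclaredField("modifiers")` throws `NoSuchFieldException` and the old trick of rewriting a final field can no longer work. The usual remedy is an explicit test hook on a non-final field. A minimal sketch, with illustrative stand-in names (not the actual Hadoop `AsyncDispatcher` code):

```java
// Stand-in for a class whose tests used to force a final field via reflection.
public class DispatcherDetailsDemo {

    static class AsyncDispatcherStub {
        // Non-final so a test can tune it without any reflection.
        private int printEventDetailsInterval = 1000;

        // Visible-for-testing replacement for the reflective hack.
        void setPrintEventDetailsInterval(int interval) {
            this.printEventDetailsInterval = interval;
        }

        int getPrintEventDetailsInterval() {
            return printEventDetailsInterval;
        }
    }

    public static void main(String[] args) {
        AsyncDispatcherStub dispatcher = new AsyncDispatcherStub();
        dispatcher.setPrintEventDetailsInterval(1);
        System.out.println("interval=" + dispatcher.getPrintEventDetailsInterval());
    }
}
```

The setter is package-private so production callers outside the package cannot reach it, while the test (which lives in the same package) can.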
[jira] [Commented] (YARN-10546) Limit application resource reservation on nodes for non-node/rack specific requests should be supported in CS.
[ https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802650#comment-17802650 ] Shilun Fan commented on YARN-10546: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Limit application resource reservation on nodes for non-node/rack specific > requests should be supported in CS. > - > > Key: YARN-10546 > URL: https://issues.apache.org/jira/browse/YARN-10546 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.3.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Just as fixed in YARN-4270 for FairScheduler. > The CapacityScheduler should also fix it. > It is a big problem in a production cluster when it happens. > The fs-to-cs converter should also support it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10546) Limit application resource reservation on nodes for non-node/rack specific requests should be supported in CS.
[ https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10546: -- Target Version/s: 3.5.0 (was: 3.4.0) > Limit application resource reservation on nodes for non-node/rack specific > requests should be supported in CS. > - > > Key: YARN-10546 > URL: https://issues.apache.org/jira/browse/YARN-10546 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.3.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Just as fixed in YARN-4270 for FairScheduler. > The CapacityScheduler should also fix it. > It is a big problem in a production cluster when it happens. > The fs-to-cs converter should also support it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10594) Split the debug log when execute privileged operation
[ https://issues.apache.org/jira/browse/YARN-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10594: -- Target Version/s: 3.5.0 (was: 3.4.0) > Split the debug log when execute privileged operation > - > > Key: YARN-10594 > URL: https://issues.apache.org/jira/browse/YARN-10594 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > > The command log should be printed before the *exec.execute();* statement > rather than after it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
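[Editor's note] A minimal sketch of the ordering change proposed in YARN-10594; the class and method names are illustrative stand-ins, not the real `PrivilegedOperationExecutor` code:

```java
import java.util.List;

// Log the command *before* executing it, so the command is on record even
// if the execution hangs, crashes, or throws.
public class PrivilegedOpLogging {

    static String commandLogLine(List<String> command) {
        return "DEBUG launching: " + String.join(" ", command);
    }

    static void execute(List<String> command) {
        // Log first: if the call below never returns, we still know what ran.
        System.out.println(commandLogLine(command));
        // exec.execute() would run here in the real executor.
    }

    public static void main(String[] args) {
        execute(List.of("container-executor", "--run", "container_01"));
    }
}
```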
[jira] [Updated] (YARN-10514) Introduce a dominant resource based schedule policy to increase the resource utilization, avoid heavy cluster resource fragments.
[ https://issues.apache.org/jira/browse/YARN-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10514: -- Target Version/s: 3.5.0 (was: 3.4.0) > Introduce a dominant resource based schedule policy to increase the resource > utilization, avoid heavy cluster resource fragments. > - > > Key: YARN-10514 > URL: https://issues.apache.org/jira/browse/YARN-10514 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.3.0, 3.4.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10514.001.patch > > > Whether we schedule with the multi-node lookup policy for async scheduling, or > just use heartbeat-update-based scheduling, we meet scheduling fragments in both > cases. With cpu-intensive, gpu-intensive, or memory-intensive jobs, the cluster > suffers heavy waste of resources, so this issue will move the scheduler toward > dominant-resource-based scheduling, to help our cluster get better > resource utilization and to load-balance the NodeManager resource > distribution. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10514) Introduce a dominant resource based schedule policy to increase the resource utilization, avoid heavy cluster resource fragments.
[ https://issues.apache.org/jira/browse/YARN-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802651#comment-17802651 ] Shilun Fan commented on YARN-10514: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Introduce a dominant resource based schedule policy to increase the resource > utilization, avoid heavy cluster resource fragments. > - > > Key: YARN-10514 > URL: https://issues.apache.org/jira/browse/YARN-10514 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.3.0, 3.4.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10514.001.patch > > > Whether we schedule with the multi-node lookup policy for async scheduling, or > just use heartbeat-update-based scheduling, we meet scheduling fragments in both > cases. With cpu-intensive, gpu-intensive, or memory-intensive jobs, the cluster > suffers heavy waste of resources, so this issue will move the scheduler toward > dominant-resource-based scheduling, to help our cluster get better > resource utilization and to load-balance the NodeManager resource > distribution. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10594) Split the debug log when execute privileged operation
[ https://issues.apache.org/jira/browse/YARN-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802649#comment-17802649 ] Shilun Fan commented on YARN-10594: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Split the debug log when execute privileged operation > - > > Key: YARN-10594 > URL: https://issues.apache.org/jira/browse/YARN-10594 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > > The command log should be printed before the *exec.execute();* statement > rather than after it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10608) Extend yarn.nodemanager.delete.debug-delay-sec to support application level.
[ https://issues.apache.org/jira/browse/YARN-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802648#comment-17802648 ] Shilun Fan commented on YARN-10608: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Extend yarn.nodemanager.delete.debug-delay-sec to support application level. > > > Key: YARN-10608 > URL: https://issues.apache.org/jira/browse/YARN-10608 > Project: Hadoop YARN > Issue Type: New Feature > Components: log-aggregation >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > > Currently yarn.nodemanager.delete.debug-delay-sec is a cluster-level setting. > In our busy production cluster, we set it to 0 by default to prevent a local > log explosion. > But when we need to dig into errors from Spark/MR etc. jobs, such as a core dump, I > suggest supporting a job-level setting that delays deletion of local > logs so the error can be reproduced. > > [~wangda] [~tangzhankun] [~xgong] [~epayne] > Do you have any advice about this support? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10608) Extend yarn.nodemanager.delete.debug-delay-sec to support application level.
[ https://issues.apache.org/jira/browse/YARN-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10608: -- Target Version/s: 3.5.0 (was: 3.4.0) > Extend yarn.nodemanager.delete.debug-delay-sec to support application level. > > > Key: YARN-10608 > URL: https://issues.apache.org/jira/browse/YARN-10608 > Project: Hadoop YARN > Issue Type: New Feature > Components: log-aggregation >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > > Currently yarn.nodemanager.delete.debug-delay-sec is a cluster-level setting. > In our busy production cluster, we set it to 0 by default to prevent a local > log explosion. > But when we need to dig into errors from Spark/MR etc. jobs, such as a core dump, I > suggest supporting a job-level setting that delays deletion of local > logs so the error can be reproduced. > > [~wangda] [~tangzhankun] [~xgong] [~epayne] > Do you have any advice about this support? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10690) GPU related improvement for better usage.
[ https://issues.apache.org/jira/browse/YARN-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10690: -- Target Version/s: 3.5.0 (was: 3.4.0) > GPU related improvement for better usage. > - > > Key: YARN-10690 > URL: https://issues.apache.org/jira/browse/YARN-10690 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > > This Jira will improve GPU for better usage. > cc [~bibinchundatt] [~pbacsko] [~ebadger] [~ztang] [~epayne] [~gandras] > [~bteke] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10764) Add rm dispatcher event metrics in SLS
[ https://issues.apache.org/jira/browse/YARN-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10764: -- Target Version/s: 3.5.0 (was: 3.4.0) > Add rm dispatcher event metrics in SLS > --- > > Key: YARN-10764 > URL: https://issues.apache.org/jira/browse/YARN-10764 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler-load-simulator >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > > We should use SLS to confirm whether we can get a performance improvement in event > consumption time, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10772) Stable API GetApplicationsRequest#newInstance compatibility broken by YARN-8363
[ https://issues.apache.org/jira/browse/YARN-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10772: -- Target Version/s: 3.5.0 (was: 3.4.0) > Stable API GetApplicationsRequest#newInstance compatibility broken by > YARN-8363 > --- > > Key: YARN-10772 > URL: https://issues.apache.org/jira/browse/YARN-10772 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 3.2.0 >Reporter: Wei-Chiu Chuang >Priority: Major > > YARN-8363 migrated our usage of commons-lang to commons-lang3 in 3.2.0. > > Unfortunately, it changed the API signature of > {code:java} > /** > * > * The request from clients to get a report of Applications matching the > * giving application types in the cluster from the > * ResourceManager. > * > * > * @see ApplicationClientProtocol#getApplications(GetApplicationsRequest) > * > * Setting any of the parameters to null, would just disable that > * filter > * > * @param scope {@link ApplicationsRequestScope} to filter by > * @param users list of users to filter by > * @param queues list of scheduler queues to filter by > * @param applicationTypes types of applications > * @param applicationTags application tags to filter by > * @param applicationStates application states to filter by > * @param startRange range of application start times to filter by > * @param finishRange range of application finish times to filter by > * @param limit number of applications to limit to > * @return {@link GetApplicationsRequest} to be used with > * {@link ApplicationClientProtocol#getApplications(GetApplicationsRequest)} > */ > @Public > @Stable > public static GetApplicationsRequest newInstance( > ApplicationsRequestScope scope, > Set<String> users, > Set<String> queues, > Set<String> applicationTypes, > Set<String> applicationTags, > EnumSet<YarnApplicationState> applicationStates, > Range<Long> startRange, > Range<Long> finishRange, > Long limit) { {code} > The startRange and finishRange changed type from LongRange to Range<Long>. > It could cause problems when migrating applications, for example, from Hadoop > 3.1 to 3.3. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.
[ https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10738: -- Target Version/s: 3.5.0 (was: 3.4.0) > When multi thread scheduling with multi node, we should shuffle with a gap to > prevent hot accessing nodes. > -- > > Key: YARN-10738 > URL: https://issues.apache.org/jira/browse/YARN-10738 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently, multi-threaded scheduling with multi-node lookup is not reasonable. > In large clusters, it causes hot accessing of nodes, which leads to abnormally > overloaded nodes. > Solution: > I think we should shuffle the sorted node list (e.g. under the > available-resource sort policy) within an interval. > This will solve the above problem and avoid hot accessing of nodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
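[Editor's note] One possible reading of "shuffle with an interval" (an assumption on my part, not the actual YARN-10738 patch; all names are illustrative): shuffle the sorted node list inside fixed-size windows, so the coarse resource ordering survives but near-equal nodes at the head of the list are spread across scheduling threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class WindowedShuffle {

    // Permute only within [start, start + window): the hottest nodes at the
    // head of the sorted list stop being picked first by every thread, while
    // the overall best-resource-first ordering is preserved between windows.
    static <T> void shuffleInWindows(List<T> sortedNodes, int window, Random rnd) {
        for (int start = 0; start < sortedNodes.size(); start += window) {
            int end = Math.min(start + window, sortedNodes.size());
            Collections.shuffle(sortedNodes.subList(start, end), rnd);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(List.of("n1", "n2", "n3", "n4", "n5", "n6"));
        shuffleInWindows(nodes, 3, new Random());
        System.out.println(nodes); // windows {n1,n2,n3} and {n4,n5,n6} permuted independently
    }
}
```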
[jira] [Commented] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.
[ https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802645#comment-17802645 ] Shilun Fan commented on YARN-10738: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > When multi thread scheduling with multi node, we should shuffle with a gap to > prevent hot accessing nodes. > -- > > Key: YARN-10738 > URL: https://issues.apache.org/jira/browse/YARN-10738 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently, multi-threaded scheduling with multi-node lookup is not reasonable. > In large clusters, it causes hot accessing of nodes, which leads to abnormally > overloaded nodes. > Solution: > I think we should shuffle the sorted node list (e.g. under the > available-resource sort policy) within an interval. > This will solve the above problem and avoid hot accessing of nodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10772) Stable API GetApplicationsRequest#newInstance compatibility broken by YARN-8363
[ https://issues.apache.org/jira/browse/YARN-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802643#comment-17802643 ] Shilun Fan commented on YARN-10772: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Stable API GetApplicationsRequest#newInstance compatibility broken by > YARN-8363 > --- > > Key: YARN-10772 > URL: https://issues.apache.org/jira/browse/YARN-10772 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Affects Versions: 3.2.0 >Reporter: Wei-Chiu Chuang >Priority: Major > > YARN-8363 migrated our usage of commons-lang to commons-lang3 in 3.2.0. > > Unfortunately, it changed the API signature of > {code:java} > /** > * > * The request from clients to get a report of Applications matching the > * giving application types in the cluster from the > * ResourceManager. > * > * > * @see ApplicationClientProtocol#getApplications(GetApplicationsRequest) > * > * Setting any of the parameters to null, would just disable that > * filter > * > * @param scope {@link ApplicationsRequestScope} to filter by > * @param users list of users to filter by > * @param queues list of scheduler queues to filter by > * @param applicationTypes types of applications > * @param applicationTags application tags to filter by > * @param applicationStates application states to filter by > * @param startRange range of application start times to filter by > * @param finishRange range of application finish times to filter by > * @param limit number of applications to limit to > * @return {@link GetApplicationsRequest} to be used with > * {@link ApplicationClientProtocol#getApplications(GetApplicationsRequest)} > */ > @Public > @Stable > public static GetApplicationsRequest newInstance( > ApplicationsRequestScope scope, > Set users, > Set queues, > Set applicationTypes, > Set applicationTags, > EnumSet applicationStates, > Range startRange, > Range finishRange, > Long limit) { {code} > The 
startRange and finishRange parameters changed type from the commons-lang LongRange to the commons-lang3 Range. > This can break applications compiled against the old signature when migrating, for example, from Hadoop > 3.1 to 3.3. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
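The call-site impact of that type change can be sketched with stand-in classes (these are NOT the real org.apache.commons or Hadoop types, just minimal mirrors of their shapes): a caller that built a `LongRange` for the old overload must now build a lang3-style `Range<Long>` instead.

```java
// Minimal stand-ins mirroring the shapes of commons-lang LongRange and
// commons-lang3 Range; the real Hadoop/commons classes are NOT used here.
class LongRange {                               // commons-lang 2.x style
    final long min, max;
    LongRange(long min, long max) { this.min = min; this.max = max; }
}

class Range<T extends Comparable<T>> {          // commons-lang3 style
    private final T min, max;
    private Range(T min, T max) { this.min = min; this.max = max; }
    static <T extends Comparable<T>> Range<T> between(T a, T b) { return new Range<>(a, b); }
    boolean contains(T v) { return min.compareTo(v) <= 0 && v.compareTo(max) <= 0; }
}

public class NewInstanceMigration {
    // A Hadoop 3.1-era caller passed: new LongRange(start, end)
    // After YARN-8363 the same argument must be built as: Range.between(start, end)
    static Range<Long> startRange(long from, long to) { return Range.between(from, to); }

    public static void main(String[] args) {
        Range<Long> r = startRange(0L, 1_000L);
        System.out.println(r.contains(500L));   // true
    }
}
```

Because the parameter type itself changed, recompilation is required even though the call site looks almost identical; code linked against the 3.1 signature fails at runtime rather than compile time.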
[jira] [Commented] (YARN-11054) Alleviate LocalJobRunnerMetricName Conflicts
[ https://issues.apache.org/jira/browse/YARN-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802642#comment-17802642 ] Shilun Fan commented on YARN-11054: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Alleviate LocalJobRunnerMetricName Conflicts > > > Key: YARN-11054 > URL: https://issues.apache.org/jira/browse/YARN-11054 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.3.1 >Reporter: Xingjun Hao >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1 > > Time Spent: 20m > Remaining Estimate: 0h > > In some scenarios, Sqoop will use LocalJobRunner (YARN local mode) to run a large number > of jobs. Assuming 2 million jobs have been run, the LocalJobRunner MetricName > generated by nextInt() lies in the range (0, 2147483647), > so the probability that a new job's MetricName conflicts with an earlier one is about 2,000,000/2,147,483,647 ≈ 1/1000, which > means that on average 1 task will fail for every 1000 jobs run. > If the LocalJobRunner MetricName is instead generated by nextLong(), whose range is (0, > 9223372036854775807), then since Long's range is about four billion (2^32) times that > of Int, the probability of a new MetricName conflict is reduced by the same factor and > becomes negligible. > With nextInt(), the conflict probability grows from about 1/1000 to roughly 1/10 (200,000,000/2,147,483,647) once about > 200 million jobs have been run. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
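The collision arithmetic quoted above can be checked with a short, self-contained calculation (a sketch; the 2-million-job figure is the reporter's assumption, and `perJobCollision` is a helper invented here): once n ids have been drawn uniformly from a space of d values, the chance that the next id collides with an existing one is roughly n/d.

```java
public class MetricNameCollision {
    // Per-new-job collision probability after n ids drawn uniformly
    // from a space of d values: approximately n / d.
    static double perJobCollision(double n, double d) { return n / d; }

    public static void main(String[] args) {
        double jobs = 2_000_000d;                                 // reporter's assumption
        double pInt  = perJobCollision(jobs, Integer.MAX_VALUE);  // nextInt() id space
        double pLong = perJobCollision(jobs, Long.MAX_VALUE);     // nextLong() id space
        System.out.printf("nextInt:  %.2e (about 1/1000)%n", pInt);
        System.out.printf("nextLong: %.2e (negligible)%n", pLong);
    }
}
```

The nextInt figure comes out near 9.3e-4, matching the "1 failure per 1000 jobs" claim; switching to nextLong drops it by a factor of 2^32.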
[jira] [Updated] (YARN-11054) Alleviate LocalJobRunnerMetricName Conflicts
[ https://issues.apache.org/jira/browse/YARN-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11054: -- Target Version/s: 3.5.0 (was: 3.4.0) > Alleviate LocalJobRunnerMetricName Conflicts > > > Key: YARN-11054 > URL: https://issues.apache.org/jira/browse/YARN-11054 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.3.1 >Reporter: Xingjun Hao >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1 > > Time Spent: 20m > Remaining Estimate: 0h > > In some scenarios, Sqoop will use LocalJobRunner (YARN local mode) to run a large number > of jobs. Assuming 2 million jobs have been run, the LocalJobRunner MetricName > generated by nextInt() lies in the range (0, 2147483647), > so the probability that a new job's MetricName conflicts with an earlier one is about 2,000,000/2,147,483,647 ≈ 1/1000, which > means that on average 1 task will fail for every 1000 jobs run. > If the LocalJobRunner MetricName is instead generated by nextLong(), whose range is (0, > 9223372036854775807), then since Long's range is about four billion (2^32) times that > of Int, the probability of a new MetricName conflict is reduced by the same factor and > becomes negligible. > With nextInt(), the conflict probability grows from about 1/1000 to roughly 1/10 (200,000,000/2,147,483,647) once about > 200 million jobs have been run. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.
[ https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802641#comment-17802641 ] Shilun Fan commented on YARN-11127: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Potential deadlock in AsyncDispatcher caused by RMNodeImpl, > SchedulerApplicationAttempt and RMAppImpl's lock contention. > > > Key: YARN-11127 > URL: https://issues.apache.org/jira/browse/YARN-11127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.2.1 >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Labels: pull-request-available > Attachments: rm-dead-lock.png > > Time Spent: 4h > Remaining Estimate: 0h > > I found an RM deadlock in our cluster. It is a low-probability event. Some of the > critical jstack information is below: > {code:java} > "RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 > waiting on condition [0x7f85dd00b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7f9389aab478> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > - locked <0x7f88db78c5c8> (a > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > "IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 > tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7f938976e818> (a > 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at >
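The two stack traces show opposite lock-acquisition orders: the event dispatcher holds the RMNodeImpl state machine and then blocks on the RMAppImpl write lock, while the IPC handler holds the app-side lock and needs the node-side one. A minimal self-contained sketch of that pattern (plain ReentrantReadWriteLocks standing in for the RM classes; tryLock timeouts keep the demo itself from hanging):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDemo {
    // Stand-ins for the RMNodeImpl state-machine lock and the RMAppImpl write lock.
    static final ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock appLock  = new ReentrantReadWriteLock();

    // Returns true when both threads, each holding its first lock, fail to
    // acquire the other's lock within the timeout -- the deadlock condition.
    public static boolean wouldDeadlock() {
        CountDownLatch bothHeld = new CountDownLatch(2);
        boolean[] stalled = {false, false};

        Thread dispatcher = new Thread(() -> {       // like RMNodeImpl.handle()
            nodeLock.writeLock().lock();
            try {
                bothHeld.countDown(); await(bothHeld);
                // like RMAppImpl.aggregateLogReport(): needs the app lock next
                stalled[0] = !tryAcquire(appLock);
            } finally { nodeLock.writeLock().unlock(); }
        });
        Thread ipcHandler = new Thread(() -> {       // like an IPC handler in RMAppImpl
            appLock.writeLock().lock();
            try {
                bothHeld.countDown(); await(bothHeld);
                stalled[1] = !tryAcquire(nodeLock);  // needs the node lock next
            } finally { appLock.writeLock().unlock(); }
        });
        dispatcher.start(); ipcHandler.start();
        join(dispatcher); join(ipcHandler);
        return stalled[0] && stalled[1];
    }

    static boolean tryAcquire(ReentrantReadWriteLock l) {
        try {
            if (l.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                l.writeLock().unlock();
                return true;
            }
        } catch (InterruptedException ignored) { }
        return false;
    }
    static void await(CountDownLatch l) { try { l.await(); } catch (InterruptedException ignored) {} }
    static void join(Thread t) { try { t.join(); } catch (InterruptedException ignored) {} }

    public static void main(String[] args) {
        System.out.println(wouldDeadlock());   // true: opposite-order acquisition stalls
    }
}
```

The usual fix is to enforce a single acquisition order (or to release one lock before taking the other), which is the direction the attached patch discussion takes.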
[jira] [Updated] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.
[ https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11127: -- Target Version/s: 3.5.0 (was: 3.4.0) > Potential deadlock in AsyncDispatcher caused by RMNodeImpl, > SchedulerApplicationAttempt and RMAppImpl's lock contention. > > > Key: YARN-11127 > URL: https://issues.apache.org/jira/browse/YARN-11127 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.2.1 >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Labels: pull-request-available > Attachments: rm-dead-lock.png > > Time Spent: 4h > Remaining Estimate: 0h > > I found an RM deadlock in our cluster. It is a low-probability event. Some of the > critical jstack information is below: > {code:java} > "RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 > waiting on condition [0x7f85dd00b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7f9389aab478> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > - locked <0x7f88db78c5c8> (a > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670) > at > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > "IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 > tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7f938976e818> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at >
[jira] [Updated] (YARN-11149) Add regression test cases for YARN-11073
[ https://issues.apache.org/jira/browse/YARN-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11149: -- Target Version/s: 3.5.0 (was: 3.4.0) > Add regression test cases for YARN-11073 > > > Key: YARN-11149 > URL: https://issues.apache.org/jira/browse/YARN-11149 > Project: Hadoop YARN > Issue Type: Test > Components: test >Reporter: Akira Ajisaka >Priority: Major > > Add regression test cases for YARN-11073 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11329) Refactor Router#startWepApp#setupSecurityAndFilters
[ https://issues.apache.org/jira/browse/YARN-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11329: -- Target Version/s: 3.5.0 (was: 3.4.0) > Refactor Router#startWepApp#setupSecurityAndFilters > --- > > Key: YARN-11329 > URL: https://issues.apache.org/jira/browse/YARN-11329 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > > While reading the code, I found that the Router uses the RM's > RMWebAppUtil#setupSecurityAndFilters, which means that if the Router web UI wants to > enable security-related features, it has to set RM parameters, which seems unreasonable. > The Router should have independent parameters to control Router web security. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11321) Add RMWebServices#getSchedulerOverview
[ https://issues.apache.org/jira/browse/YARN-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11321: -- Target Version/s: 3.5.0 (was: 3.4.0) > Add RMWebServices#getSchedulerOverview > -- > > Key: YARN-11321 > URL: https://issues.apache.org/jira/browse/YARN-11321 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, resourcemanager, router >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > Add a new interface to display the basic information of scheduling, Such as > ||Col|| > |1.Scheduler Type| > |2.Scheduling Resource Type| > |3.Minimum Allocation| > |4.Maximum Allocation| > |5.Maximum Cluster Application Priority| > |6.Scheduler Busy %| > |7.RM Dispatcher EventQueue Size| > |8.Scheduler Dispatcher EventQueue Size| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11343) Improve FederationStateStore#ApplicationHomeSubCluster CRUD Methods
[ https://issues.apache.org/jira/browse/YARN-11343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11343: -- Target Version/s: 3.5.0 (was: 3.4.0) > Improve FederationStateStore#ApplicationHomeSubCluster CRUD Methods > --- > > Key: YARN-11343 > URL: https://issues.apache.org/jira/browse/YARN-11343 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > > Currently in MemoryFederationStateStore and ZookeeperFederationStateStore, we > only store the mapping relationship between Application and SubClusterId. > In the future, we need more attributes, such as the create time of the > Application, the state of the Application, and the router information of the > Application. We need to refactor the ApplicationHomeSubCluster CRUD Methods > to support the direct use of the ApplicationHomeSubCluster object. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11444) Improve YARN md documentation format
[ https://issues.apache.org/jira/browse/YARN-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11444: -- Target Version/s: 3.5.0 (was: 3.4.0) > Improve YARN md documentation format > > > Key: YARN-11444 > URL: https://issues.apache.org/jira/browse/YARN-11444 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > 1. Improve the table format to make the readability better > 2. Modify some typo errors > 3. Modify the list number to display correctly -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11335) [Federation] Added JDOFederationStateStore for easy database query.
[ https://issues.apache.org/jira/browse/YARN-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11335: -- Target Version/s: 3.5.0 (was: 3.4.0) > [Federation] Added JDOFederationStateStore for easy database query. > --- > > Key: YARN-11335 > URL: https://issues.apache.org/jira/browse/YARN-11335 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > > [Federation] Added JDOFederationStateStore for easy database query. > We currently have a solution, SQLFederationStateStore, but I think the > following areas can be improved: > 1. We would like the database layer to support Oracle and Postgres, but it is very > difficult to provide stored-procedure scripts for each database; > it involves a lot of testing and multi-version verification. If possible, we would > rather ship plain table-creation statements. > 2. Reading the code shows that it is basically add, > delete, select and update operations on a table. > An ORM framework such as DataNucleus can handle this for us. > I plan to provide a JDOFederationStateStore that can easily replace > SQLFederationStateStore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11491) [Federation] Use ZookeeperFederationStateStore as the DefaultStateStore
[ https://issues.apache.org/jira/browse/YARN-11491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11491: -- Target Version/s: 3.5.0 (was: 3.4.0) > [Federation] Use ZookeeperFederationStateStore as the DefaultStateStore > --- > > Key: YARN-11491 > URL: https://issues.apache.org/jira/browse/YARN-11491 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > We currently use MemoryStateStore as the default StateStore, in the > production environment, we should use ZookeeperFederationStateStore as the > default StateStore. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11458) Maven parallel build fails when Yarn UI v2 is enabled
[ https://issues.apache.org/jira/browse/YARN-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11458: -- Target Version/s: 3.3.9, 3.5.0 (was: 3.4.0, 3.3.9) > Maven parallel build fails when Yarn UI v2 is enabled > - > > Key: YARN-11458 > URL: https://issues.apache.org/jira/browse/YARN-11458 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn, yarn-ui-v2 >Affects Versions: 3.4.0, 3.3.5, 3.3.4, 3.3.9 > Environment: The problem occurs sporadically while using the Hadoop > development environment (Ubuntu) >Reporter: Steve Vaughan >Assignee: Steve Vaughan >Priority: Critical > > Running a parallel build fails during assembly with the following error when > running either package or install: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-yarn) on > project hadoop-yarn-project: Failed to create assembly: Error creating > assembly archive hadoop-yarn-dist: > /workspace/hadoop-yarn-project/./hadoop-yarn/hadoop-yarn-ui/target/webapp/tmp/sass_compiler-input_base_path-iNYf9pEm.tmp/vendor/ember-qunit/ember-qunit.map > -> [Help 1]{code} > This appears to be a race condition introduced when `-Pyarn-ui` is used, > because `hadoop-yarn-project` does not have a dependency listed for > `yarn-ui`. > The command executed was: > {code:java} > $ mvn -nsu clean install -Pdist,native -DskipTests -Dtar > -Dmaven.javadoc.skip=true -T 2C {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
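A hypothetical sketch of the fix the description suggests (the `hadoop-yarn-ui` coordinates and the profile wiring are assumptions, not the actual patch): declaring the dependency edge inside hadoop-yarn-project's yarn-ui profile would let Maven's `-T` scheduler finish building the UI module before assembly starts.

```xml
<!-- Hypothetical sketch: give hadoop-yarn-project an explicit ordering
     dependency on the UI module when the yarn-ui profile is active, so
     parallel builds (-T) assemble only after yarn-ui is built. -->
<profile>
  <id>yarn-ui</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-ui</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</profile>
```

With the edge declared, `-T 2C` can still parallelize the rest of the reactor; only the assembly step gains the needed happens-after ordering.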
[jira] [Updated] (YARN-11527) Upgrade node.js to 14.0.0 in YARN application catalog webapp
[ https://issues.apache.org/jira/browse/YARN-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11527: -- Target Version/s: 3.5.0 (was: 3.4.0) > Upgrade node.js to 14.0.0 in YARN application catalog webapp > > > Key: YARN-11527 > URL: https://issues.apache.org/jira/browse/YARN-11527 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.4.0 > Environment: Upgrade node.js to 14.0.0 in YARN application catalog > webapp >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > {code:java} > [INFO] yarn install v1.22.5 > [INFO] info No lockfile found. > [INFO] [1/4] Resolving packages... > [INFO] warning angular-route@1.6.10: For the actively supported Angular, see > https://www.npmjs.com/package/@angular/core. AngularJS support has officially > ended. For extended AngularJS support options, see > https://goo.gle/angularjs-path-forward. > [INFO] warning angular@1.6.10: For the actively supported Angular, see > https://www.npmjs.com/package/@angular/core. AngularJS support has officially > ended. For extended AngularJS support options, see > https://goo.gle/angularjs-path-forward. > [INFO] [2/4] Fetching packages... > [INFO] error triple-beam@1.4.1: The engine "node" is incompatible with this > module. Expected version ">= 14.0.0". Got "12.22.1" > [INFO] error Found incompatible module. > [INFO] error Found incompatible module.info Visit > https://yarnpkg.com/en/docs/cli/install for documentation about this command. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11527) Upgrade node.js to 14.0.0 in YARN application catalog webapp
[ https://issues.apache.org/jira/browse/YARN-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802636#comment-17802636 ] Shilun Fan commented on YARN-11527: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Upgrade node.js to 14.0.0 in YARN application catalog webapp > > > Key: YARN-11527 > URL: https://issues.apache.org/jira/browse/YARN-11527 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.4.0 > Environment: Upgrade node.js to 14.0.0 in YARN application catalog > webapp >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > {code:java} > [INFO] yarn install v1.22.5 > [INFO] info No lockfile found. > [INFO] [1/4] Resolving packages... > [INFO] warning angular-route@1.6.10: For the actively supported Angular, see > https://www.npmjs.com/package/@angular/core. AngularJS support has officially > ended. For extended AngularJS support options, see > https://goo.gle/angularjs-path-forward. > [INFO] warning angular@1.6.10: For the actively supported Angular, see > https://www.npmjs.com/package/@angular/core. AngularJS support has officially > ended. For extended AngularJS support options, see > https://goo.gle/angularjs-path-forward. > [INFO] [2/4] Fetching packages... > [INFO] error triple-beam@1.4.1: The engine "node" is incompatible with this > module. Expected version ">= 14.0.0". Got "12.22.1" > [INFO] error Found incompatible module. > [INFO] error Found incompatible module.info Visit > https://yarnpkg.com/en/docs/cli/install for documentation about this command. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11458) Maven parallel build fails when Yarn UI v2 is enabled
[ https://issues.apache.org/jira/browse/YARN-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802637#comment-17802637 ] Shilun Fan commented on YARN-11458: --- Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0. > Maven parallel build fails when Yarn UI v2 is enabled > - > > Key: YARN-11458 > URL: https://issues.apache.org/jira/browse/YARN-11458 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn, yarn-ui-v2 >Affects Versions: 3.4.0, 3.3.5, 3.3.4, 3.3.9 > Environment: The problem occurs sporadically while using the Hadoop > development environment (Ubuntu) >Reporter: Steve Vaughan >Assignee: Steve Vaughan >Priority: Critical > > Running a parallel build fails during assembly with the following error when > running either package or install: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-yarn) on > project hadoop-yarn-project: Failed to create assembly: Error creating > assembly archive hadoop-yarn-dist: > /workspace/hadoop-yarn-project/./hadoop-yarn/hadoop-yarn-ui/target/webapp/tmp/sass_compiler-input_base_path-iNYf9pEm.tmp/vendor/ember-qunit/ember-qunit.map > -> [Help 1]{code} > This appears to be a race condition introduced when `-Pyarn-ui` is used, > because `hadoop-yarn-project` does not have a dependency listed for > `yarn-ui`. > The command executed was: > {code:java} > $ mvn -nsu clean install -Pdist,native -DskipTests -Dtar > -Dmaven.javadoc.skip=true -T 2C {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11631) [GPG] Add GPGWebServices
[ https://issues.apache.org/jira/browse/YARN-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11631: -- Target Version/s: 3.5.0 (was: 3.4.0) > [GPG] Add GPGWebServices > > > Key: YARN-11631 > URL: https://issues.apache.org/jira/browse/YARN-11631 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11529) Add metrics for ContainerMonitorImpl.
[ https://issues.apache.org/jira/browse/YARN-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan resolved YARN-11529. --- Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Add metrics for ContainerMonitorImpl. > - > > Key: YARN-11529 > URL: https://issues.apache.org/jira/browse/YARN-11529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Assignee: Xianming Lei >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > In our production environment, we have ample machine resources and a > significant number of active Containers. However, the MonitoringThread in > ContainerMonitorImpl experiences significant latency during each execution. > To address this, it is highly recommended to incorporate metrics for > monitoring the duration of this time-consuming process. > > > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
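A self-contained sketch of the kind of metric YARN-11529 asks for (plain Java with an invented class name; a real patch would presumably use Hadoop's metrics2 machinery rather than this stand-in): time each MonitoringThread sweep and expose a running average.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MonitorDurationMetric {
    // Stand-in for a metrics2-style rate metric: total time and sample count.
    private final AtomicLong totalMs = new AtomicLong();
    private final AtomicLong samples = new AtomicLong();

    // Wrap one MonitoringThread iteration and record how long it took.
    void recordSweep(Runnable sweep) {
        long start = System.nanoTime();
        sweep.run();                                   // the container scan runs here
        totalMs.addAndGet((System.nanoTime() - start) / 1_000_000);
        samples.incrementAndGet();
    }

    long count() { return samples.get(); }

    double avgMs() {
        long n = samples.get();
        return n == 0 ? 0 : (double) totalMs.get() / n;
    }

    public static void main(String[] args) {
        MonitorDurationMetric m = new MonitorDurationMetric();
        m.recordSweep(() -> { /* per-container resource checks would go here */ });
        System.out.println(m.count() + " sweep(s), avg " + m.avgMs() + " ms");
    }
}
```

Surfacing the per-sweep duration is what lets operators see the latency the reporter describes without attaching a profiler to the NodeManager.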
[jira] [Updated] (YARN-11635) Fix hadoop-yarn-server-nodemanager module Java Doc Errors.
[ https://issues.apache.org/jira/browse/YARN-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11635: -- Target Version/s: (was: 3.4.0) > Fix hadoop-yarn-server-nodemanager module Java Doc Errors. > -- > > Key: YARN-11635 > URL: https://issues.apache.org/jira/browse/YARN-11635 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > > I noticed that nodemanager had some java doc errors when compiling with > JDK11. In this jira, I will fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801963#comment-17801963 ] Shilun Fan commented on YARN-7592: -- I will continue to follow up on this issue in the next 1-2 days. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11632) [Doc] Add allow-partial-result description to Yarn Federation documentation
[ https://issues.apache.org/jira/browse/YARN-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan resolved YARN-11632. --- Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > [Doc] Add allow-partial-result description to Yarn Federation documentation > --- > > Key: YARN-11632 > URL: https://issues.apache.org/jira/browse/YARN-11632 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Add allow-partial-result description to Yarn Federation documentation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11638) [GPG] GPG Support CLI.
Shilun Fan created YARN-11638: - Summary: [GPG] GPG Support CLI. Key: YARN-11638 URL: https://issues.apache.org/jira/browse/YARN-11638 Project: Hadoop YARN Issue Type: Sub-task Components: federation Affects Versions: 3.4.0 Reporter: Shilun Fan Assignee: Shilun Fan We will add a set of command lines to GPG so that GPG can better refresh the policy and provide some other convenient functions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799732#comment-17799732 ] Shilun Fan commented on YARN-7592: -- [~it_singer] Thank you for reporting this issue! I will reply as soon as possible on how to handle this issue. > yarn.federation.failover.enabled missing in yarn-default.xml > > > Key: YARN-7592 > URL: https://issues.apache.org/jira/browse/YARN-7592 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.0.0-beta1 >Reporter: Gera Shegalov >Priority: Major > Attachments: IssueReproduce.patch > > > yarn.federation.failover.enabled should be documented in yarn-default.xml. I > am also not sure why it should be true by default and force the HA retry > policy in {{RMProxy#createRMProxy}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org