[jira] [Commented] (YARN-7912) While launching Native Service app from UI, consider service owner name from user.name query parameter

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802687#comment-17802687
 ] 

Shilun Fan commented on YARN-7912:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> While launching Native Service app from UI, consider service owner name from 
> user.name query parameter
> --
>
> Key: YARN-7912
> URL: https://issues.apache.org/jira/browse/YARN-7912
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Priority: Major
>
> As per comments from [~eyang] in YARN-7827, 
> "For supporting knox, it would be good for javascript to detect the url 
> entering /ui2 and process the user.name property.  If there 
> isn't one found, then proceed with ajax call to resource manager to find out 
> who is the current user to pass the parameter along the rest api calls."
> This Jira will track the work to handle this. It is now pending a feasibility check.
> Thanks [~eyang] and [~jianhe]






[jira] [Commented] (YARN-7884) Race condition in registering YARN service in ZooKeeper

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802688#comment-17802688
 ] 

Shilun Fan commented on YARN-7884:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Race condition in registering YARN service in ZooKeeper
> ---
>
> Key: YARN-7884
> URL: https://issues.apache.org/jira/browse/YARN-7884
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Major
>
> In a Kerberos-enabled cluster, there seems to be a race condition when 
> registering a YARN service.
> The yarn-service znode creation seems to happen after the AM has started and is 
> reporting back to update component information.  The AM should have access to 
> create the znode, but ZooKeeper reported NoAuth.
> {code}
> 2018-02-02 22:53:30,442 [main] INFO  service.ServiceScheduler - Set registry 
> user accounts: sasl:hbase
> 2018-02-02 22:53:30,471 [main] INFO  zk.RegistrySecurity - Registry default 
> system acls: 
> [1,s{'world,'anyone}
> , 31,s{'sasl,'yarn}
> , 31,s{'sasl,'jhs}
> , 31,s{'sasl,'hdfs-demo}
> , 31,s{'sasl,'rm}
> , 31,s{'sasl,'hive}
> ]
> 2018-02-02 22:53:30,472 [main] INFO  zk.RegistrySecurity - Registry User ACLs 
> [31,s{'sasl,'hbase}
> , 31,s{'sasl,'hbase}
> ]
> 2018-02-02 22:53:30,503 [main] INFO  event.AsyncDispatcher - Registering 
> class org.apache.hadoop.yarn.service.component.ComponentEventType for class 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
> 2018-02-02 22:53:30,504 [main] INFO  event.AsyncDispatcher - Registering 
> class 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType 
> for class 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
> 2018-02-02 22:53:30,528 [main] INFO  impl.NMClientAsyncImpl - Upper bound of 
> the thread pool size is 500
> 2018-02-02 22:53:30,531 [main] INFO  service.ServiceMaster - Starting service 
> as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
> 2018-02-02 22:53:30,545 [main] INFO  ipc.CallQueueManager - Using callQueue: 
> class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO  ipc.Server - 
> Starting Socket Reader #1 for port 56859
> 2018-02-02 22:53:30,589 [main] INFO  pb.RpcServerFactoryPBImpl - Adding 
> protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to 
> the server
> 2018-02-02 22:53:30,606 [IPC Server Responder] INFO  ipc.Server - IPC Server 
> Responder: starting
> 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO  ipc.Server - IPC 
> Server listener on 56859: starting
> 2018-02-02 22:53:30,607 [main] INFO  service.ClientAMService - Instantiated 
> ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
> 2018-02-02 22:53:30,609 [main] INFO  zk.CuratorService - Creating 
> CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 
> 2018-02-02 22:53:30,615 [main] INFO  zk.RegistrySecurity - Enabling ZK sasl 
> client: jaasClientEntry = Client, principal = 
> hbase/eyang-5.openstacklo...@example.com, keytab = 
> /etc/security/keytabs/hbase.service.keytab
> 2018-02-02 22:53:30,752 [main] INFO  client.RMProxy - Connecting to 
> ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
> 2018-02-02 22:53:30,909 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1517611904996_0001_01, abc into registry
> 2018-02-02 22:53:30,911 [main] INFO  service.ServiceScheduler - Received 0 
> containers from previous attempt.
> 2018-02-02 22:53:31,072 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: `/users/hbase/services/yarn-service/abc/components': No 
> such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hbase/services/yarn-service/abc/components
> 2018-02-02 22:53:31,074 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component sleeper
> 2018-02-02 22:53:31,075 [main] INFO  component.Component - [INIT COMPONENT 
> sleeper]: 2 instances.
> 2018-02-02 22:53:31,094 [main] INFO  component.Component - [COMPONENT 
> sleeper] Transitioned from INIT to FLEXING on FLEX event.
> 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - 
> Failed to register app abc in registry
> org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: 
> `/registry/users/hbase/services/yarn-service/abc': Not authorized to access 
> path; ACLs: [
> 0x01: 'world,'anyone
>  0x1f: 'sasl,'yarn
>  0x1f: 'sasl,'jhs
>  0x1f: 'sasl,'hdfs-demo
>  0x1f: 'sasl,'rm
>  0x1f: 'sasl,'hive
>  0x1f: 'sasl,'hbase
>  

[jira] [Updated] (YARN-7884) Race condition in registering YARN service in ZooKeeper

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-7884:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Race condition in registering YARN service in ZooKeeper
> ---
>
> Key: YARN-7884
> URL: https://issues.apache.org/jira/browse/YARN-7884
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Major
>
> In a Kerberos-enabled cluster, there seems to be a race condition when 
> registering a YARN service.
> The yarn-service znode creation seems to happen after the AM has started and is 
> reporting back to update component information.  The AM should have access to 
> create the znode, but ZooKeeper reported NoAuth.
> {code}
> 2018-02-02 22:53:30,442 [main] INFO  service.ServiceScheduler - Set registry 
> user accounts: sasl:hbase
> 2018-02-02 22:53:30,471 [main] INFO  zk.RegistrySecurity - Registry default 
> system acls: 
> [1,s{'world,'anyone}
> , 31,s{'sasl,'yarn}
> , 31,s{'sasl,'jhs}
> , 31,s{'sasl,'hdfs-demo}
> , 31,s{'sasl,'rm}
> , 31,s{'sasl,'hive}
> ]
> 2018-02-02 22:53:30,472 [main] INFO  zk.RegistrySecurity - Registry User ACLs 
> [31,s{'sasl,'hbase}
> , 31,s{'sasl,'hbase}
> ]
> 2018-02-02 22:53:30,503 [main] INFO  event.AsyncDispatcher - Registering 
> class org.apache.hadoop.yarn.service.component.ComponentEventType for class 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
> 2018-02-02 22:53:30,504 [main] INFO  event.AsyncDispatcher - Registering 
> class 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType 
> for class 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
> 2018-02-02 22:53:30,528 [main] INFO  impl.NMClientAsyncImpl - Upper bound of 
> the thread pool size is 500
> 2018-02-02 22:53:30,531 [main] INFO  service.ServiceMaster - Starting service 
> as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
> 2018-02-02 22:53:30,545 [main] INFO  ipc.CallQueueManager - Using callQueue: 
> class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: 
> class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO  ipc.Server - 
> Starting Socket Reader #1 for port 56859
> 2018-02-02 22:53:30,589 [main] INFO  pb.RpcServerFactoryPBImpl - Adding 
> protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to 
> the server
> 2018-02-02 22:53:30,606 [IPC Server Responder] INFO  ipc.Server - IPC Server 
> Responder: starting
> 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO  ipc.Server - IPC 
> Server listener on 56859: starting
> 2018-02-02 22:53:30,607 [main] INFO  service.ClientAMService - Instantiated 
> ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
> 2018-02-02 22:53:30,609 [main] INFO  zk.CuratorService - Creating 
> CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181" 
> 2018-02-02 22:53:30,615 [main] INFO  zk.RegistrySecurity - Enabling ZK sasl 
> client: jaasClientEntry = Client, principal = 
> hbase/eyang-5.openstacklo...@example.com, keytab = 
> /etc/security/keytabs/hbase.service.keytab
> 2018-02-02 22:53:30,752 [main] INFO  client.RMProxy - Connecting to 
> ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
> 2018-02-02 22:53:30,909 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1517611904996_0001_01, abc into registry
> 2018-02-02 22:53:30,911 [main] INFO  service.ServiceScheduler - Received 0 
> containers from previous attempt.
> 2018-02-02 22:53:31,072 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: `/users/hbase/services/yarn-service/abc/components': No 
> such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hbase/services/yarn-service/abc/components
> 2018-02-02 22:53:31,074 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component sleeper
> 2018-02-02 22:53:31,075 [main] INFO  component.Component - [INIT COMPONENT 
> sleeper]: 2 instances.
> 2018-02-02 22:53:31,094 [main] INFO  component.Component - [COMPONENT 
> sleeper] Transitioned from INIT to FLEXING on FLEX event.
> 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - 
> Failed to register app abc in registry
> org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: 
> `/registry/users/hbase/services/yarn-service/abc': Not authorized to access 
> path; ACLs: [
> 0x01: 'world,'anyone
>  0x1f: 'sasl,'yarn
>  0x1f: 'sasl,'jhs
>  0x1f: 'sasl,'hdfs-demo
>  0x1f: 'sasl,'rm
>  0x1f: 'sasl,'hive
>  0x1f: 'sasl,'hbase
>  0x1f: 'sasl,'hbase
>  ]: KeeperErrorCode = NoAuth for 
> 

[jira] [Updated] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-7844:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
> 
>
> Key: YARN-7844
> URL: https://issues.apache.org/jira/browse/YARN-7844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
> Attachments: YARN-7844.000.patch, YARN-7844.001.patch
>
>
> Currently FairScheduler's FSOpDurations records some scheduler operation 
> metrics: nodeUpdateCall, preemptCall, etc. We may need similar for 
> CapacityScheduler. Also, need to add more metrics there. This could help 
> monitor the RM scheduler performance, and get more insights whether scheduler 
> is under-pressure.






[jira] [Updated] (YARN-7882) Server side proxy for UI2 log viewer

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-7882:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Server side proxy for UI2 log viewer
> 
>
> Key: YARN-7882
> URL: https://issues.apache.org/jira/browse/YARN-7882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security, timelineserver, yarn-ui-v2
>Affects Versions: 3.0.0
>Reporter: Eric Yang
>Priority: Major
>
> When viewing container logs in UI2, the log files are fetched directly 
> through Timeline Server 2.  Hadoop in simple security mode does not have an 
> authenticator to make sure the user is authorized to view the log.  The 
> general practice is to use Knox or another security proxy to authenticate the 
> user and reverse-proxy the request to the Hadoop UI, so that information 
> does not leak to anonymous users.  The current implementation of the UI2 log 
> viewer makes ajax calls directly to Timeline Server 2.  This could prevent Knox or 
> reverse proxy software from working properly with the new design.  It would 
> be good to perform a server-side proxy to prevent the browser from sidestepping the 
> authentication check.






[jira] [Updated] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8149:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Revisit behavior of Re-Reservation in Capacity Scheduler
> 
>
> Key: YARN-8149
> URL: https://issues.apache.org/jira/browse/YARN-8149
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Major
>
> Frankly speaking, I'm not sure why we need the re-reservation. The formula is 
> not that easy to understand:
> Inside: 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}}
> {code:java}
> starvation = re-reservation / (#reserved-container * 
>  (1 - min(requested-resource / max-alloc, 
>   max-alloc - min-alloc / max-alloc))
> should_allocate = starvation + requiredContainers - reservedContainers > 
> 0{code}
> I think we should be able to remove the starvation computation; just checking 
> requiredContainers > reservedContainers should be enough.
> In a large cluster, we can easily overflow re-reservation to MAX_INT, see 
> YARN-7636. 
>  
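A standalone paraphrase of the quoted formula makes the readability complaint concrete. This is a hypothetical sketch, not the actual RegularContainerAllocator code; in particular, the grouping (max-alloc - min-alloc) / max-alloc is an assumption, since the quoted pseudo-formula leaves that precedence ambiguous:

{code:java}
// Hypothetical paraphrase of the quoted starvation formula; not Hadoop code.
public class ReReservationSketch {
  static boolean shouldAllocate(double reReservations, int reservedContainers,
      double requested, double minAlloc, double maxAlloc, int requiredContainers) {
    // Starvation grows with the number of re-reservations and shrinks as the
    // request approaches the maximum allocation.
    double starvation = reReservations
        / (reservedContainers
            * (1.0 - Math.min(requested / maxAlloc, (maxAlloc - minAlloc) / maxAlloc)));
    return starvation + requiredContainers - reservedContainers > 0;
  }

  public static void main(String[] args) {
    // With many re-reservations, starvation dominates and forces an allocation
    // even though requiredContainers does not exceed reservedContainers.
    System.out.println(shouldAllocate(1000, 4, 2048, 1024, 8192, 4)); // true
  }
}
{code}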






[jira] [Commented] (YARN-8149) Revisit behavior of Re-Reservation in Capacity Scheduler

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802684#comment-17802684
 ] 

Shilun Fan commented on YARN-8149:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Revisit behavior of Re-Reservation in Capacity Scheduler
> 
>
> Key: YARN-8149
> URL: https://issues.apache.org/jira/browse/YARN-8149
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Major
>
> Frankly speaking, I'm not sure why we need the re-reservation. The formula is 
> not that easy to understand:
> Inside: 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator#shouldAllocOrReserveNewContainer}}
> {code:java}
> starvation = re-reservation / (#reserved-container * 
>  (1 - min(requested-resource / max-alloc, 
>   max-alloc - min-alloc / max-alloc))
> should_allocate = starvation + requiredContainers - reservedContainers > 
> 0{code}
> I think we should be able to remove the starvation computation; just checking 
> requiredContainers > reservedContainers should be enough.
> In a large cluster, we can easily overflow re-reservation to MAX_INT, see 
> YARN-7636. 
>  






[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802686#comment-17802686
 ] 

Shilun Fan commented on YARN-8012:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, e.g. because dependent 
> configuration or resources cannot be made ready.
>  * NM local leveldb store is corrupted or lost, e.g. due to bad disk sectors.
>  * NM has bugs, such as wrongly marking a live container as complete.
> Note, these cases arise, or get worse, when work-preserving NM restart is 
> enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed for YARN on the node:
>  ** YARN leaks resources on the node.
>  ** The container cannot be killed to release YARN resources on the node and free 
> up resources for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** An app which has bugs can still produce bad external impacts even long after 
> the app is killed.






[jira] [Updated] (YARN-8074) Support placement policy composite constraints in YARN Service

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8074:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support placement policy composite constraints in YARN Service
> --
>
> Key: YARN-8074
> URL: https://issues.apache.org/jira/browse/YARN-8074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
>
> This is a follow up of YARN-7142 where we support more advanced placement 
> policy features like creating composite constraints by exposing expressions 
> in YARN Service specification.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8012:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, e.g. because dependent 
> configuration or resources cannot be made ready.
>  * NM local leveldb store is corrupted or lost, e.g. due to bad disk sectors.
>  * NM has bugs, such as wrongly marking a live container as complete.
> Note, these cases arise, or get worse, when work-preserving NM restart is 
> enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed for YARN on the node:
>  ** YARN leaks resources on the node.
>  ** The container cannot be killed to release YARN resources on the node and free 
> up resources for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** An app which has bugs can still produce bad external impacts even long after 
> the app is killed.






[jira] [Updated] (YARN-7912) While launching Native Service app from UI, consider service owner name from user.name query parameter

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-7912:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> While launching Native Service app from UI, consider service owner name from 
> user.name query parameter
> --
>
> Key: YARN-7912
> URL: https://issues.apache.org/jira/browse/YARN-7912
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Priority: Major
>
> As per comments from [~eyang] in YARN-7827, 
> "For supporting knox, it would be good for javascript to detect the url 
> entering /ui2 and process the user.name property.  If there 
> isn't one found, then proceed with ajax call to resource manager to find out 
> who is the current user to pass the parameter along the rest api calls."
> This Jira will track the work to handle this. It is now pending a feasibility check.
> Thanks [~eyang] and [~jianhe]






[jira] [Updated] (YARN-8161) ServiceState FLEX should be removed

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8161:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> ServiceState FLEX should be removed
> ---
>
> Key: YARN-8161
> URL: https://issues.apache.org/jira/browse/YARN-8161
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Priority: Major
>
> ServiceState FLEX is not required to trigger flex up/down of containers and 
> should be removed






[jira] [Commented] (YARN-8192) Introduce container readiness check type

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802682#comment-17802682
 ] 

Shilun Fan commented on YARN-8192:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Introduce container readiness check type
> 
>
> Key: YARN-8192
> URL: https://issues.apache.org/jira/browse/YARN-8192
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8192.1.patch, YARN-8192.2.patch
>
>
> In some cases, the AM may not be able to perform a readiness check for a 
> container. For example, if a docker container is using a custom network type, 
> its IP may not be reachable from the AM. In this case, the AM could request a 
> new container to perform a readiness command, and use the exit status of the 
> container to determine whether the readiness check succeeded or not.
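As a conceptual illustration of the exit-status idea (this is not YARN Service code; the probe command and URL are placeholders):

{code:java}
import java.io.IOException;

// Conceptual stand-in: run a readiness command and map its exit status to a
// verdict, as the proposed check type would do with a dedicated container.
public class ExitStatusReadinessSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    Process probe = new ProcessBuilder("curl", "-sf", "http://service-host:8080/health")
        .inheritIO()
        .start();
    int exit = probe.waitFor();
    System.out.println(exit == 0 ? "READY" : "NOT_READY (exit " + exit + ")");
  }
}
{code}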






[jira] [Commented] (YARN-8256) Pluggable provider for node membership management

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802681#comment-17802681
 ] 

Shilun Fan commented on YARN-8256:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Pluggable provider for node membership management
> -
>
> Key: YARN-8256
> URL: https://issues.apache.org/jira/browse/YARN-8256
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.8.3, 3.0.2
>Reporter: Dagang Wei
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> h1. Background
> HDFS-7541 introduced a pluggable provider framework for node membership 
> management, which gives HDFS the flexibility to have different ways to manage 
> node membership for different needs.
> [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java]
>  is the class which provides the abstraction. Currently, there are 2 
> implementations in the HDFS codebase:
> 1) 
> [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java]
>  which uses 2 config files which are defined by the properties dfs.hosts and 
> dfs.hosts.exclude.
> 2) 
> [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java]
>  which uses a single JSON file defined by the property dfs.hosts.
> dfs.namenode.hosts.provider.classname is the property that determines which 
> implementation is used.
> h1. Problem
> YARN should be consistent with HDFS in terms of a pluggable provider for node 
> membership management. The absence of one makes it impossible for YARN to have 
> other config sources, e.g., ZooKeeper, a database, other config file formats, etc.
> h1. Proposed solution
> [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java]
>  is the class for managing YARN node membership today. It uses 
> [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java]
>  to read config files specified by the property 
> yarn.resourcemanager.nodes.include-path for nodes to include and 
> yarn.resourcemanager.nodes.exclude-path for nodes to exclude.
> The proposed solution is to
> 1) introduce a new interface {color:#008000}HostsConfigManager{color} which 
> provides the abstraction for node membership management. Update 
> {color:#008000}NodesListManager{color} to depend on 
> {color:#008000}HostsConfigManager{color} instead of 
> {color:#008000}HostsFileReader{color}. Then create a wrapper class for 
> {color:#008000}HostsFileReader{color} which implements the interface.
> 2) introduce a new config property 
> {color:#008000}yarn.resourcemanager.hosts-config.manager.class{color} for 
> specifying the implementation class. Set the default value to the wrapper 
> class of {color:#008000}HostsFileReader{color} for backward compatibility 
> between new code and old config.
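A rough sketch of the interface shape the proposal describes, assuming it mirrors HDFS's HostConfigManager; neither the interface nor its method names exist in Hadoop today, so everything below is illustrative:

{code:java}
import java.io.IOException;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;

// Hypothetical HostsConfigManager sketch; names and signatures are assumptions.
public interface HostsConfigManager {
  // Pick up the membership source (files, ZooKeeper, a database, ...).
  void setConf(Configuration conf);

  // Re-read membership data, e.g. on "yarn rmadmin -refreshNodes".
  void refresh() throws IOException;

  // Nodes allowed to register with the ResourceManager.
  Set<String> getIncludedHosts();

  // Nodes to be decommissioned.
  Set<String> getExcludedHosts();
}
{code}

The default wrapper would then delegate these calls to HostsFileReader, keeping the current file-based behavior for old configs.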






[jira] [Updated] (YARN-8192) Introduce container readiness check type

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8192:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Introduce container readiness check type
> 
>
> Key: YARN-8192
> URL: https://issues.apache.org/jira/browse/YARN-8192
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8192.1.patch, YARN-8192.2.patch
>
>
> In some cases, the AM may not be able to perform a readiness check for a 
> container. For example, if a docker container is using a custom network type, 
> its IP may not be reachable from the AM. In this case, the AM could request a 
> new container to perform a readiness command, and use the exit status of the 
> container to determine whether the readiness check succeeded or not.






[jira] [Updated] (YARN-8256) Pluggable provider for node membership management

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8256:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Pluggable provider for node membership management
> -
>
> Key: YARN-8256
> URL: https://issues.apache.org/jira/browse/YARN-8256
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.8.3, 3.0.2
>Reporter: Dagang Wei
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> h1. Background
> HDFS-7541 introduced a pluggable provider framework for node membership 
> management, which gives HDFS the flexibility to have different ways to manage 
> node membership for different needs.
> [org.apache.hadoop.hdfs.server.blockmanagement.HostConfigManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostConfigManager.java]
>  is the class which provides the abstraction. Currently, there are 2 
> implementations in the HDFS codebase:
> 1) 
> [org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java]
>  which uses 2 config files which are defined by the properties dfs.hosts and 
> dfs.hosts.exclude.
> 2) 
> [org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java]
>  which uses a single JSON file defined by the property dfs.hosts.
> dfs.namenode.hosts.provider.classname is the property that determines which 
> implementation is used.
> h1. Problem
> YARN should be consistent with HDFS in terms of a pluggable provider for node 
> membership management. The absence of one makes it impossible for YARN to have 
> other config sources, e.g., ZooKeeper, a database, other config file formats, etc.
> h1. Proposed solution
> [org.apache.hadoop.yarn.server.resourcemanager.NodesListManager|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java]
>  is the class for managing YARN node membership today. It uses 
> [HostsFileReader|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/HostsFileReader.java]
>  to read config files specified by the property 
> yarn.resourcemanager.nodes.include-path for nodes to include and 
> yarn.resourcemanager.nodes.exclude-path for nodes to exclude.
> The proposed solution is to
> 1) introduce a new interface {color:#008000}HostsConfigManager{color} which 
> provides the abstraction for node membership management. Update 
> {color:#008000}NodesListManager{color} to depend on 
> {color:#008000}HostsConfigManager{color} instead of 
> {color:#008000}HostsFileReader{color}. Then create a wrapper class for 
> {color:#008000}HostsFileReader{color} which implements the interface.
> 2) introduce a new config property 
> {color:#008000}yarn.resourcemanager.hosts-config.manager.class{color} for 
> specifying the implementation class. Set the default value to the wrapper 
> class of {color:#008000}HostsFileReader{color} for backward compatibility 
> between new code and old config.






[jira] [Updated] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8258:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch, YARN-8258.009.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally, all filters from the default context have to be inherited by the UI2 
> context as well.






[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802680#comment-17802680
 ] 

Shilun Fan commented on YARN-8258:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch, YARN-8258.009.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally, all filters from the default context have to be inherited by the UI2 
> context as well.






[jira] [Updated] (YARN-8340) Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more resources enabled.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8340:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more 
> resources enabled.
> -
>
> Key: YARN-8340
> URL: https://issues.apache.org/jira/browse/YARN-8340
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Refer to the comment from [~eepayne] and the discussion below it: 
> https://issues.apache.org/jira/browse/YARN-8292?focusedCommentId=16482689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482689
>  for details.






[jira] [Commented] (YARN-8340) Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more resources enabled.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802679#comment-17802679
 ] 

Shilun Fan commented on YARN-8340:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more 
> resources enabled.
> -
>
> Key: YARN-8340
> URL: https://issues.apache.org/jira/browse/YARN-8340
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Refer to the comment from [~eepayne] and the discussion below it: 
> https://issues.apache.org/jira/browse/YARN-8292?focusedCommentId=16482689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482689
>  for details.






[jira] [Updated] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8509:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: capacityscheduler
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total 
> pending resource based on user-limit percent and user-limit factor, which caps 
> the pending resource for each user to the minimum of the user-limit pending and 
> the actual pending. This prevents the queue from taking more pending resource to 
> achieve queue balance after all queues are satisfied with their ideal allocation.
>   
>  We need to change the logic to let a queue's pending resource go beyond the 
> user limit.
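Toy arithmetic (invented numbers, not the LeafQueue code) showing how the min-based cap hides real demand from preemption:

{code:java}
// Invented numbers; illustrates the capping behavior described above.
public class PendingCapSketch {
  public static void main(String[] args) {
    long userLimit = 10_000;   // derived from minimum-user-limit-percent
    long userUsed = 8_000;     // resources the user already holds
    long userPending = 5_000;  // the user's actual outstanding ask

    long headroom = Math.max(0, userLimit - userUsed);
    long counted = Math.min(userPending, headroom);

    // counted == 2_000: the remaining 3_000 of genuine demand is invisible,
    // so preemption stops short of rebalancing the queues.
    System.out.println(counted);
  }
}
{code}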






[jira] [Updated] (YARN-8366) Expose debug log information when user intend to enable GPU without setting nvidia-smi path

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8366:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Expose debug log information when user intend to enable GPU without setting 
> nvidia-smi path
> ---
>
> Key: YARN-8366
> URL: https://issues.apache.org/jira/browse/YARN-8366
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>
> Expose debug information to help users find the root cause of failures when they 
> have not made these two settings manually before enabling GPU on YARN:
> 1. yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables in 
> yarn-site.xml
> 2. environment variable LD_LIBRARY_PATH
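For illustration, the first setting is a yarn-site.xml property; the value below is a placeholder for wherever the NVIDIA binaries actually live on the NodeManager host:

{code:xml}
<!-- Example only; point the value at the directory containing nvidia-smi. -->
<property>
  <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
  <value>/usr/bin</value>
</property>
{code}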






[jira] [Updated] (YARN-8733) Readiness check for remote component

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8733:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Readiness check for remote component
> 
>
> Key: YARN-8733
> URL: https://issues.apache.org/jira/browse/YARN-8733
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Major
>
> When a service is deploying, there can be remote component dependencies between 
> services.  For example, Hive Server 2 can depend on the Hive metastore, which 
> depends on a remote MySQL database.  It would be great to have the ability to 
> check the remote server and port to make sure MySQL is available before 
> deploying the Hive LLAP service.
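A minimal sketch of the host-and-port probe such a readiness check implies; the endpoint is a placeholder, and this is not YARN Service API code:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Standalone illustration of a TCP readiness probe for a remote dependency.
public class PortReadinessProbe {
  static boolean isReachable(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;   // connect succeeded: the dependency is up
    } catch (IOException e) {
      return false;  // refused or timed out: hold off on deploying
    }
  }

  public static void main(String[] args) {
    // Placeholder endpoint standing in for the remote MySQL database.
    System.out.println(isReachable("mysql.example.com", 3306, 2000));
  }
}
{code}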






[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802677#comment-17802677
 ] 

Shilun Fan commented on YARN-8509:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: capacityscheduler
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate the total 
> pending resource based on user-limit percent and user-limit factor, which caps 
> the pending resource for each user to the minimum of the user-limit pending and 
> the actual pending. This prevents the queue from taking more pending resource to 
> achieve queue balance after all queues are satisfied with their ideal allocation.
>   
>  We need to change the logic to let a queue's pending resource go beyond the 
> user limit.






[jira] [Commented] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802674#comment-17802674
 ] 

Shilun Fan commented on YARN-8779:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Fix few discrepancies between YARN Service swagger spec and code
> 
>
> Key: YARN-8779
> URL: https://issues.apache.org/jira/browse/YARN-8779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Gour Saha
>Priority: Major
>
> The following issues were identified in the YARN Service swagger definition 
> during an effort to integrate with a running service by generating Java and Go 
> client-side stubs from the spec:
>  
> 1.
> *restartPolicy* is wrong and should be *restart_policy*
>  
> 2.
> A DELETE request to a non-existing service (or a previously existing but 
> deleted service) throws an ApiException instead of something like 
> NotFoundException (the equivalent of 404). Note, DELETE of an existing 
> service behaves fine.
>  
> 3.
> The response code of DELETE request is 200. The spec says 204. Since the 
> response has a payload, the spec should be updated to 200 instead of 204.
>  
> 4.
>  _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method 
> does not return a Service object. Swagger definition has the below bug in GET 
> response of */app/v1/services/\{service_name}* -
> {code:java}
> type: object
> items:
>   $ref: '#/definitions/Service'
> {code}
> It should be -
> {code:java}
> $ref: '#/definitions/Service'
> {code}
>  
> 5.
> Serialization issues were seen in all enum classes - ServiceState.java, 
> ContainerState.java, ComponentState.java, PlacementType.java and 
> PlacementScope.java.
> Java client threw the below exception for ServiceState -
> {code:java}
> Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: 
> Cannot construct instance of 
> `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one 
> Creator exists): no String-argument constructor/factory method to deserialize 
> from String value ('ACCEPTED')
>  at [Source: 
> (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
>  line: 1, column: 121] (through reference chain: 
> org.apache.cb.yarn.service.api.records.Service["state”])
> {code}
> For Golang we saw this for ContainerState -
> {code:java}
> ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot 
> unmarshal string into Go struct field Container.state of type 
> yarnmodel.ContainerState 
> {code}
>  
> 6.
> *launch_time* actually returns an integer but swagger definition says date. 
> Hence, the following exception is seen on the client side -
> {code:java}
> Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: 
> Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or 
> string.
>  at [Source: 
> (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
>  line: 1, column: 477] (through reference chain: 
> org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”])
> {code}
>  
> 8.
> *user.name* query param with a valid value is required for all API calls to 
> an unsecure cluster. This is not defined in the spec.
>  
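For the enum deserialization failures in item 5 above, the usual Jackson remedy is a String-argument factory method. The sketch below is an assumed shape of such a fix, not the actual YARN Service records code, and it shows only a subset of the real states:

{code:java}
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonValue;

// Hypothetical fix sketch for the "no String-argument constructor/factory
// method" error quoted above; the constants are illustrative.
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED;

  @JsonCreator  // supplies the String-based factory Jackson asks for
  public static ServiceState fromString(String value) {
    return valueOf(value.toUpperCase());
  }

  @JsonValue
  public String toValue() {
    return name();
  }
}
{code}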






[jira] [Commented] (YARN-8583) Inconsistency in YARN status command

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802676#comment-17802676
 ] 

Shilun Fan commented on YARN-8583:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Inconsistency in YARN status command
> 
>
> Key: YARN-8583
> URL: https://issues.apache.org/jira/browse/YARN-8583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
>
> The YARN app -status command can report based on application ID or application 
> name, with some usability limitations.  An application ID is globally unique, and 
> it allows any user to query the application status of any application.  
> An application name is not globally unique, and it will only work for querying 
> the user's own applications.  This is somewhat restrictive for an application 
> administrator, but allowing one user to query any other user's applications 
> could be considered a security hole as well.  There are two possible options to 
> reduce the inconsistency:
> Option 1.  Block other users from querying application status.  This may improve 
> security in some sense, but it is an incompatible change.  It is the simpler 
> change: match the owner of the application, and decide whether or not to 
> report.
> Option 2.  Add a --user parameter to allow an administrator to query an 
> application name run by another user.  This is a bigger change because 
> application metadata is stored in the user's own HDFS directory.  There are 
> security restrictions that need to be defined.






[jira] [Updated] (YARN-8583) Inconsistency in YARN status command

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8583:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Inconsistency in YARN status command
> 
>
> Key: YARN-8583
> URL: https://issues.apache.org/jira/browse/YARN-8583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
>
> The YARN app -status command can report based on application ID or application 
> name, with some usability limitations.  An application ID is globally unique, and 
> it allows any user to query the application status of any application.  
> An application name is not globally unique, and it will only work for querying 
> the user's own applications.  This is somewhat restrictive for an application 
> administrator, but allowing one user to query any other user's applications 
> could be considered a security hole as well.  There are two possible options to 
> reduce the inconsistency:
> Option 1.  Block other users from querying application status.  This may improve 
> security in some sense, but it is an incompatible change.  It is the simpler 
> change: match the owner of the application, and decide whether or not to 
> report.
> Option 2.  Add a --user parameter to allow an administrator to query an 
> application name run by another user.  This is a bigger change because 
> application metadata is stored in the user's own HDFS directory.  There are 
> security restrictions that need to be defined.






[jira] [Commented] (YARN-9415) Document FS placement rule changes from YARN-8967

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802670#comment-17802670
 ] 

Shilun Fan commented on YARN-9415:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Document FS placement rule changes from YARN-8967
> -
>
> Key: YARN-9415
> URL: https://issues.apache.org/jira/browse/YARN-9415
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> With the changes introduced by YARN-8967 we now allow parent rules on all 
> existing rules. This should be documented.






[jira] [Updated] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8779:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Fix few discrepancies between YARN Service swagger spec and code
> 
>
> Key: YARN-8779
> URL: https://issues.apache.org/jira/browse/YARN-8779
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Gour Saha
>Priority: Major
>
> The following issues were identified in the YARN Service swagger definition 
> during an effort to integrate with a running service by generating Java and Go 
> client-side stubs from the spec:
>  
> 1.
> *restartPolicy* is wrong and should be *restart_policy*
>  
> 2.
> A DELETE request to a non-existing service (or a previously existing but 
> deleted service) throws an ApiException instead of something like 
> NotFoundException (the equivalent of 404). Note, DELETE of an existing 
> service behaves fine.
>  
> 3.
> The response code of DELETE request is 200. The spec says 204. Since the 
> response has a payload, the spec should be updated to 200 instead of 204.
>  
> 4.
>  _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method 
> does not return a Service object. Swagger definition has the below bug in GET 
> response of */app/v1/services/\{service_name}* -
> {code:java}
> type: object
> items:
>   $ref: '#/definitions/Service'
> {code}
> It should be -
> {code:java}
> $ref: '#/definitions/Service'
> {code}
>  
> 5.
> Serialization issues were seen in all enum classes - ServiceState.java, 
> ContainerState.java, ComponentState.java, PlacementType.java and 
> PlacementScope.java.
> Java client threw the below exception for ServiceState -
> {code:java}
> Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: 
> Cannot construct instance of 
> `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one 
> Creator exists): no String-argument constructor/factory method to deserialize 
> from String value ('ACCEPTED')
>  at [Source: 
> (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
>  line: 1, column: 121] (through reference chain: 
> org.apache.cb.yarn.service.api.records.Service["state”])
> {code}
> For Golang we saw this for ContainerState -
> {code:java}
> ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot 
> unmarshal string into Go struct field Container.state of type 
> yarnmodel.ContainerState 
> {code}
>  
> 6.
> *launch_time* actually returns an integer but swagger definition says date. 
> Hence, the following exception is seen on the client side -
> {code:java}
> Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: 
> Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or 
> string.
>  at [Source: 
> (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
>  line: 1, column: 477] (through reference chain: 
> org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”])
> {code}
>  
> 8.
> *user.name* query param with a valid value is required for all API calls to 
> an unsecure cluster. This is not defined in the spec.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9256) Make ATSv2 compilation default with hbase.profile=2.0

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9256:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Make ATSv2 compilation default with hbase.profile=2.0
> -
>
> Key: YARN-9256
> URL: https://issues.apache.org/jira/browse/YARN-9256
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9256.01.patch, YARN-9256.02.patch, 
> YARN-9256.03.patch
>
>
> By default, Hadoop compiles ATSv2 with the hbase.profile that corresponds to 
> hbase.version=1.4. Change the default compilation to hbase.profile=2.0 in 
> trunk. 
> This JIRA is also the place to discuss any concerns. 
> cc: [~vrushalic]
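> Until the default changes, the 2.0 profile can be selected explicitly at 
> build time; a sketch of the usual invocation (exact flags may vary by 
> branch):
> {noformat}
> mvn clean install -DskipTests -Dhbase.profile=2.0
> {noformat}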



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8940) [CSI] Add volume as a top-level attribute in service spec

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802672#comment-17802672
 ] 

Shilun Fan commented on YARN-8940:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> [CSI] Add volume as a top-level attribute in service spec 
> --
>
> Key: YARN-8940
> URL: https://issues.apache.org/jira/browse/YARN-8940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: CSI
>
> Initial thought:
> {noformat}
> {
>   "name": "volume example",
>   "version": "1.0.0",
>   "description": "a volume simple example",
>   "components" :
> [
>   {
> "name": "",
> "number_of_containers": 1,
> "artifact": {
>   "id": "docker.io/centos:latest",
>   "type": "DOCKER"
> },
> "launch_command": "sleep,120",
> "configuration": {
>   "env": {
> "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
>   }
> },
> "resource": {
>   "cpus": 1,
>   "memory": "256",
> },
> "volumes": [
>   {
> "volume" : {
>   "type": "s3_csi",
>   "id": "5504d4a8-b246-11e8-94c2-026b17aa1190",
>   "capability" : {
> "min": "5Gi",
> "max": "100Gi"
>   },
>   "source_path": "s3://my_bucket/my", # optional for object stores
>   "mount_path": "/mnt/data", # required, the mount point in 
> docker container
>   "access_mode": "SINGLE_READ", # how the volume can be accessed
> }
>   }
> ]
>   }
>   ]
> }
> {noformat}
> Open for discussion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8928) TestRMAdminService is failing

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8928:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> TestRMAdminService is failing
> -
>
> Key: YARN-8928
> URL: https://issues.apache.org/jira/browse/YARN-8928
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Jason Darrell Lowe
>Assignee: David Mollitor
>Priority: Major
> Attachments: YARN-8928.1.patch, YARN-8928.2.patch, YARN-8928.3.patch
>
>
> After HADOOP-15836, TestRMAdminService has started failing consistently.  
> Sample stack traces to follow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8940) [CSI] Add volume as a top-level attribute in service spec

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8940:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> [CSI] Add volume as a top-level attribute in service spec 
> --
>
> Key: YARN-8940
> URL: https://issues.apache.org/jira/browse/YARN-8940
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: CSI
>
> Initial thought:
> {noformat}
> {
>   "name": "volume example",
>   "version": "1.0.0",
>   "description": "a volume simple example",
>   "components" :
> [
>   {
> "name": "",
> "number_of_containers": 1,
> "artifact": {
>   "id": "docker.io/centos:latest",
>   "type": "DOCKER"
> },
> "launch_command": "sleep,120",
> "configuration": {
>   "env": {
> "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
>   }
> },
> "resource": {
>   "cpus": 1,
>   "memory": "256",
> },
> "volumes": [
>   {
> "volume" : {
>   "type": "s3_csi",
>   "id": "5504d4a8-b246-11e8-94c2-026b17aa1190",
>   "capability" : {
> "min": "5Gi",
> "max": "100Gi"
>   },
>   "source_path": "s3://my_bucket/my", # optional for object stores
>   "mount_path": "/mnt/data", # required, the mount point in 
> docker container
>   "access_mode": "SINGLE_READ", # how the volume can be accessed
> }
>   }
> ]
>   }
>   ]
> }
> {noformat}
> Open for discussion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9256) Make ATSv2 compilation default with hbase.profile=2.0

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802671#comment-17802671
 ] 

Shilun Fan commented on YARN-9256:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Make ATSv2 compilation default with hbase.profile=2.0
> -
>
> Key: YARN-9256
> URL: https://issues.apache.org/jira/browse/YARN-9256
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9256.01.patch, YARN-9256.02.patch, 
> YARN-9256.03.patch
>
>
> By default, Hadoop compiles ATSv2 with the hbase.profile that corresponds to 
> hbase.version=1.4. Change the default compilation to hbase.profile=2.0 in 
> trunk. 
> This JIRA is also the place to discuss any concerns. 
> cc: [~vrushalic]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9637) Make SLS wrapper class name configurable

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802668#comment-17802668
 ] 

Shilun Fan commented on YARN-9637:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Make SLS wrapper class name configurable
> 
>
> Key: YARN-9637
> URL: https://issues.apache.org/jira/browse/YARN-9637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Erkin Alp Güney
>Assignee: Adam Antal
>Priority: Major
>  Labels: configuration-addition
> Attachments: YARN-9637.001.patch
>
>
> SLS currently hardcodes the lookup of which scheduler wrapper to load based 
> on the scheduler, and it only knows about the Fair and Capacity schedulers. 
> Making it configurable will accelerate development of new pluggable YARN 
> schedulers.
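> A minimal sketch of such a configurable lookup (the config key 
> "yarn.sls.scheduler.wrapper.class" is hypothetical, not an existing SLS 
> property):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.util.ReflectionUtils;
> import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
> import org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler;
> 
> static ResourceScheduler loadSchedulerWrapper(Configuration conf) {
>   // Fall back to the Capacity wrapper when the (hypothetical) key is unset.
>   Class<? extends ResourceScheduler> wrapperClass =
>       conf.getClass("yarn.sls.scheduler.wrapper.class",
>           SLSCapacityScheduler.class, ResourceScheduler.class);
>   return ReflectionUtils.newInstance(wrapperClass, conf);
> }
> {code}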



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9415) Document FS placement rule changes from YARN-8967

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9415:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Document FS placement rule changes from YARN-8967
> -
>
> Key: YARN-9415
> URL: https://issues.apache.org/jira/browse/YARN-9415
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> With the changes introduced by YARN-8967 we now allow parent rules on all 
> existing rules. This should be documented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9490) applicationresourceusagereport return wrong number of reserved containers

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9490:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> applicationresourceusagereport return wrong number of reserved containers
> -
>
> Key: YARN-9490
> URL: https://issues.apache.org/jira/browse/YARN-9490
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: yanbing zhang
>Assignee: yanbing zhang
>Priority: Minor
> Attachments: YARN-9490.002.patch, YARN-9490.patch, 
> YARN-9490.patch1.patch
>
>
> When getting an ApplicationResourceUsageReport instance from 
> SchedulerApplicationAttempt, I found that the constructor input parameter 
> (reservedContainers.size()) is wrong: reservedContainers is a nested map 
> keyed by SchedulerRequestKey, so "reservedContainers.size()" is not the 
> number of containers but the number of SchedulerRequestKeys.
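> A sketch of the corrected count, assuming the nested-map shape described 
> above (the inner types in the signature are an assumption for illustration):
> {code:java}
> import java.util.Map;
> import org.apache.hadoop.yarn.api.records.NodeId;
> import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
> import org.apache.hadoop.yarn.server.scheduler.SchedulerRequestKey;
> 
> static int countReservedContainers(
>     Map<SchedulerRequestKey, Map<NodeId, RMContainer>> reservedContainers) {
>   int total = 0;
>   for (Map<NodeId, RMContainer> perKey : reservedContainers.values()) {
>     // Sum the containers under each scheduler key instead of counting keys.
>     total += perKey.size();
>   }
>   return total;
> }
> {code}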



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9637) Make SLS wrapper class name configurable

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9637:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Make SLS wrapper class name configurable
> 
>
> Key: YARN-9637
> URL: https://issues.apache.org/jira/browse/YARN-9637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Erkin Alp Güney
>Assignee: Adam Antal
>Priority: Major
>  Labels: configuration-addition
> Attachments: YARN-9637.001.patch
>
>
> SLS currently hardcodes the lookup of which scheduler wrapper to load based 
> on the scheduler, and it only knows about the Fair and Capacity schedulers. 
> Making it configurable will accelerate development of new pluggable YARN 
> schedulers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9652) Convert SchedulerQueueManager from a protocol-only type to a basic hierarchical queue implementation

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9652:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Convert SchedulerQueueManager from a protocol-only type to a basic 
> hierarchical queue implementation
> 
>
> Key: YARN-9652
> URL: https://issues.apache.org/jira/browse/YARN-9652
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: reservation system, scheduler
>Affects Versions: 3.3.0
>Reporter: Erkin Alp Güney
>Priority: Major
>
> SchedulerQueueManager is currently an interface, i.e. a protocol-only type. 
> As seen in the codebase, each scheduler implements the queue configuration 
> and management logic over and over. If we convert it into a concrete base 
> class with a simple implementation of a hierarchical queue system (as in the 
> Fair and Capacity schedulers), pluggable schedulers may be developed more 
> easily.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9675) Expose log aggregation diagnostic messages through RM API

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802665#comment-17802665
 ] 

Shilun Fan commented on YARN-9675:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Expose log aggregation diagnostic messages through RM API
> -
>
> Key: YARN-9675
> URL: https://issues.apache.org/jira/browse/YARN-9675
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, log-aggregation, resourcemanager
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> The ResourceManager collects the log aggregation status reports from the 
> NodeManagers. Currently these reports are collected, but when the app info 
> API or a similar high-level REST endpoint is called, only an overall status 
> is displayed (RUNNING, RUNNING_WITH_FAILURES, FAILED etc.). 
> The diagnostic messages are only available through the old RM web UI, so our 
> internal tool currently crawls that page and extracts the log aggregation 
> diagnostic and error messages from the raw HTML. This is not a good practice, 
> and a more elegant API call would be preferable. It may be useful for others 
> as well, since log-aggregation-related failures are usually hard to debug 
> given the lack of trace/debug messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9741) [JDK11] TestAHSWebServices.testAbout fails

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9741:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> [JDK11] TestAHSWebServices.testAbout fails
> --
>
> Key: YARN-9741
> URL: https://issues.apache.org/jira/browse/YARN-9741
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> On openjdk-11.0.2 TestAHSWebServices.testAbout[0] fails consistently with the 
> following stack trace:
> {noformat}
> [ERROR] Tests run: 40, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 7.9 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
> [ERROR] 
> testAbout[0](org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices)
>   Time elapsed: 0.241 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected: but 
> was:
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testAbout(TestAHSWebServices.java:333)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9675) Expose log aggregation diagnostic messages through RM API

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9675:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Expose log aggregation diagnostic messages through RM API
> -
>
> Key: YARN-9675
> URL: https://issues.apache.org/jira/browse/YARN-9675
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, log-aggregation, resourcemanager
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> The ResourceManager collects the log aggregation status reports from the 
> NodeManagers. Currently these reports are collected, but when the app info 
> API or a similar high-level REST endpoint is called, only an overall status 
> is displayed (RUNNING, RUNNING_WITH_FAILURES, FAILED etc.). 
> The diagnostic messages are only available through the old RM web UI, so our 
> internal tool currently crawls that page and extracts the log aggregation 
> diagnostic and error messages from the raw HTML. This is not a good practice, 
> and a more elegant API call would be preferable. It may be useful for others 
> as well, since log-aggregation-related failures are usually hard to debug 
> given the lack of trace/debug messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9741) [JDK11] TestAHSWebServices.testAbout fails

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802664#comment-17802664
 ] 

Shilun Fan commented on YARN-9741:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> [JDK11] TestAHSWebServices.testAbout fails
> --
>
> Key: YARN-9741
> URL: https://issues.apache.org/jira/browse/YARN-9741
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> On openjdk-11.0.2 TestAHSWebServices.testAbout[0] fails consistently with the 
> following stack trace:
> {noformat}
> [ERROR] Tests run: 40, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 7.9 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
> [ERROR] 
> testAbout[0](org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices)
>   Time elapsed: 0.241 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected: but 
> was:
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices.testAbout(TestAHSWebServices.java:333)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9807) ContainerAllocator re-creates RMContainer instance when allocate for ReservedContainer

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9807:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> ContainerAllocator re-creates RMContainer instance when allocate for 
> ReservedContainer
> --
>
> Key: YARN-9807
> URL: https://issues.apache.org/jira/browse/YARN-9807
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: YARN-9807.01.patch, YARN-9807.branch-2.01.patch
>
>
> The ContainerAllocator re-creates the RMContainer instance when it is 
> allocated to the ReservedContainer. This causes the RMContainer to lose the 
> information it accumulated between the NEW and RESERVED states.
> {panel:title=RM Log}
> 2019-08-28 18:42:30,320 [10645451] - INFO [SchedulerEventDispatcher:Event 
> Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 
> Container Transitioned from NEW to RESERVED
> 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event 
> Processor:AbstractContainerAllocator@126] - assignedContainer application 
> attempt=appattempt_1566978597856_2831_01 
> container=container_e47_1566978597856_2831_01_07 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@69f543b5
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=label_ndir_2
> 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event 
> Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 
> Container Transitioned from NEW to ALLOCATED
> {panel}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9708) Yarn Router Support DelegationToken

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9708:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Yarn Router Support DelegationToken
> ---
>
> Key: YARN-9708
> URL: https://issues.apache.org/jira/browse/YARN-9708
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: router
>Affects Versions: 3.4.0
>Reporter: Xie YiFan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, 
> RMDelegationTokenSecretManager_storeNewMasterKey.svg, 
> RouterDelegationTokenSecretManager_storeNewMasterKey.svg
>
>
> 1. We use the Router as a proxy to manage multiple clusters that are 
> independent of each other, in order to present a unified client. Thus, we 
> implemented a customized AMRMProxyPolicy that doesn't broadcast 
> ResourceRequests to other clusters.
> 2. Our production environment needs Kerberos, but the Router doesn't support 
> SecureLogin for now.
> https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we improved 
> it.
> 3. Some frameworks like Oozie get a token via yarnclient#getDelegationToken, 
> which the Router doesn't support. Our solution is to add homeCluster to 
> ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is 
> submitted with a specified cluster id so that the Router knows which cluster 
> to submit the job to. The Router gets a token from one RM according to the 
> specified cluster id when the client calls getDelegationToken, and applies 
> some mechanism to keep this token in memory.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9807) ContainerAllocator re-creates RMContainer instance when allocate for ReservedContainer

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802663#comment-17802663
 ] 

Shilun Fan commented on YARN-9807:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> ContainerAllocator re-creates RMContainer instance when allocate for 
> ReservedContainer
> --
>
> Key: YARN-9807
> URL: https://issues.apache.org/jira/browse/YARN-9807
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: YARN-9807.01.patch, YARN-9807.branch-2.01.patch
>
>
> The ContainerAllocator re-creates the RMContainer instance when it is 
> allocated to the ReservedContainer. This causes the RMContainer to lose the 
> information it accumulated between the NEW and RESERVED states.
> {panel:title=RM Log}
> 2019-08-28 18:42:30,320 [10645451] - INFO [SchedulerEventDispatcher:Event 
> Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 
> Container Transitioned from NEW to RESERVED
> 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event 
> Processor:AbstractContainerAllocator@126] - assignedContainer application 
> attempt=appattempt_1566978597856_2831_01 
> container=container_e47_1566978597856_2831_01_07 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@69f543b5
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=label_ndir_2
> 2019-08-28 18:42:38,055 [10653186] - INFO [SchedulerEventDispatcher:Event 
> Processor:RMContainerImpl@486] - container_e47_1566978597856_2831_01_07 
> Container Transitioned from NEW to ALLOCATED
> {panel}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9852) Allow multiple MiniYarnCluster to be used

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9852:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Allow multiple MiniYarnCluster to be used
> -
>
> Key: YARN-9852
> URL: https://issues.apache.org/jira/browse/YARN-9852
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Priority: Major
>
> While implementing new HBase replication tests, we observed problems in the 
> communication between multiple MiniYarnCluster instances in one test suite. 
> I haven't seen any testcase in the Hadoop repository that uses multiple 
> clusters in one test, but it seems like a logical request to allow this. 
> In case this jira does not involve any code change (it's mainly a 
> configuration issue), I suggest adding a testcase that demonstrates such a 
> suitable configuration (see the sketch below).
> Thanks to [~bszabolcs] for the consultation about this issue.
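> A minimal sketch of starting two clusters side by side (the setup only; the 
> configuration needed for the clusters to communicate is the open question of 
> this jira, and the cluster names are illustrative):
> {code:java}
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
> import org.apache.hadoop.yarn.server.MiniYARNCluster;
> 
> // Distinct names keep the two clusters' test directories apart; each
> // cluster here runs a single NodeManager with one local and one log dir.
> MiniYARNCluster clusterA = new MiniYARNCluster("clusterA", 1, 1, 1);
> MiniYARNCluster clusterB = new MiniYARNCluster("clusterB", 1, 1, 1);
> clusterA.init(new YarnConfiguration());
> clusterB.init(new YarnConfiguration());
> clusterA.start();
> clusterB.start();
> {code}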



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9856) Remove log-aggregation related duplicate function

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9856:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Remove log-aggregation related duplicate function
> -
>
> Key: YARN-9856
> URL: https://issues.apache.org/jira/browse/YARN-9856
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Szilard Nemeth
>Priority: Trivial
> Attachments: YARN-9856.001.patch, YARN-9856.002.patch
>
>
> [~snemeth] has noticed a duplication in two of the log-aggregation related 
> functions.
> {quote}I noticed duplicated code in 
> org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, 
> duplicated in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs.
>  [...]
> {quote}
> We should remove the duplication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9856) Remove log-aggregation related duplicate function

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802661#comment-17802661
 ] 

Shilun Fan commented on YARN-9856:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Remove log-aggregation related duplicate function
> -
>
> Key: YARN-9856
> URL: https://issues.apache.org/jira/browse/YARN-9856
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Szilard Nemeth
>Priority: Trivial
> Attachments: YARN-9856.001.patch, YARN-9856.002.patch
>
>
> [~snemeth] has noticed a duplication in two of the log-aggregation related 
> functions.
> {quote}I noticed duplicated code in 
> org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, 
> duplicated in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs.
>  [...]
> {quote}
> We should remove the duplication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9852) Allow multiple MiniYarnCluster to be used

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802662#comment-17802662
 ] 

Shilun Fan commented on YARN-9852:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Allow multiple MiniYarnCluster to be used
> -
>
> Key: YARN-9852
> URL: https://issues.apache.org/jira/browse/YARN-9852
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: test
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Priority: Major
>
> While implementing new HBase replication tests, we observed problems in the 
> communication between multiple MiniYarnCluster instances in one test suite. 
> I haven't seen any testcase in the Hadoop repository that uses multiple 
> clusters in one test, but it seems like a logical request to allow this. 
> In case this jira does not involve any code change (it's mainly a 
> configuration issue), I suggest adding a testcase that demonstrates such a 
> suitable configuration.
> Thanks to [~bszabolcs] for the consultation about this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10032) Implement regex querying of logs

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10032:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Implement regex querying of logs
> 
>
> Key: YARN-10032
> URL: https://issues.apache.org/jira/browse/YARN-10032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> After YARN-10031, we have query parameters on the log servlet's GET endpoint.
> To demonstrate the new capabilities of the log servlet, and how easy it will 
> be to add a piece of functionality to all log servlets at the same time, 
> let's add the ability to search the aggregated logs with a given regex.
> A conceptual use case:
> A user runs several MR jobs daily, but some of them fail to localize a 
> particular resource at first. We want to search the logs of these Yarn 
> applications and extract some data from them.
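> A sketch of the core filtering step (plain java.util.regex over log lines; 
> the servlet plumbing and the query parameter name are outside the scope of 
> this snippet):
> {code:java}
> import java.util.List;
> import java.util.regex.Pattern;
> import java.util.stream.Collectors;
> import java.util.stream.Stream;
> 
> static List<String> grepLogLines(Stream<String> logLines, String regex) {
>   // Pre-compile once; match anywhere in the line, like grep.
>   Pattern pattern = Pattern.compile(regex);
>   return logLines.filter(line -> pattern.matcher(line).find())
>                  .collect(Collectors.toList());
> }
> {code}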



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10025) Various improvements in YARN log servlets

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10025:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Various improvements in YARN log servlets
> -
>
> Key: YARN-10025
> URL: https://issues.apache.org/jira/browse/YARN-10025
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10025 document.pdf
>
>
> There are multiple ways how we can enhance the current log servlets in YARN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10032) Implement regex querying of logs

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802658#comment-17802658
 ] 

Shilun Fan commented on YARN-10032:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Implement regex querying of logs
> 
>
> Key: YARN-10032
> URL: https://issues.apache.org/jira/browse/YARN-10032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> After YARN-10031, we have query parameters on the log servlet's GET endpoint.
> To demonstrate the new capabilities of the log servlet, and how easy it will 
> be to add a piece of functionality to all log servlets at the same time, 
> let's add the ability to search the aggregated logs with a given regex.
> A conceptual use case:
> A user runs several MR jobs daily, but some of them fail to localize a 
> particular resource at first. We want to search the logs of these Yarn 
> applications and extract some data from them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10406) YARN log processor

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802653#comment-17802653
 ] 

Shilun Fan commented on YARN-10406:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> YARN log processor
> --
>
> Key: YARN-10406
> URL: https://issues.apache.org/jira/browse/YARN-10406
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Hudáky Márton Gyula
>Priority: Critical
>
> YARN currently does not have any utility that would enable cluster 
> administrators to display previous actions in a Hadoop YARN cluster in an 
> offline fashion. 
> HDFS has the 
> [OIV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html]/
>  
> [OEV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html]
>  which do not require a running cluster to inspect and modify the filesystem. 
> A corresponding tool would be very helpful in the context of YARN.
> Since ATS is not widespread (it is not available for older clusters) and 
> there isn't a single file or entity that would collect all the 
> application/container etc. related information, we thought our best option 
> was to parse and process the output of the YARN daemon log files and 
> reconstruct the history of the cluster from that. We designed and implemented 
> a CLI-based solution that, after parsing the log files, enables users to 
> query app/container related information (listing, filtering by certain 
> properties) and search for common errors like CE failures/error codes, AM 
> preemption or stack traces. 
> The tool can be integrated into the YARN project as a sub-project.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802657#comment-17802657
 ] 

Shilun Fan commented on YARN-10050:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10065) Support Placement Constraints for AM container allocations

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10065:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Support Placement Constraints for AM container allocations
> --
>
> Key: YARN-10065
> URL: https://issues.apache.org/jira/browse/YARN-10065
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Daniel Velasquez
>Priority: Major
>
> Currently the ApplicationSubmissionContext API supports specifying a node 
> label expression for the AM resource request. It would be beneficial to be 
> able to specify Placement Constraints for the AM resource request as well. 
> We have a requirement to constrain AM containers to certain nodes, e.g. AM 
> containers not on preemptible/spot cloud instances. It looks like node 
> attributes would fit our use case well. However, we currently don't have the 
> ability to specify this in the API for AM resource requests.
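> A sketch of the API gap from the submission side (the node-label setter 
> exists today; the constraint setter in the comment is the missing piece, not 
> an existing method, and the label value is illustrative):
> {code:java}
> import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
> import org.apache.hadoop.yarn.util.Records;
> 
> ApplicationSubmissionContext ctx =
>     Records.newRecord(ApplicationSubmissionContext.class);
> // Supported today: steer the AM with a node label expression.
> ctx.setNodeLabelExpression("on_demand_nodes");
> // Missing today: no analogous setter such as
> //   ctx.setAMPlacementConstraint(...)
> // to express, e.g., "AM containers not on preemptible/spot nodes".
> {code}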



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10065) Support Placement Constraints for AM container allocations

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802655#comment-17802655
 ] 

Shilun Fan commented on YARN-10065:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Support Placement Constraints for AM container allocations
> --
>
> Key: YARN-10065
> URL: https://issues.apache.org/jira/browse/YARN-10065
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Daniel Velasquez
>Priority: Major
>
> Currently ApplicationSubmissionContext API supports specifying a node label 
> expression for the AM resource request. It would be beneficial to have the 
> ability to specify Placement Constraints as well for the AM resource request. 
> We have a requirement to constrain AM containers on certain nodes e.g. AM 
> containers not on preemptible/spot cloud instances. It looks like node 
> attributes would fit our use case well. However, we currently don't have the 
> ability to specify this in the API for AM resource requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10243) Rack-only localization constraint for MR AM is broken for CapacityScheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10243:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Rack-only localization constraint for MR AM is broken for CapacityScheduler
> ---
>
> Key: YARN-10243
> URL: https://issues.apache.org/jira/browse/YARN-10243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Bilwa S T
>Priority: Major
>
> Reproduction: Start an MR sleep job with strict locality configured for the 
> AM ({{-Dmapreduce.job.am.strict-locality=/rack1}} for instance). If 
> CapacityScheduler is used, the job will hang (stuck in the SCHEDULED state). 
> Root cause: if there are no other resources requested (like node locality or 
> another constraint), the scheduling-opportunities counter will not be 
> incremented, and the following piece of code always returns false (so we 
> always skip this constraint), resulting in an infinite loop:
> {code:java}
> // If we are here, we do need containers on this rack for RACK_LOCAL req
> if (type == NodeType.RACK_LOCAL) {
>   // 'Delay' rack-local just a little bit...
>   long missedOpportunities =
>   application.getSchedulingOpportunities(schedulerKey);
>   return getActualNodeLocalityDelay() < missedOpportunities;
> }
> {code}
> Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero so 
> that this rule is processed immediately.
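> The workaround as a scheduler config entry (a sketch; the property name is 
> from this description, and capacity-scheduler.xml is the assumed location):
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.node-locality-delay</name>
>   <value>0</value>
> </property>
> {noformat}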



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10243) Rack-only localization constraint for MR AM is broken for CapacityScheduler

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802654#comment-17802654
 ] 

Shilun Fan commented on YARN-10243:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Rack-only localization constraint for MR AM is broken for CapacityScheduler
> ---
>
> Key: YARN-10243
> URL: https://issues.apache.org/jira/browse/YARN-10243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Bilwa S T
>Priority: Major
>
> Reproduction: Start an MR sleep job with strict locality configured for the 
> AM ({{-Dmapreduce.job.am.strict-locality=/rack1}} for instance). If 
> CapacityScheduler is used, the job will hang (stuck in the SCHEDULED state). 
> Root cause: if there are no other resources requested (like node locality or 
> another constraint), the scheduling-opportunities counter will not be 
> incremented, and the following piece of code always returns false (so we 
> always skip this constraint), resulting in an infinite loop:
> {code:java}
> // If we are here, we do need containers on this rack for RACK_LOCAL req
> if (type == NodeType.RACK_LOCAL) {
>   // 'Delay' rack-local just a little bit...
>   long missedOpportunities =
>   application.getSchedulingOpportunities(schedulerKey);
>   return getActualNodeLocalityDelay() < missedOpportunities;
> }
> {code}
> Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero so 
> that this rule is processed immediately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10050:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> NodeManagerCGroupsMemory.md does not show up in the official documentation
> --
>
> Key: YARN-10050
> URL: https://issues.apache.org/jira/browse/YARN-10050
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Miklos Szegedi
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: YARN-10050.001.patch
>
>
> I looked at this doc:
> [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]
> It does not show up here:
> [https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10059) Final states of failed-to-localize containers are not recorded in NM state store

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10059:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Final states of failed-to-localize containers are not recorded in NM state 
> store
> 
>
> Key: YARN-10059
> URL: https://issues.apache.org/jira/browse/YARN-10059
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-10059.001.patch
>
>
> We found an issue where many localizers for completed containers were 
> launched and exhausted the memory/cpu of the machine after the NM restarted. 
> These containers had all failed and completed while localizing on a 
> non-existent local directory (which is caused by another problem), but their 
> final states weren't recorded in the NM state store.
>  The process flow of a fail-to-localize container is as follows:
> {noformat}
> ResourceLocalizationService$LocalizerRunner#run
> -> ContainerImpl$ResourceFailedTransition#transition handle LOCALIZING -> 
> LOCALIZATION_FAILED upon RESOURCE_FAILED
>   dispatch LocalizationEventType.CLEANUP_CONTAINER_RESOURCES
>   -> ResourceLocalizationService#handleCleanupContainerResources  handle 
> CLEANUP_CONTAINER_RESOURCES
>   dispatch ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP
>   -> ContainerImpl$LocalizationFailedToDoneTransition#transition  
> handle LOCALIZATION_FAILED -> DONE upon CONTAINER_RESOURCES_CLEANEDUP
> {noformat}
> This flow currently performs no state-store update, which is required to 
> avoid unnecessary localizations after NM restarts.
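> A sketch of the kind of missing step (whether 
> NMStateStoreService#storeContainerCompleted is the right call, and where 
> exactly it belongs in the transition, is an assumption here, not settled by 
> this description):
> {code:java}
> // In LocalizationFailedToDoneTransition#transition (sketch): persist the
> // container's final state so a restarted NM won't re-launch its localizers.
> container.context.getNMStateStore()
>     .storeContainerCompleted(container.getContainerId(), exitCode);
> {code}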



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10406) YARN log processor

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10406:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> YARN log processor
> --
>
> Key: YARN-10406
> URL: https://issues.apache.org/jira/browse/YARN-10406
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Hudáky Márton Gyula
>Priority: Critical
>
> YARN currently does not have any utility that would enable cluster 
> administrators to display previous actions in a Hadoop YARN cluster in an 
> offline fashion. 
> HDFS has the 
> [OIV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html]/
>  
> [OEV|https://hadoop.apache.org/docs/current3/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html]
>  which do not require a running cluster to inspect and modify the filesystem. 
> A corresponding tool would be very helpful in the context of YARN.
> Since ATS is not widespread (it is not available for older clusters) and 
> there isn't a single file or entity that would collect all the 
> application/container etc. related information, we thought our best option 
> was to parse and process the output of the YARN daemon log files and 
> reconstruct the history of the cluster from that. We designed and implemented 
> a CLI-based solution that, after parsing the log files, enables users to 
> query app/container related information (listing, filtering by certain 
> properties) and search for common errors like CE failures/error codes, AM 
> preemption or stack traces. 
> The tool can be integrated into the YARN project as a sub-project.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10474) [JDK 12] TestAsyncDispatcher fails

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802652#comment-17802652
 ] 

Shilun Fan commented on YARN-10474:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> [JDK 12] TestAsyncDispatcher fails
> --
>
> Key: YARN-10474
> URL: https://issues.apache.org/jira/browse/YARN-10474
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Priority: Major
>
> Similar to HDFS-15580. Updating a final variable via reflection is not 
> allowed in Java 12+.
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.event.TestAsyncDispatcher
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.953 
> s <<< FAILURE! - in org.apache.hadoop.yarn.event.TestAsyncDispatcher
> [ERROR] 
> testPrintDispatcherEventDetails(org.apache.hadoop.yarn.event.TestAsyncDispatcher)
>   Time elapsed: 0.114 s  <<< ERROR!
> java.lang.NoSuchFieldException: modifiers
>   at java.base/java.lang.Class.getDeclaredField(Class.java:2569)
>   at 
> org.apache.hadoop.yarn.event.TestAsyncDispatcher.testPrintDispatcherEventDetails(TestAsyncDispatcher.java:152)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {noformat}
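> For context, the failure comes from the classic pre-JDK-12 reflection hack for
> overwriting a final field; a minimal sketch of the pattern (helper name
> hypothetical, not the actual test code):
> {code:java}
> import java.lang.reflect.Field;
> import java.lang.reflect.Modifier;
> 
> final class FinalFieldHack {
>   // Clears the FINAL modifier so a final field can be rewritten in a test.
>   // On JDK 12+ java.lang.reflect.Field no longer exposes its own "modifiers"
>   // field, so getDeclaredField("modifiers") throws NoSuchFieldException.
>   static void setFinal(Object target, Field field, Object value) throws Exception {
>     field.setAccessible(true);
>     Field modifiers = Field.class.getDeclaredField("modifiers"); // fails on JDK 12+
>     modifiers.setAccessible(true);
>     modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);
>     field.set(target, value);
>   }
> }
> {code}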



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10474) [JDK 12] TestAsyncDispatcher fails

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10474:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> [JDK 12] TestAsyncDispatcher fails
> --
>
> Key: YARN-10474
> URL: https://issues.apache.org/jira/browse/YARN-10474
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Priority: Major
>
> Similar to HDFS-15580. Updating a final variable via reflection is not 
> allowed in Java 12+.
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.event.TestAsyncDispatcher
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.953 
> s <<< FAILURE! - in org.apache.hadoop.yarn.event.TestAsyncDispatcher
> [ERROR] 
> testPrintDispatcherEventDetails(org.apache.hadoop.yarn.event.TestAsyncDispatcher)
>   Time elapsed: 0.114 s  <<< ERROR!
> java.lang.NoSuchFieldException: modifiers
>   at java.base/java.lang.Class.getDeclaredField(Class.java:2569)
>   at 
> org.apache.hadoop.yarn.event.TestAsyncDispatcher.testPrintDispatcherEventDetails(TestAsyncDispatcher.java:152)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10546) Limit application resource reservation on nodes for non-node/rack specific requests should be supported in CS.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802650#comment-17802650
 ] 

Shilun Fan commented on YARN-10546:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Limit application resource reservation on nodes for non-node/rack specific 
> requests should be supported in CS.
> -
>
> Key: YARN-10546
> URL: https://issues.apache.org/jira/browse/YARN-10546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Just as YARN-4270 fixed this for the FairScheduler, the CapacityScheduler 
> should fix it as well.
> It is a big problem in production clusters when it happens.
> The fs-to-cs converter should also support this setting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10546) Limit application resource reservation on nodes for non-node/rack specific requests should be supported in CS.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10546:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Limit application resource reservation on nodes for non-node/rack specific 
> requests should be supported in CS.
> -
>
> Key: YARN-10546
> URL: https://issues.apache.org/jira/browse/YARN-10546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Just as YARN-4270 fixed this for the FairScheduler, the CapacityScheduler 
> should fix it as well.
> It is a big problem in production clusters when it happens.
> The fs-to-cs converter should also support this setting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10594) Split the debug log when execute privileged operation

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10594:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Split the debug log when execute privileged operation
> -
>
> Key: YARN-10594
> URL: https://issues.apache.org/jira/browse/YARN-10594
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
>
> The command should be logged before the *exec.execute();* statement runs, 
> rather than after.
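> A minimal sketch of the proposed ordering (class and method names
> hypothetical; 'exec' stands in for the shell executor in the real code):
> {code:java}
> import java.util.List;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> final class PrivilegedOpLogging {
>   private static final Logger LOG = LoggerFactory.getLogger(PrivilegedOpLogging.class);
> 
>   // Log the privileged command *before* running it, so the command is still
>   // visible in the debug log if execute() blocks or throws.
>   static void run(List<String> command, Runnable exec) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Executing privileged operation: {}", String.join(" ", command));
>     }
>     exec.run(); // corresponds to exec.execute(); in the real code
>   }
> }
> {code}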



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10514) Introduce a dominant resource based schedule policy to increase the resource utilization, avoid heavy cluster resource fragments.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10514:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Introduce a dominant resource based schedule policy to increase the resource 
> utilization, avoid heavy cluster resource fragments.
> -
>
> Key: YARN-10514
> URL: https://issues.apache.org/jira/browse/YARN-10514
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10514.001.patch
>
>
> When we schedule with the multi-node lookup policy for async scheduling, or 
> with heartbeat-update-based scheduling, we run into scheduling fragmentation 
> either way.
> With CPU-, GPU-, or memory-intensive jobs, the cluster wastes a lot of 
> resources. This issue moves the scheduler toward a dominant-resource-based 
> schedule policy, to get better cluster resource utilization and to balance 
> resource distribution across NodeManagers.
>  
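> A sketch of what a dominant-resource-aware node comparator could look like
> (class and field names assumed for illustration, not the actual patch):
> {code:java}
> import java.util.Comparator;
> 
> final class NodeUsage {
>   double usedMemory, totalMemory, usedVcores, totalVcores;
> 
>   // Dominant share = the highest utilization ratio across resource types.
>   double dominantShare() {
>     return Math.max(usedMemory / totalMemory, usedVcores / totalVcores);
>   }
> }
> 
> final class DominantResourcePolicy {
>   // Prefer nodes whose dominant resource is least utilized; this spreads
>   // CPU-/GPU-/memory-heavy workloads and reduces fragmentation.
>   static final Comparator<NodeUsage> ORDER =
>       Comparator.comparingDouble(NodeUsage::dominantShare);
> }
> {code}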



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10514) Introduce a dominant resource based schedule policy to increase the resource utilization, avoid heavy cluster resource fragments.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802651#comment-17802651
 ] 

Shilun Fan commented on YARN-10514:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Introduce a dominant resource based schedule policy to increase the resource 
> utilization, avoid heavy cluster resource fragments.
> -
>
> Key: YARN-10514
> URL: https://issues.apache.org/jira/browse/YARN-10514
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10514.001.patch
>
>
> When we schedule with the multi-node lookup policy for async scheduling, or 
> with heartbeat-update-based scheduling, we run into scheduling fragmentation 
> either way.
> With CPU-, GPU-, or memory-intensive jobs, the cluster wastes a lot of 
> resources. This issue moves the scheduler toward a dominant-resource-based 
> schedule policy, to get better cluster resource utilization and to balance 
> resource distribution across NodeManagers.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10594) Split the debug log when execute privileged operation

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802649#comment-17802649
 ] 

Shilun Fan commented on YARN-10594:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Split the debug log when execute privileged operation
> -
>
> Key: YARN-10594
> URL: https://issues.apache.org/jira/browse/YARN-10594
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
>
> The command should be logged before the *exec.execute();* statement runs, 
> rather than after.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10608) Extend yarn.nodemanager.delete.debug-delay-sec to support application level.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802648#comment-17802648
 ] 

Shilun Fan commented on YARN-10608:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Extend yarn.nodemanager.delete.debug-delay-sec to support application level.
> 
>
> Key: YARN-10608
> URL: https://issues.apache.org/jira/browse/YARN-10608
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> Today, yarn.nodemanager.delete.debug-delay-sec is a cluster-level setting.
> In our busy production cluster, we default it to 0 to keep local logs from 
> piling up.
> But when we need to dig into errors of Spark/MR etc. jobs, such as core 
> dumps, I propose supporting a job-level setting that delays the deletion of 
> local logs so the error can be reproduced.
>  
> [~wangda] [~tangzhankun]  [~xgong] [~epayne]
> Do you have any advice about this support?
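> For reference, the existing cluster-level knob is set like this (a minimal
> sketch of today's behavior; the job-level override proposed above does not
> exist yet):
> {code:java}
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
> 
> public class DebugDelayExample {
>   public static void main(String[] args) {
>     YarnConfiguration conf = new YarnConfiguration();
>     // Cluster-level today: keep finished containers' local dirs for 1 hour
>     // (yarn.nodemanager.delete.debug-delay-sec).
>     conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
>   }
> }
> {code}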



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10608) Extend yarn.nodemanager.delete.debug-delay-sec to support application level.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10608:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Extend yarn.nodemanager.delete.debug-delay-sec to support application level.
> 
>
> Key: YARN-10608
> URL: https://issues.apache.org/jira/browse/YARN-10608
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> Today, yarn.nodemanager.delete.debug-delay-sec is a cluster-level setting.
> In our busy production cluster, we default it to 0 to keep local logs from 
> piling up.
> But when we need to dig into errors of Spark/MR etc. jobs, such as core 
> dumps, I propose supporting a job-level setting that delays the deletion of 
> local logs so the error can be reproduced.
>  
> [~wangda] [~tangzhankun]  [~xgong] [~epayne]
> Do you have any advice about this support?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10690) GPU related improvement for better usage.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10690:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> GPU related improvement for better usage.
> -
>
> Key: YARN-10690
> URL: https://issues.apache.org/jira/browse/YARN-10690
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> This Jira will improve GPU support for better usability.
>  cc [~bibinchundatt] [~pbacsko] [~ebadger] [~ztang]  [~epayne] [~gandras]  
> [~bteke]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10764) Add rm dispatcher event metrics in SLS

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10764:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Add rm dispatcher event metrics in SLS 
> ---
>
> Key: YARN-10764
> URL: https://issues.apache.org/jira/browse/YARN-10764
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler-load-simulator
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> We should use SLS to confirm whether we get a performance improvement in 
> event consumption time, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10772) Stable API GetApplicationsRequest#newInstance compatibility broken by YARN-8363

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10772:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Stable API GetApplicationsRequest#newInstance compatibility broken by 
> YARN-8363
> ---
>
> Key: YARN-10772
> URL: https://issues.apache.org/jira/browse/YARN-10772
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.2.0
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> YARN-8363 migrated our usage of commons-lang to commons-lang3 in 3.2.0.
>  
> Unfortunately, it changed the API signature of 
> {code:java}
> /**
>  * <p>
>  * The request from clients to get a report of Applications matching the
>  * giving application types in the cluster from the
>  * <code>ResourceManager</code>.
>  * </p>
>  *
>  * @see ApplicationClientProtocol#getApplications(GetApplicationsRequest)
>  *
>  * Setting any of the parameters to null, would just disable that
>  * filter
>  *
>  * @param scope {@link ApplicationsRequestScope} to filter by
>  * @param users list of users to filter by
>  * @param queues list of scheduler queues to filter by
>  * @param applicationTypes types of applications
>  * @param applicationTags application tags to filter by
>  * @param applicationStates application states to filter by
>  * @param startRange range of application start times to filter by
>  * @param finishRange range of application finish times to filter by
>  * @param limit number of applications to limit to
>  * @return {@link GetApplicationsRequest} to be used with
>  * {@link ApplicationClientProtocol#getApplications(GetApplicationsRequest)}
>  */
> @Public
> @Stable
> public static GetApplicationsRequest newInstance(
>     ApplicationsRequestScope scope,
>     Set<String> users,
>     Set<String> queues,
>     Set<String> applicationTypes,
>     Set<String> applicationTags,
>     EnumSet<YarnApplicationState> applicationStates,
>     Range<Long> startRange,
>     Range<Long> finishRange,
>     Long limit) { {code}
> The startRange and finishRange parameters changed type from LongRange to 
> Range<Long>.
> It could cause problems when migrating applications, for example, from Hadoop 
> 3.1 to 3.3.
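> For callers, the migration looks roughly like this (a sketch with assumed
> variable values, not from the Jira itself):
> {code:java}
> import java.util.EnumSet;
> import java.util.Set;
> import org.apache.commons.lang3.Range;
> import org.apache.hadoop.yarn.api.protocolrecords.ApplicationsRequestScope;
> import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;
> import org.apache.hadoop.yarn.api.records.YarnApplicationState;
> 
> public class RangeMigration {
>   // Hadoop 3.1 callers passed org.apache.commons.lang.math.LongRange here;
>   // on 3.2+ the same positions take commons-lang3 Range<Long> instead, so
>   // code compiled against the old signature fails with NoSuchMethodError.
>   static GetApplicationsRequest build(Set<String> users, long start, long end) {
>     return GetApplicationsRequest.newInstance(
>         ApplicationsRequestScope.ALL, users, null, null, null,
>         EnumSet.of(YarnApplicationState.RUNNING),
>         Range.between(start, end), // startRange
>         null,                      // finishRange: disabled
>         10L);                      // limit
>   }
> }
> {code}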



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10738:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> When multi thread scheduling with multi node, we should shuffle with a gap to 
> prevent hot accessing nodes.
> --
>
> Key: YARN-10738
> URL: https://issues.apache.org/jira/browse/YARN-10738
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current multi-threaded scheduling over multiple nodes is not reasonable.
> In large clusters it concentrates allocations on a few hot nodes, which can 
> overload them.
> Solution:
> I think we should shuffle the sorted node list (e.g. under the 
> available-resource sort policy) within an interval.
> This will solve the problem above and avoid hot nodes.
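> A minimal sketch of the interval/windowed shuffle idea (window size and
> generic types assumed for illustration, not the actual patch):
> {code:java}
> import java.util.Collections;
> import java.util.List;
> 
> final class WindowedShuffle {
>   // Shuffle each consecutive window of the sorted node list so concurrent
>   // scheduling threads spread over ~window nodes instead of all picking the
>   // single "best" node, while roughly preserving the global sort order.
>   static <N> void shuffleInWindows(List<N> sortedNodes, int window) {
>     for (int from = 0; from < sortedNodes.size(); from += window) {
>       int to = Math.min(from + window, sortedNodes.size());
>       Collections.shuffle(sortedNodes.subList(from, to));
>     }
>   }
> }
> {code}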



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802645#comment-17802645
 ] 

Shilun Fan commented on YARN-10738:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> When multi thread scheduling with multi node, we should shuffle with a gap to 
> prevent hot accessing nodes.
> --
>
> Key: YARN-10738
> URL: https://issues.apache.org/jira/browse/YARN-10738
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current multi-threaded scheduling over multiple nodes is not reasonable.
> In large clusters it concentrates allocations on a few hot nodes, which can 
> overload them.
> Solution:
> I think we should shuffle the sorted node list (e.g. under the 
> available-resource sort policy) within an interval.
> This will solve the problem above and avoid hot nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10772) Stable API GetApplicationsRequest#newInstance compatibility broken by YARN-8363

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802643#comment-17802643
 ] 

Shilun Fan commented on YARN-10772:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Stable API GetApplicationsRequest#newInstance compatibility broken by 
> YARN-8363
> ---
>
> Key: YARN-10772
> URL: https://issues.apache.org/jira/browse/YARN-10772
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.2.0
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> YARN-8363 migrated our usage of commons-lang to commons-lang3 in 3.2.0.
>  
> Unfortunately, it changed the API signature of 
> {code:java}
> /**
>  * <p>
>  * The request from clients to get a report of Applications matching the
>  * giving application types in the cluster from the
>  * <code>ResourceManager</code>.
>  * </p>
>  *
>  * @see ApplicationClientProtocol#getApplications(GetApplicationsRequest)
>  *
>  * Setting any of the parameters to null, would just disable that
>  * filter
>  *
>  * @param scope {@link ApplicationsRequestScope} to filter by
>  * @param users list of users to filter by
>  * @param queues list of scheduler queues to filter by
>  * @param applicationTypes types of applications
>  * @param applicationTags application tags to filter by
>  * @param applicationStates application states to filter by
>  * @param startRange range of application start times to filter by
>  * @param finishRange range of application finish times to filter by
>  * @param limit number of applications to limit to
>  * @return {@link GetApplicationsRequest} to be used with
>  * {@link ApplicationClientProtocol#getApplications(GetApplicationsRequest)}
>  */
> @Public
> @Stable
> public static GetApplicationsRequest newInstance(
>     ApplicationsRequestScope scope,
>     Set<String> users,
>     Set<String> queues,
>     Set<String> applicationTypes,
>     Set<String> applicationTags,
>     EnumSet<YarnApplicationState> applicationStates,
>     Range<Long> startRange,
>     Range<Long> finishRange,
>     Long limit) { {code}
> The startRange and finishRange parameters changed type from LongRange to 
> Range<Long>.
> It could cause problems when migrating applications, for example, from Hadoop 
> 3.1 to 3.3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11054) Alleviate LocalJobRunnerMetricName Conflicts

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802642#comment-17802642
 ] 

Shilun Fan commented on YARN-11054:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Alleviate LocalJobRunnerMetricName Conflicts
> 
>
> Key: YARN-11054
> URL: https://issues.apache.org/jira/browse/YARN-11054
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Xingjun Hao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In some scenarios, Sqoop will use LocalJobRunner (YARN local mode) to run a 
> lot of jobs. Assuming 2 million jobs have been run, and the LocalJobRunner 
> MetricName is generated by nextInt() in the range (0, 2147483647),
> the probability that a new MetricName conflicts is about 
> 2,000,000/2,147,483,647 ≈ 1/1000, which means that on average 1 task will 
> fail for every 1000 jobs run.
> If the LocalJobRunner MetricName is generated by nextLong() instead, whose 
> range is (0, 9223372036854775807), billions of times larger than Int's, the 
> probability of a new MetricName conflict shrinks by the same factor.
> (With about 200 million jobs run, the nextInt conflict probability would 
> also grow from 1/1000 to roughly 1/5.)
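> The arithmetic can be sanity-checked with a short sketch, approximating the
> chance that one new uniformly drawn name collides with any of n existing
> names as n/space (a rough bound, valid while n is much smaller than the
> space; not from the patch):
> {code:java}
> public class MetricNameCollision {
>   // Rough collision estimate for a single new name against n existing names.
>   static double collisionProb(double n, double spaceSize) {
>     return n / spaceSize;
>   }
> 
>   public static void main(String[] args) {
>     double jobs = 2_000_000d;
>     System.out.println(collisionProb(jobs, Integer.MAX_VALUE)); // ~9.3e-4, about 1/1000 (nextInt)
>     System.out.println(collisionProb(jobs, Long.MAX_VALUE));    // ~2.2e-13 (nextLong)
>   }
> }
> {code}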



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11054) Alleviate LocalJobRunnerMetricName Conflicts

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11054:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Alleviate LocalJobRunnerMetricName Conflicts
> 
>
> Key: YARN-11054
> URL: https://issues.apache.org/jira/browse/YARN-11054
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Xingjun Hao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In some scenarios, Sqoop will use LocalJobRunner (YARN local mode) to run a 
> lot of jobs. Assuming 2 million jobs have been run, and the LocalJobRunner 
> MetricName is generated by nextInt() in the range (0, 2147483647),
> the probability that a new MetricName conflicts is about 
> 2,000,000/2,147,483,647 ≈ 1/1000, which means that on average 1 task will 
> fail for every 1000 jobs run.
> If the LocalJobRunner MetricName is generated by nextLong() instead, whose 
> range is (0, 9223372036854775807), billions of times larger than Int's, the 
> probability of a new MetricName conflict shrinks by the same factor.
> (With about 200 million jobs run, the nextInt conflict probability would 
> also grow from 1/1000 to roughly 1/5.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802641#comment-17802641
 ] 

Shilun Fan commented on YARN-11127:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Potential deadlock in AsyncDispatcher caused by RMNodeImpl, 
> SchedulerApplicationAttempt and RMAppImpl's lock contention.
> 
>
> Key: YARN-11127
> URL: https://issues.apache.org/jira/browse/YARN-11127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>  Labels: pull-request-available
> Attachments: rm-dead-lock.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> I found an RM deadlock in our cluster. It's a low-probability event. Some 
> critical jstack information is below: 
> {code:java}
> "RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 
> waiting on condition [0x7f85dd00b000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x7f9389aab478> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         - locked <0x7f88db78c5c8> (a 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
>         at java.lang.Thread.run(Thread.java:748)
> "IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 
> tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x7f938976e818> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> 
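> The shape of the hang is a classic lock-ordering cycle between two
> ReentrantReadWriteLocks; an illustrative, runnable sketch (not the actual RM
> code):
> {code:java}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
> 
> public class LockOrderDeadlock {
>   static final ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock(); // stands in for RMNodeImpl's lock
>   static final ReentrantReadWriteLock appLock  = new ReentrantReadWriteLock(); // stands in for RMAppImpl's lock
> 
>   public static void main(String[] args) {
>     // Dispatcher-like thread: node lock first, then app lock.
>     new Thread(() -> {
>       nodeLock.writeLock().lock();
>       sleep(100);
>       appLock.writeLock().lock(); // waits forever
>     }).start();
>     // IPC-handler-like thread: app lock first, then node lock.
>     new Thread(() -> {
>       appLock.writeLock().lock();
>       sleep(100);
>       nodeLock.writeLock().lock(); // waits forever -> deadlock
>     }).start();
>   }
> 
>   static void sleep(long ms) {
>     try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
>   }
> }
> {code}
> Note that with a non-fair ReentrantReadWriteLock a queued writer also blocks
> new readers, so even read-only paths pile up behind the cycle, which matches
> the parked IPC handlers in the jstack.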

[jira] [Updated] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11127:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Potential deadlock in AsyncDispatcher caused by RMNodeImpl, 
> SchedulerApplicationAttempt and RMAppImpl's lock contention.
> 
>
> Key: YARN-11127
> URL: https://issues.apache.org/jira/browse/YARN-11127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>  Labels: pull-request-available
> Attachments: rm-dead-lock.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> I found an RM deadlock in our cluster. It's a low-probability event. Some 
> critical jstack information is below: 
> {code:java}
> "RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 
> waiting on condition [0x7f85dd00b000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x7f9389aab478> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         - locked <0x7f88db78c5c8> (a 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
>         at java.lang.Thread.run(Thread.java:748)
> "IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 
> tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x7f938976e818> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> 

[jira] [Updated] (YARN-11149) Add regression test cases for YARN-11073

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11149:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Add regression test cases for YARN-11073
> 
>
> Key: YARN-11149
> URL: https://issues.apache.org/jira/browse/YARN-11149
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Akira Ajisaka
>Priority: Major
>
> Add regression test cases for YARN-11073



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11329) Refactor Router#startWepApp#setupSecurityAndFilters

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11329:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Refactor Router#startWepApp#setupSecurityAndFilters
> ---
>
> Key: YARN-11329
> URL: https://issues.apache.org/jira/browse/YARN-11329
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> While reading the code, I found that the Router uses the RM's 
> RMWebAppUtil#setupSecurityAndFilters, which means that if the Router web UI 
> wants to enable security-related functions, 
> it needs to set RM parameters, which seems unreasonable. The Router should 
> have independent parameters to control Router web security.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11321) Add RMWebServices#getSchedulerOverview

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11321:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Add RMWebServices#getSchedulerOverview
> --
>
> Key: YARN-11321
> URL: https://issues.apache.org/jira/browse/YARN-11321
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, resourcemanager, router
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> Add a new interface to display basic scheduler information, such as:
> ||Col||
> |1.Scheduler Type|
> |2.Scheduling Resource Type|
> |3.Minimum Allocation|
> |4.Maximum Allocation|
> |5.Maximum Cluster Application Priority|
> |6.Scheduler Busy %|
> |7.RM Dispatcher EventQueue Size|
> |8.Scheduler Dispatcher EventQueue Size|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11343) Improve FederationStateStore#ApplicationHomeSubCluster CRUD Methods

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11343:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Improve FederationStateStore#ApplicationHomeSubCluster CRUD Methods
> ---
>
> Key: YARN-11343
> URL: https://issues.apache.org/jira/browse/YARN-11343
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> Currently, MemoryFederationStateStore and ZookeeperFederationStateStore only 
> store the mapping between an Application and its SubClusterId.
> In the future we will need more attributes, such as the creation time of the 
> Application, the state of the Application, and the router information of the 
> Application. We need to refactor the ApplicationHomeSubCluster CRUD methods 
> to support using the ApplicationHomeSubCluster object directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11444) Improve YARN md documentation format

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11444:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Improve YARN md documentation format
> 
>
> Key: YARN-11444
> URL: https://issues.apache.org/jira/browse/YARN-11444
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> 1. Improve the table format for better readability
> 2. Fix some typos
> 3. Fix the list numbering so it displays correctly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11335) [Federation] Added JDOFederationStateStore for easy database query.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11335:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> [Federation] Added JDOFederationStateStore for easy database query.
> ---
>
> Key: YARN-11335
> URL: https://issues.apache.org/jira/browse/YARN-11335
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> [Federation] Added JDOFederationStateStore for easy database query.
> We currently have SQLFederationStateStore, but I think the following points 
> can be improved:
> 1. We want the database layer to support Oracle and Postgres, but it is very 
> difficult to provide stored-procedure scripts for each database: 
> it involves a lot of testing and multi-version verification. If possible, we 
> would rather provide plain table-creation statements.
> 2. Reading the code, the store is basically add, delete, select, and update 
> operations on single tables. 
> An ORM framework such as DataNucleus can do this for us.
> I plan to provide a JDOFederationStateStore that can easily replace 
> SQLFederationStateStore.
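> A hypothetical JDO mapping illustrating the "plain tables + ORM" approach
> (table, class, and column names assumed for illustration, not the actual
> schema):
> {code:java}
> import javax.jdo.annotations.PersistenceCapable;
> import javax.jdo.annotations.PrimaryKey;
> 
> // DataNucleus/JDO maps this class to a plain table, so per-database stored
> // procedures are no longer needed; the store issues portable CRUD operations.
> @PersistenceCapable(table = "applicationsHomeSubCluster")
> public class ApplicationHomeSubClusterRow {
>   @PrimaryKey
>   private String applicationId;
> 
>   private String homeSubCluster;
> 
>   public String getApplicationId() { return applicationId; }
>   public String getHomeSubCluster() { return homeSubCluster; }
>   public void setHomeSubCluster(String sc) { this.homeSubCluster = sc; }
> }
> {code}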



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11491) [Federation] Use ZookeeperFederationStateStore as the DefaultStateStore

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11491:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> [Federation] Use ZookeeperFederationStateStore as the DefaultStateStore
> ---
>
> Key: YARN-11491
> URL: https://issues.apache.org/jira/browse/YARN-11491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> We currently use MemoryStateStore as the default StateStore; in production 
> environments, ZookeeperFederationStateStore should be the default 
> StateStore.
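> In code, pointing the store at ZooKeeper looks like this (a minimal sketch
> using the documented yarn.federation.state-store.class key):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> public class FederationStoreConfig {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Point the federation state store at ZooKeeper instead of the
>     // in-memory default.
>     conf.set("yarn.federation.state-store.class",
>         "org.apache.hadoop.yarn.server.federation.store.impl.ZookeeperFederationStateStore");
>   }
> }
> {code}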



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11458) Maven parallel build fails when Yarn UI v2 is enabled

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11458:
--
Target Version/s: 3.3.9, 3.5.0  (was: 3.4.0, 3.3.9)

> Maven parallel build fails when Yarn UI v2 is enabled
> -
>
> Key: YARN-11458
> URL: https://issues.apache.org/jira/browse/YARN-11458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn, yarn-ui-v2
>Affects Versions: 3.4.0, 3.3.5, 3.3.4, 3.3.9
> Environment: The problem occurs sporadically while using the Hadoop 
> development environment (Ubuntu)
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Critical
>
> Running a parallel build fails during assembly with the following error when 
> running either package or install:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-yarn) on 
> project hadoop-yarn-project: Failed to create assembly: Error creating 
> assembly archive hadoop-yarn-dist: 
> /workspace/hadoop-yarn-project/./hadoop-yarn/hadoop-yarn-ui/target/webapp/tmp/sass_compiler-input_base_path-iNYf9pEm.tmp/vendor/ember-qunit/ember-qunit.map
>  -> [Help 1]{code}
> This appears to be a race condition introduced when `-Pyarn-ui` is used, 
> because `hadoop-yarn-project` does not have a dependency listed for 
> `yarn-ui`.
> The command executed was:
> {code:java}
> $ mvn -nsu clean install -Pdist,native -DskipTests -Dtar 
> -Dmaven.javadoc.skip=true -T 2C {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11527) Upgrade node.js to 14.0.0 in YARN application catalog webapp

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11527:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> Upgrade node.js to 14.0.0 in YARN application catalog webapp
> 
>
> Key: YARN-11527
> URL: https://issues.apache.org/jira/browse/YARN-11527
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
> Environment: Upgrade node.js to 14.0.0 in YARN application catalog 
> webapp
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [INFO] yarn install v1.22.5
> [INFO] info No lockfile found.
> [INFO] [1/4] Resolving packages...
> [INFO] warning angular-route@1.6.10: For the actively supported Angular, see 
> https://www.npmjs.com/package/@angular/core. AngularJS support has officially 
> ended. For extended AngularJS support options, see 
> https://goo.gle/angularjs-path-forward.
> [INFO] warning angular@1.6.10: For the actively supported Angular, see 
> https://www.npmjs.com/package/@angular/core. AngularJS support has officially 
> ended. For extended AngularJS support options, see 
> https://goo.gle/angularjs-path-forward.
> [INFO] [2/4] Fetching packages...
> [INFO] error triple-beam@1.4.1: The engine "node" is incompatible with this 
> module. Expected version ">= 14.0.0". Got "12.22.1"
> [INFO] error Found incompatible module.
> [INFO] error Found incompatible module.info Visit 
> https://yarnpkg.com/en/docs/cli/install for documentation about this command.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11527) Upgrade node.js to 14.0.0 in YARN application catalog webapp

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802636#comment-17802636
 ] 

Shilun Fan commented on YARN-11527:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Upgrade node.js to 14.0.0 in YARN application catalog webapp
> 
>
> Key: YARN-11527
> URL: https://issues.apache.org/jira/browse/YARN-11527
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
> Environment: Upgrade node.js to 14.0.0 in YARN application catalog 
> webapp
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [INFO] yarn install v1.22.5
> [INFO] info No lockfile found.
> [INFO] [1/4] Resolving packages...
> [INFO] warning angular-route@1.6.10: For the actively supported Angular, see 
> https://www.npmjs.com/package/@angular/core. AngularJS support has officially 
> ended. For extended AngularJS support options, see 
> https://goo.gle/angularjs-path-forward.
> [INFO] warning angular@1.6.10: For the actively supported Angular, see 
> https://www.npmjs.com/package/@angular/core. AngularJS support has officially 
> ended. For extended AngularJS support options, see 
> https://goo.gle/angularjs-path-forward.
> [INFO] [2/4] Fetching packages...
> [INFO] error triple-beam@1.4.1: The engine "node" is incompatible with this 
> module. Expected version ">= 14.0.0". Got "12.22.1"
> [INFO] error Found incompatible module.
> [INFO] error Found incompatible module.info Visit 
> https://yarnpkg.com/en/docs/cli/install for documentation about this command.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11458) Maven parallel build fails when Yarn UI v2 is enabled

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802637#comment-17802637
 ] 

Shilun Fan commented on YARN-11458:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Maven parallel build fails when Yarn UI v2 is enabled
> -
>
> Key: YARN-11458
> URL: https://issues.apache.org/jira/browse/YARN-11458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn, yarn-ui-v2
>Affects Versions: 3.4.0, 3.3.5, 3.3.4, 3.3.9
> Environment: The problem occurs sporadically while using the Hadoop 
> development environment (Ubuntu)
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Critical
>
> Running a parallel build fails during assembly with the following error when 
> running either package or install:
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-yarn) on 
> project hadoop-yarn-project: Failed to create assembly: Error creating 
> assembly archive hadoop-yarn-dist: 
> /workspace/hadoop-yarn-project/./hadoop-yarn/hadoop-yarn-ui/target/webapp/tmp/sass_compiler-input_base_path-iNYf9pEm.tmp/vendor/ember-qunit/ember-qunit.map
>  -> [Help 1]{code}
> This appears to be a race condition introduced when `-Pyarn-ui` is used, 
> because `hadoop-yarn-project` does not have a dependency listed for 
> `yarn-ui`.
> The command executed was:
> {code:java}
> $ mvn -nsu clean install -Pdist,native -DskipTests -Dtar 
> -Dmaven.javadoc.skip=true -T 2C {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11631) [GPG] Add GPGWebServices

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11631:
--
Target Version/s: 3.5.0  (was: 3.4.0)

> [GPG] Add GPGWebServices
> 
>
> Key: YARN-11631
> URL: https://issues.apache.org/jira/browse/YARN-11631
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-11529) Add metrics for ContainerMonitorImpl.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11529.
---
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Add metrics for ContainerMonitorImpl.
> -
>
> Key: YARN-11529
> URL: https://issues.apache.org/jira/browse/YARN-11529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: Xianming Lei
>Assignee: Xianming Lei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In our production environment, we have ample machine resources and a 
> significant number of active Containers. However, the MonitoringThread in 
> ContainerMonitorImpl experiences significant latency during each execution. 
> To address this, it is highly recommended to incorporate metrics for 
> monitoring the duration of this time-consuming process.
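> A sketch of the kind of metrics2 source this adds (class and metric names
> illustrative, not the merged code):
> {code:java}
> import org.apache.hadoop.metrics2.annotation.Metric;
> import org.apache.hadoop.metrics2.annotation.Metrics;
> import org.apache.hadoop.metrics2.lib.MutableRate;
> 
> // Publishes how long each MonitoringThread sweep takes, so slow sweeps
> // become visible on busy NodeManagers with many live containers.
> @Metrics(about = "ContainersMonitor metrics", context = "yarn")
> public class ContainersMonitorMetricsSketch {
>   @Metric("Duration of one container-monitoring sweep")
>   MutableRate monitoringSweepDuration;
> 
>   public void recordSweep(long durationMs) {
>     monitoringSweepDuration.add(durationMs);
>   }
> }
> {code}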



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11635) Fix hadoop-yarn-server-nodemanager module Java Doc Errors.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11635:
--
Target Version/s:   (was: 3.4.0)

> Fix hadoop-yarn-server-nodemanager module Java Doc Errors.
> --
>
> Key: YARN-11635
> URL: https://issues.apache.org/jira/browse/YARN-11635
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> I noticed that the nodemanager module has some Javadoc errors when compiling 
> with JDK 11. In this Jira, I will fix them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2024-01-02 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801963#comment-17801963
 ] 

Shilun Fan commented on YARN-7592:
--

I will continue to follow up on this issue in the next 1-2 days.

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-11632) [Doc] Add allow-partial-result description to Yarn Federation documentation

2024-01-02 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11632.
---
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> [Doc] Add allow-partial-result description to Yarn Federation documentation
> ---
>
> Key: YARN-11632
> URL: https://issues.apache.org/jira/browse/YARN-11632
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Add allow-partial-result description to Yarn Federation documentation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11638) [GPG] GPG Support CLI.

2023-12-31 Thread Shilun Fan (Jira)
Shilun Fan created YARN-11638:
-

 Summary: [GPG] GPG Support CLI.
 Key: YARN-11638
 URL: https://issues.apache.org/jira/browse/YARN-11638
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: federation
Affects Versions: 3.4.0
Reporter: Shilun Fan
Assignee: Shilun Fan


We will add a set of command-line tools to GPG so that GPG can refresh 
policies more conveniently and provide some other useful functions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2023-12-22 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799732#comment-17799732
 ] 

Shilun Fan commented on YARN-7592:
--

[~it_singer] Thank you for reporting this issue! I will reply as soon as 
possible on how to handle this issue.

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


