Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi Masatake,

The thread is waiting for a ReadLock; we need to check what the other thread holding the WriteLock is blocked on. Can you get three consecutive complete jstack dumps of the ResourceManager during the issue?

>> I got no issue if RM-HA is disabled.

It looks like the RM is not able to access the ZooKeeper state store. Can you check whether there is any connectivity issue between the RM and ZooKeeper?

Thanks,
Prabhu Joseph

On Mon, Jul 6, 2020 at 2:44 AM Masatake Iwasaki wrote:

> Thanks for putting this up, Gabor Bota.
>
> I'm testing the RC2 on a 3-node Docker cluster with NN-HA and RM-HA enabled.
> ResourceManager reproducibly blocks on submitApplication while launching
> example MR jobs.
> Does anyone run into the same issue?
>
> The same configuration worked for 3.1.3.
> I got no issue if RM-HA is disabled.
>
> "IPC Server handler 1 on default port 8032" #167 daemon prio=5 os_prio=0 tid=0x7fe91821ec50 nid=0x3b9 waiting on condition [0x7fe901bac000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x85d37a40> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
>         at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
>         at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
>
> Masatake Iwasaki
>
> On 2020/06/26 22:51, Gabor Bota wrote:
> > Hi folks,
> >
> > I have put together a release candidate (RC2) for Hadoop 3.1.4.
> >
> > The RC is available at:
> > http://people.apache.org/~gabota/hadoop-3.1.4-RC2/
> > The RC tag in git is here:
> > https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC2
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1269/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
> >
> > Please try the release and vote. The vote will run for 5 weekdays,
> > until July 6. 2020. 23:00 CET.
> >
> > The release includes the revert of HDFS-14941, as it caused
> > HDFS-15421. IBR leak causes standby NN to be stuck in safe mode.
> > (https://issues.apache.org/jira/browse/HDFS-15421)
> > The release includes HDFS-15323, as requested.
> > (https://issues.apache.org/jira/browse/HDFS-15323)
> >
> > Thanks,
> > Gabor
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
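The ReadLock wait that Prabhu describes can be reproduced in isolation. The following is a minimal sketch, not ResourceManager code: the class name, thread names, and latches are all hypothetical. One thread holds the write lock of a ReentrantReadWriteLock, a second thread parks in ReadLock.lock() (the same state as the IPC handler in the dump), and the standard ThreadMXBean API is then asked which thread owns the lock the reader is parked on. This is roughly the question that consecutive jstack dumps are meant to answer for the real process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockWaitDemo {

  /** Parks a reader behind a held write lock and returns the lock owner's thread name. */
  static String findLockOwner() throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch writerHoldsLock = new CountDownLatch(1);
    CountDownLatch releaseWriter = new CountDownLatch(1);

    Thread writer = new Thread(() -> {
      lock.writeLock().lock();
      try {
        writerHoldsLock.countDown();
        releaseWriter.await(); // stands in for whatever the real writer is stuck on
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.writeLock().unlock();
      }
    }, "demo-writer");

    Thread reader = new Thread(() -> {
      lock.readLock().lock(); // parks here while the write lock is held
      lock.readLock().unlock();
    }, "demo-reader");

    writer.start();
    writerHoldsLock.await();
    reader.start();
    while (reader.getState() != Thread.State.WAITING) {
      Thread.sleep(10); // wait until the reader is actually parked
    }

    // Ask the JVM which thread owns the synchronizer the reader is parked on.
    ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(reader.getId());
    String owner = info.getLockOwnerName();

    releaseWriter.countDown(); // let both threads finish
    writer.join();
    reader.join();
    return owner;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("read lock is blocked by: " + findLockOwner());
  }
}
```

In the sketch the writer is deliberately held back by a latch; in a real dump the interesting part is the stack of the owning thread, which shows what it is blocked on.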
[jira] [Created] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
Prabhu Joseph created YARN-10345:

Summary: HsWebServices containerlogs does not honor ACLs for completed jobs
Key: YARN-10345
URL: https://issues.apache.org/jira/browse/YARN-10345
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.2.0, 3.4.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph
Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png

HsWebServices containerlogs does not honor ACLs. A user who does not have permission to view a job is allowed to view the job logs from YARN UI2 through HsWebServices.

*Repro:*

Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred

1. Run a sample MR job as the systest user.
2. Once the job is complete, access the job logs as the hue user from YARN UI2.

The YARN CLI works fine: the same request from the hue user is denied.

{code}
[hue@pjoseph-cm-2 /]$
[hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032
Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
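To make the expected outcome of the repro concrete, here is a small illustrative sketch of the kind of check the report implies should run before logs are served. It is not the HsWebServices implementation: the class, the method canViewLogs, and the rule it applies (job owner, cluster admins, or users on a view ACL) are assumptions made for illustration, populated with the users from the repro.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LogAclCheckDemo {

  /**
   * Hypothetical rule: the job owner, cluster admins, and users listed on the
   * view ACL may read the job's logs; everyone else is denied.
   */
  static boolean canViewLogs(String user, String owner,
                             Set<String> admins, Set<String> viewAcl) {
    return user.equals(owner) || admins.contains(user) || viewAcl.contains(user);
  }

  public static void main(String[] args) {
    // Values taken from the repro: yarn.admin.acl=yarn,mapred and an empty queue ACL.
    Set<String> admins = new HashSet<>(Arrays.asList("yarn", "mapred"));
    Set<String> viewAcl = Collections.emptySet();
    String owner = "systest";

    for (String user : new String[] {"systest", "mapred", "hue"}) {
      System.out.println(user + " may view logs: "
          + canViewLogs(user, owner, admins, viewAcl));
    }
  }
}
```

Under these assumptions the hue user should be denied on every access path, which matches what the YARN CLI does in the report.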
[jira] [Created] (YARN-10346) Add testcase for RMWebApp make external class pluggable
Bilwa S T created YARN-10346:

Summary: Add testcase for RMWebApp make external class pluggable
Key: YARN-10346
URL: https://issues.apache.org/jira/browse/YARN-10346
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T

Add testcase for Jira YARN-8047
[jira] [Created] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
Masatake Iwasaki created YARN-10347:

Summary: Fix double locking in CapacityScheduler#reinitialize in branch-3.1
Key: YARN-10347
URL: https://issues.apache.org/jira/browse/YARN-10347
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler
Affects Versions: 3.1.4
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki

Double locking blocks other threads in ResourceManager that are waiting for the lock.
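A minimal sketch of why double locking can block other threads, assuming it means that the write lock of a ReentrantReadWriteLock is acquired twice but released only once. The class and method names below are hypothetical and are not the CapacityScheduler code. Because the lock is reentrant, the second acquire succeeds; the single unlock then only drops the hold count from 2 to 1, so the calling thread keeps the write lock and readers in other threads cannot proceed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleLockDemo {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  /** Hypothetical buggy variant: two acquires, one release. */
  void reinitializeBuggy() {
    lock.writeLock().lock();
    try {
      lock.writeLock().lock(); // redundant acquire; succeeds because the lock is reentrant
      // ... reload configuration here ...
    } finally {
      lock.writeLock().unlock(); // hold count goes 2 -> 1; the write lock stays held
    }
  }

  /** Fixed variant: exactly one acquire paired with one release. */
  void reinitializeFixed() {
    lock.writeLock().lock();
    try {
      // ... reload configuration here ...
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns true if another thread can take the read lock within 200 ms. */
  boolean readerCanProceed() throws InterruptedException {
    boolean[] acquired = new boolean[1];
    Thread reader = new Thread(() -> {
      try {
        if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
          acquired[0] = true;
          lock.readLock().unlock();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    reader.start();
    reader.join(); // join makes the reader's write to acquired[0] visible here
    return acquired[0];
  }

  public static void main(String[] args) throws InterruptedException {
    DoubleLockDemo buggy = new DoubleLockDemo();
    buggy.reinitializeBuggy();
    System.out.println("reader proceeds after buggy call: " + buggy.readerCanProceed());

    DoubleLockDemo fixed = new DoubleLockDemo();
    fixed.reinitializeFixed();
    System.out.println("reader proceeds after fixed call: " + fixed.readerCanProceed());
  }
}
```

The sketch uses tryLock with a timeout only so that the demonstration terminates; a plain ReadLock.lock() in the same position would park indefinitely.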
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Thanks Steve and Prabhu for the information.

The cause turned out to be locking in CapacityScheduler#reinitialize. I think the method is called after transitioning to the active state if RM-HA is enabled.

I filed YARN-10347 and created a PR.

Masatake Iwasaki

On 2020/07/08 16:33, Prabhu Joseph wrote:
> Hi Masatake,
>
> The thread is waiting for a ReadLock, we need to check what the other thread holding WriteLock is blocked on. Can you get three consecutive complete jstack of ResourceManager during the issue.
>
>> I got no issue if RM-HA is disabled.
>
> Looks RM is not able to access Zookeeper State Store. Can you check if there is any connectivity issue between RM and Zookeeper.
>
> Thanks,
> Prabhu Joseph
[jira] [Created] (YARN-10348) Allow RM to always cancel tokens after app completes
Jim Brennan created YARN-10348:

Summary: Allow RM to always cancel tokens after app completes
Key: YARN-10348
URL: https://issues.apache.org/jira/browse/YARN-10348
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.1.3, 2.10.0
Reporter: Jim Brennan
Assignee: Jim Brennan

(Note: this change was originally done on our internal branch by [~daryn]).

The RM currently has an option for a client to disable token cancellation when a job completes. This feature was an initial attempt to address the use case of a job launching sub-jobs (i.e. the oozie launcher) where the original job finishes before the sub-jobs complete: the original job's completion triggered premature cancellation of tokens still needed by the sub-jobs.

Many years ago, [~daryn] added a more robust implementation that reference-counts tokens ([YARN-3055]). This prevented premature cancellation of a token until all apps using the token complete, and removed the need for a client to specify cancel=false. Unfortunately the config option was not removed.

We have seen cases where oozie "java actions" and some users were explicitly disabling token cancellation. This can lead to a buildup of defunct tokens that may overwhelm the ZK buffer used by the KDC's backing store, at which point the KMS fails to connect to ZK and is unable to issue or validate new tokens, rendering the KDC only able to authenticate pre-existing tokens. Production incidents have occurred due to the buffer size issue.

To avoid these issues, the RM should have the option to ignore or override the client's request not to cancel tokens.
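The interaction between reference counting and the proposed override can be sketched as follows. This is an illustration under assumptions, not the RM's token renewer: the class, its methods, and the alwaysCancel flag are hypothetical names, and tokens are represented as plain strings. A token becomes eligible for cancellation only when the last app using it finishes; at that point it is cancelled if the client asked for cancellation or if the RM-side override is enabled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenRefCountDemo {
  private final Map<String, Integer> refCounts = new HashMap<>();
  private final List<String> cancelled = new ArrayList<>();
  private final boolean alwaysCancel; // hypothetical RM-side override

  TokenRefCountDemo(boolean alwaysCancel) {
    this.alwaysCancel = alwaysCancel;
  }

  /** Registers one more app that uses the token. */
  void appStarted(String token) {
    refCounts.merge(token, 1, Integer::sum);
  }

  /** Drops one reference; cancels the token once no app uses it any more. */
  void appFinished(String token, boolean clientWantsCancel) {
    Integer count = refCounts.get(token);
    if (count == null) {
      return; // unknown token, nothing to do
    }
    if (count > 1) {
      refCounts.put(token, count - 1); // sub-jobs still need the token
      return;
    }
    refCounts.remove(token);
    if (clientWantsCancel || alwaysCancel) {
      cancelled.add(token);
    }
  }

  List<String> cancelledTokens() {
    return cancelled;
  }

  public static void main(String[] args) {
    TokenRefCountDemo rm = new TokenRefCountDemo(true);
    rm.appStarted("token-1");        // launcher job
    rm.appStarted("token-1");        // sub-job sharing the same token
    rm.appFinished("token-1", false); // launcher ends first; token must survive
    System.out.println("after launcher: " + rm.cancelledTokens());
    rm.appFinished("token-1", false); // last user ends; override forces cancellation
    System.out.println("after sub-job: " + rm.cancelledTokens());
  }
}
```

With the override disabled and cancel=false from the client, the token in this sketch would never be cancelled, which is the buildup the report describes.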
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/742/

No changes

-1 overall

The following subsystems voted -1:
    docker

Powered by Apache Yetus https://yetus.apache.org
Re: [VOTE] Release Apache Hadoop 3.1.4 (RC2)
Hi Gabor Bota,

I committed the fix for YARN-10347 to branch-3.1. I think this should be a blocker for 3.1.4. Could you cherry-pick it to branch-3.1.4 and cut a new RC?

Thanks,
Masatake Iwasaki

On 2020/07/08 23:31, Masatake Iwasaki wrote:
> Thanks Steve and Prabhu for the information.
>
> The cause turned out to be locking in CapacityScheduler#reinitialize. I think the method is called after transitioning to the active state if RM-HA is enabled.
>
> I filed YARN-10347 and created a PR.
>
> Masatake Iwasaki