Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree
+1 (non-binding) On Fri, Sep 20, 2019 at 8:01 AM Rakesh Radhakrishnan wrote: > +1 > > Rakesh > > On Fri, Sep 20, 2019 at 12:29 AM Aaron Fabbri wrote: > > > +1 (binding) > > > > Thanks to the Ozone folks for their efforts at maintaining good > separation > > with HDFS and common. I took a lot of heat for the unpopular opinion that > > they should be separate, so I am glad the process has worked out well > for > > both codebases. It looks like my concerns were addressed and I appreciate > > it. It is cool to see the evolution here. > > > > Aaron > > > > > > On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran > > > > > wrote: > > > > > in that case, > > > > > > +1 from me (binding) > > > > > > On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton wrote: > > > > > > > > one thing to consider here as you are giving up your ability to > make > > > > > changes in hadoop-* modules, including hadoop-common, and their > > > > > dependencies, in sync with your own code. That goes for filesystem > > > > contract > > > > > tests. > > > > > > > > > > are you happy with that? > > > > > > > > > > > > Yes. I think we can live with it. > > > > > > > > Fortunatelly the Hadoop parts which are used by Ozone (security + > rpc) > > > > are stable enough, we didn't need bigger changes until now (small > > > > patches are already included in 3.1/3.2). > > > > > > > > I think it's better to use released Hadoop bits in Ozone anyway, and > > > > worst (best?) case we can try to do more frequent patch releases from > > > > Hadoop (if required). > > > > > > > > > > > > m. > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree
+1 Rakesh On Fri, Sep 20, 2019 at 12:29 AM Aaron Fabbri wrote: > +1 (binding) > > Thanks to the Ozone folks for their efforts at maintaining good separation > with HDFS and common. I took a lot of heat for the unpopular opinion that > they should be separate, so I am glad the process has worked out well for > both codebases. It looks like my concerns were addressed and I appreciate > it. It is cool to see the evolution here. > > Aaron > > > On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran > > wrote: > > > in that case, > > > > +1 from me (binding) > > > > On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton wrote: > > > > > > one thing to consider here as you are giving up your ability to make > > > > changes in hadoop-* modules, including hadoop-common, and their > > > > dependencies, in sync with your own code. That goes for filesystem > > > contract > > > > tests. > > > > > > > > are you happy with that? > > > > > > > > > Yes. I think we can live with it. > > > > > > Fortunatelly the Hadoop parts which are used by Ozone (security + rpc) > > > are stable enough, we didn't need bigger changes until now (small > > > patches are already included in 3.1/3.2). > > > > > > I think it's better to use released Hadoop bits in Ozone anyway, and > > > worst (best?) case we can try to do more frequent patch releases from > > > Hadoop (if required). > > > > > > > > > m. > > > > > > > > > > > >
[jira] [Created] (YARN-9848) revert YARN-4946
Steven Rand created YARN-9848: - Summary: revert YARN-4946 Key: YARN-9848 URL: https://issues.apache.org/jira/browse/YARN-9848 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation, resourcemanager Reporter: Steven Rand In YARN-4946, we've been discussing a revert due to the potential for keeping more applications in the state store than desired, and the potential to greatly increase RM recovery times. I'm in favor of reverting the patch, but other ideas along the lines of YARN-9571 would work as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-9847) ZKRMStateStore will cause zk connection loss when writing huge data into znode
Wang, Xinglong created YARN-9847: Summary: ZKRMStateStore will cause zk connection loss when writing huge data into znode Key: YARN-9847 URL: https://issues.apache.org/jira/browse/YARN-9847 Project: Hadoop YARN Issue Type: Improvement Reporter: Wang, Xinglong Assignee: Wang, Xinglong Recently, we encountered an RM ZK connection issue because the RM was trying to write huge data into a znode. This makes zk report a Len error and then lose the session, and eventually the RM crashes due to the zk connection issue. *The fix* To protect the ResourceManager from crashing this way, this fix limits the size of the data stored per attempt by capping the diagnostic info when writing ApplicationAttemptStateData into the znode. The limit is regulated by -Djute.maxbuffer set in yarn-env.sh; the same value is also used by the zookeeper server. *The story* ResourceManager Log {code:java} 2019-07-29 02:14:59,638 WARN org.apache.zookeeper.ClientCnxn: Session 0x36ab902369100a0 for serverabc-zk-5.vip.ebay.com/10.210.82.29:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Broken pipe at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) 2019-07-29 04:27:35,459 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation. 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:1050) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:699) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:317) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:299) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) {code} The ResourceManager will retry connecting to zookeeper until it exhausts its retry count, and then give up. {code:java} 2019-07-29 02:25:06,404 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no. 999 2019-07-29 02:25:06,718 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism. 2019-07-29 02:25:06,718 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 2019-07-29 02:25:06,404 INFO org.apache.
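The mitigation described above — bounding the attempt data by capping the diagnostics before the znode write — could be sketched roughly as below. This is an illustrative standalone sketch, not the actual YARN-9847 patch: the class name, the truncation marker, and the choice to keep the tail of the log are all assumptions.

```java
import java.nio.charset.StandardCharsets;

public class DiagnosticsLimiter {

    /**
     * Keeps diagnostics under maxBytes so the serialized
     * ApplicationAttemptStateData stays below the znode size limit
     * (governed by jute.maxbuffer on the ZooKeeper server).
     * The tail of the log is kept because the final error usually lives there.
     */
    public static String truncate(String diagnostics, int maxBytes) {
        if (diagnostics == null) {
            return null;
        }
        byte[] bytes = diagnostics.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return diagnostics;
        }
        // Hypothetical marker so operators can tell the log was cut.
        String marker = "[diagnostics truncated] ";
        String tail = new String(bytes, bytes.length - maxBytes, maxBytes,
            StandardCharsets.UTF_8);
        return marker + tail;
    }
}
```

A real implementation would also have to account for the rest of the attempt record (state, tokens, timestamps) when choosing the byte budget, since jute.maxbuffer applies to the whole znode payload.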
[jira] [Resolved] (YARN-6684) TestAMRMClient tests fail on branch-2.7
[ https://issues.apache.org/jira/browse/YARN-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-6684. - Resolution: Won't Fix branch-2.7 EOL, closing as won't fix > TestAMRMClient tests fail on branch-2.7 > --- > > Key: YARN-6684 > URL: https://issues.apache.org/jira/browse/YARN-6684 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Priority: Major > > {noformat}2017-06-01 19:10:44,362 INFO capacity.CapacityScheduler > (CapacityScheduler.java:addNode(1335)) - Added node > jhung-ld2.linkedin.biz:58205 clusterResource: > 2017-06-01 19:10:44,370 INFO server.MiniYARNCluster > (MiniYARNCluster.java:waitForNodeManagersToConnect(657)) - All Node Managers > connected in MiniYARNCluster > 2017-06-01 19:10:44,376 INFO client.RMProxy (RMProxy.java:createRMProxy(98)) > - Connecting to ResourceManager at jhung-ld2.linkedin.biz/ipaddr:36167 > 2017-06-01 19:10:45,501 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 0 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > 2017-06-01 19:10:46,502 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 1 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > 2017-06-01 19:10:47,503 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 2 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > 2017-06-01 19:10:48,504 INFO ipc.Client > (Client.java:handleConnectionFailure(872)) - Retrying connect to server: > jhung-ld2.linkedin.biz/ipaddr:36167. 
Already tried 3 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS){noformat} > After some investigation, seems it is the same issue as described here: > HDFS-11893 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8825) Print application tags in ApplicationSummary
[ https://issues.apache.org/jira/browse/YARN-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-8825. - Resolution: Duplicate > Print application tags in ApplicationSummary > > > Key: YARN-8825 > URL: https://issues.apache.org/jira/browse/YARN-8825 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > > Useful for tracking application tag metadata. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9844) TestCapacitySchedulerPerf test errors in branch-2
[ https://issues.apache.org/jira/browse/YARN-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9844. - Resolution: Fixed > TestCapacitySchedulerPerf test errors in branch-2 > - > > Key: YARN-9844 > URL: https://issues.apache.org/jira/browse/YARN-9844 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jonathan Hung >Priority: Major > > These TestCapacitySchedulerPerf throughput tests are failing in branch-2: > {{[ERROR] > TestCapacitySchedulerPerf.testUserLimitThroughputForFiveResources:263->testUserLimitThroughputWithNumberOfResourceTypes:114 > » ArrayIndexOutOfBounds}} > {{[ERROR] > TestCapacitySchedulerPerf.testUserLimitThroughputForFourResources:258->testUserLimitThroughputWithNumberOfResourceTypes:114 > » ArrayIndexOutOfBounds}} > {{[ERROR] > TestCapacitySchedulerPerf.testUserLimitThroughputForThreeResources:253->testUserLimitThroughputWithNumberOfResourceTypes:114 > » ArrayIndexOutOfBounds}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [VOTE] Release Hadoop-3.1.3-RC0
+1 (binding) Thanks Zhankun for all of your hard work on this release. I downloaded and built the source and ran it on an insecure multi-node pseudo cluster. I performed various YARN manual tests, including creating custom resources, creating queue submission ACLs, and queue refreshes. One concern is that preemption does not seem to be working when only the custom resources are over the queue capacity, but I don't think this is something introduced with this release. -Eric On Thursday, September 12, 2019, 3:04:44 AM CDT, Zhankun Tang wrote: Hi folks, Thanks to everyone's help on this release. Special thanks to Rohith, Wei-Chiu, Akira, Sunil, Wangda! I have created a release candidate (RC0) for Apache Hadoop 3.1.3. The RC release artifacts are available at: http://home.apache.org/~ztang/hadoop-3.1.3-RC0/ The maven artifacts are staged at: https://repository.apache.org/content/repositories/orgapachehadoop-1228/ The RC tag in git is here: https://github.com/apache/hadoop/tree/release-3.1.3-RC0 And my public key is at: https://dist.apache.org/repos/dist/release/hadoop/common/KEYS *This vote will run for 7 days, ending on Sept.19th at 11:59 pm PST.* For the testing, I have run several Spark and distributed shell jobs in my pseudo cluster. My +1 (non-binding) to start. BR, Zhankun On Wed, 4 Sep 2019 at 15:56, zhankun tang wrote: > Hi all, > > Thanks for everyone helping in resolving all the blockers targeting Hadoop > 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers issues > to 3.1.4. > > I'll cut the branch today and call a release vote soon. Thanks! > > > [1]. https://s.apache.org/5hj5i > > BR, > Zhankun > > > On Wed, 21 Aug 2019 at 12:38, Zhankun Tang wrote: > >> Hi folks, >> >> We have Apache Hadoop 3.1.2 released on Feb 2019. >> >> It's been more than 6 months passed and there're >> >> 246 fixes[1]. 
2 blocker and 4 critical Issues [2] >> >> (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker) >> >> >> I propose my plan to do a maintenance release of 3.1.3 in the next few >> (one or two) weeks. >> >> Hadoop 3.1.3 release plan: >> >> Code Freezing Date: *25th August 2019 PDT* >> >> Release Date: *31st August 2019 PDT* >> >> >> Please feel free to share your insights on this. Thanks! >> >> >> [1] https://s.apache.org/zw8l5 >> >> [2] https://s.apache.org/fjol5 >> >> >> BR, >> >> Zhankun >> > - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-9846) Use Finer-Grain Synchronization in ResourceLocalizationService.java
David Mollitor created YARN-9846: Summary: Use Finer-Grain Synchronization in ResourceLocalizationService.java Key: YARN-9846 URL: https://issues.apache.org/jira/browse/YARN-9846 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788 # Remove these synchronization blocks # Ensure {{recentlyCleanedLocalizers}} is thread safe -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
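The two steps above — dropping the synchronized blocks and making {{recentlyCleanedLocalizers}} thread safe — could look roughly like the following. The class and method names here are hypothetical, not the actual ResourceLocalizationService code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CleanedLocalizers {
    // A ConcurrentHashMap-backed set gives atomic add/contains without
    // wrapping every access in a coarse synchronized block.
    private final Set<String> recentlyCleaned = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller to mark this localizer. */
    public boolean markCleaned(String localizerId) {
        return recentlyCleaned.add(localizerId);
    }

    public boolean wasRecentlyCleaned(String localizerId) {
        return recentlyCleaned.contains(localizerId);
    }
}
```

The add-returns-boolean idiom also replaces the check-then-act pattern (contains, then add) that a synchronized block would otherwise have to protect.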
Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree
+1 (binding) Thanks to the Ozone folks for their efforts at maintaining good separation with HDFS and common. I took a lot of heat for the unpopular opinion that they should be separate, so I am glad the process has worked out well for both codebases. It looks like my concerns were addressed and I appreciate it. It is cool to see the evolution here. Aaron On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran wrote: > in that case, > > +1 from me (binding) > > On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton wrote: > > > > one thing to consider here as you are giving up your ability to make > > > changes in hadoop-* modules, including hadoop-common, and their > > > dependencies, in sync with your own code. That goes for filesystem > > contract > > > tests. > > > > > > are you happy with that? > > > > > > Yes. I think we can live with it. > > > > Fortunately the Hadoop parts which are used by Ozone (security + rpc) > > are stable enough, we haven't needed bigger changes so far (small > > patches are already included in 3.1/3.2). > > > > I think it's better to use released Hadoop bits in Ozone anyway, and > > worst (best?) case we can try to do more frequent patch releases from > > Hadoop (if required). > > > > > > m. > > > > > > >
[jira] [Created] (YARN-9845) Update to Use Java 8 Map Concurrent API
David Mollitor created YARN-9845: Summary: Update to Use Java 8 Map Concurrent API Key: YARN-9845 URL: https://issues.apache.org/jira/browse/YARN-9845 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L467 Class is using a {{ConcurrentHashMap}} but is not taking advantage of it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
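The kind of cleanup this issue suggests is replacing a get/null-check/put sequence with the Java 8 atomic map operations. A minimal before/after sketch — the names are illustrative, not the actual LocalResourcesTrackerImpl code:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ResourceRefs {
    private final Map<String, List<String>> refs = new ConcurrentHashMap<>();

    // Before: a check-then-act sequence that ignores the map's atomicity:
    //   List<String> list = refs.get(resource);
    //   if (list == null) {
    //       list = new CopyOnWriteArrayList<>();
    //       refs.put(resource, list);   // may clobber a concurrent insert
    //   }
    //   list.add(container);

    // After: computeIfAbsent makes insert-if-missing a single atomic step.
    public void addRef(String resource, String container) {
        refs.computeIfAbsent(resource, r -> new CopyOnWriteArrayList<>())
            .add(container);
    }

    public int refCount(String resource) {
        return refs.getOrDefault(resource, Collections.emptyList()).size();
    }
}
```

Besides being shorter, the atomic form never creates two competing lists for the same key under concurrent callers, which is the advantage of "taking advantage" of a {{ConcurrentHashMap}}.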
[jira] [Created] (YARN-9844) TestCapacitySchedulerPerf test errors in branch-2
Jim Brennan created YARN-9844: - Summary: TestCapacitySchedulerPerf test errors in branch-2 Key: YARN-9844 URL: https://issues.apache.org/jira/browse/YARN-9844 Project: Hadoop YARN Issue Type: Bug Components: test, yarn Affects Versions: 2.10.0 Reporter: Jim Brennan These TestCapacitySchedulerPerf throughput tests are failing in branch-2: {{[ERROR] TestCapacitySchedulerPerf.testUserLimitThroughputForFiveResources:263->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}} {{[ERROR] TestCapacitySchedulerPerf.testUserLimitThroughputForFourResources:258->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}} {{[ERROR] TestCapacitySchedulerPerf.testUserLimitThroughputForThreeResources:253->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [VOTE] Release Hadoop-3.1.3-RC0
+1 (binding) - Downloaded source, setup a single node cluster - Verified basic HDFS operations, put/get/cat etc - Verified basic YARN restful apis, cluster/nodes/scheduler, all seem good - Run several distributed shell job Thanks Weiwei On Sep 19, 2019, 4:28 PM +0800, Sunil Govindan , wrote: > +1 (binding) > > Thanks Zhankun for putting up the release. Thanks for leading this. > > - verified signature > - ran a local cluster from tar ball > - ran some MR jobs > - perform CLI ops, and looks good > - UI seems fine > > Thanks > Sunil > > On Thu, Sep 12, 2019 at 1:34 PM Zhankun Tang wrote: > > > Hi folks, > > > > Thanks to everyone's help on this release. Special thanks to Rohith, > > Wei-Chiu, Akira, Sunil, Wangda! > > > > I have created a release candidate (RC0) for Apache Hadoop 3.1.3. > > > > The RC release artifacts are available at: > > http://home.apache.org/~ztang/hadoop-3.1.3-RC0/ > > > > The maven artifacts are staged at: > > https://repository.apache.org/content/repositories/orgapachehadoop-1228/ > > > > The RC tag in git is here: > > https://github.com/apache/hadoop/tree/release-3.1.3-RC0 > > > > And my public key is at: > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > > > > *This vote will run for 7 days, ending on Sept.19th at 11:59 pm PST.* > > > > For the testing, I have run several Spark and distributed shell jobs in my > > pseudo cluster. > > > > My +1 (non-binding) to start. > > > > BR, > > Zhankun > > > > On Wed, 4 Sep 2019 at 15:56, zhankun tang wrote: > > > > > Hi all, > > > > > > Thanks for everyone helping in resolving all the blockers targeting > > Hadoop > > > 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers > > issues > > > to 3.1.4. > > > > > > I'll cut the branch today and call a release vote soon. Thanks! > > > > > > > > > [1]. 
https://s.apache.org/5hj5i > > > > > > BR, > > > Zhankun > > > > > > On Wed, 21 Aug 2019 at 12:38, Zhankun Tang wrote: > > > > Hi folks, > > > > > > > > We have Apache Hadoop 3.1.2 released on Feb 2019. > > > > > > > > More than 6 months have passed and there are > > > > > > > > 246 fixes[1]. 2 blocker and 4 critical Issues [2] > > > > > > > > (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker) > > > > > > > > > > > > I propose my plan to do a maintenance release of 3.1.3 in the next few > > > > (one or two) weeks. > > > > > > > > Hadoop 3.1.3 release plan: > > > > > > > > Code Freezing Date: *25th August 2019 PDT* > > > > > > > > Release Date: *31st August 2019 PDT* > > > > > > > > > > > > Please feel free to share your insights on this. Thanks! > > > > > > > > > > > > [1] https://s.apache.org/zw8l5 > > > > > > > > [2] https://s.apache.org/fjol5 > > > > > > > > > > > > BR, > > > > > > > > Zhankun > > > > > > > > >
[jira] [Created] (YARN-9843) Test TestAMSimulator.testAMSimulator fails intermittently.
Abhishek Modi created YARN-9843: --- Summary: Test TestAMSimulator.testAMSimulator fails intermittently. Key: YARN-9843 URL: https://issues.apache.org/jira/browse/YARN-9843 Project: Hadoop YARN Issue Type: Test Reporter: Abhishek Modi Assignee: Abhishek Modi Stack trace for failure: java.lang.AssertionError: java.io.IOException: Unable to delete directory /testptch/hadoop/hadoop-tools/hadoop-sls/target/test-dir/output4038286622450859971/metrics. at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.deleteMetricOutputDir(TestAMSimulator.java:141) at org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.tearDown(TestAMSimulator.java:298) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runners.Suite.runChild(Suite.java:128) at 
org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree
in that case, +1 from me (binding) On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton wrote: > > one thing to consider here as you are giving up your ability to make > > changes in hadoop-* modules, including hadoop-common, and their > > dependencies, in sync with your own code. That goes for filesystem > contract > > tests. > > > > are you happy with that? > > > Yes. I think we can live with it. > > Fortunately the Hadoop parts which are used by Ozone (security + rpc) > are stable enough, we haven't needed bigger changes so far (small > patches are already included in 3.1/3.2). > > I think it's better to use released Hadoop bits in Ozone anyway, and > worst (best?) case we can try to do more frequent patch releases from > Hadoop (if required). > > > m. > > >
[jira] [Created] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
Abhishek Modi created YARN-9842: --- Summary: Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2 Key: YARN-9842 URL: https://issues.apache.org/jira/browse/YARN-9842 Project: Hadoop YARN Issue Type: Task Reporter: Abhishek Modi Assignee: Abhishek Modi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0
Hi Rohith, Thanks for driving this release. +1 (binding) - built from the source on windows machine. - created a pseudo cluster. - ran PI job. - checked basic metrics with ATSv2 enabled. On Thu, Sep 19, 2019 at 12:30 PM Sunil Govindan wrote: > Hi Rohith > > Thanks for putting this together, appreciate the same. > > +1 (binding) > > - verified signature > - brought up a cluster from the tar ball > - Ran some basic MR jobs > - RM UI seems fine (old and new) > > > Thanks > Sunil > > On Wed, Sep 11, 2019 at 12:56 PM Rohith Sharma K S < > rohithsharm...@apache.org> wrote: > > > Hi folks, > > > > I have put together a release candidate (RC0) for Apache Hadoop 3.2.1. > > > > The RC is available at: > > http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/ > > > > The RC tag in git is release-3.2.1-RC0: > > https://github.com/apache/hadoop/tree/release-3.2.1-RC0 > > > > > > The maven artifacts are staged at > > https://repository.apache.org/content/repositories/orgapachehadoop-1226/ > > > > You can find my public key at: > > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > > > > This vote will run for 7 days(5 weekdays), ending on 18th Sept at 11:59 > pm > > PST. > > > > I have done testing with a pseudo cluster and distributed shell job. My > +1 > > to start. > > > > Thanks & Regards > > Rohith Sharma K S > > > -- Regards, Abhishek Modi
Re: [VOTE] Release Hadoop-3.1.3-RC0
+1 (binding) Thanks Zhankun for putting up the release. Thanks for leading this. - verified signature - ran a local cluster from tar ball - ran some MR jobs - perform CLI ops, and looks good - UI seems fine Thanks Sunil On Thu, Sep 12, 2019 at 1:34 PM Zhankun Tang wrote: > Hi folks, > > Thanks to everyone's help on this release. Special thanks to Rohith, > Wei-Chiu, Akira, Sunil, Wangda! > > I have created a release candidate (RC0) for Apache Hadoop 3.1.3. > > The RC release artifacts are available at: > http://home.apache.org/~ztang/hadoop-3.1.3-RC0/ > > The maven artifacts are staged at: > https://repository.apache.org/content/repositories/orgapachehadoop-1228/ > > The RC tag in git is here: > https://github.com/apache/hadoop/tree/release-3.1.3-RC0 > > And my public key is at: > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > > *This vote will run for 7 days, ending on Sept.19th at 11:59 pm PST.* > > For the testing, I have run several Spark and distributed shell jobs in my > pseudo cluster. > > My +1 (non-binding) to start. > > BR, > Zhankun > > On Wed, 4 Sep 2019 at 15:56, zhankun tang wrote: > > > Hi all, > > > > Thanks for everyone helping in resolving all the blockers targeting > Hadoop > > 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers > issues > > to 3.1.4. > > > > I'll cut the branch today and call a release vote soon. Thanks! > > > > > > [1]. https://s.apache.org/5hj5i > > > > BR, > > Zhankun > > > > > > On Wed, 21 Aug 2019 at 12:38, Zhankun Tang wrote: > > > >> Hi folks, > >> > >> We have Apache Hadoop 3.1.2 released on Feb 2019. > >> > >> It's been more than 6 months passed and there're > >> > >> 246 fixes[1]. 2 blocker and 4 critical Issues [2] > >> > >> (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker) > >> > >> > >> I propose my plan to do a maintenance release of 3.1.3 in the next few > >> (one or two) weeks. 
> >> > >> Hadoop 3.1.3 release plan: > >> > >> Code Freezing Date: *25th August 2019 PDT* > >> > >> Release Date: *31st August 2019 PDT* > >> > >> > >> Please feel free to share your insights on this. Thanks! > >> > >> > >> [1] https://s.apache.org/zw8l5 > >> > >> [2] https://s.apache.org/fjol5 > >> > >> > >> BR, > >> > >> Zhankun > >> > > >
[jira] [Created] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
Peter Bacsko created YARN-9841: -- Summary: Capacity scheduler: add support for combined %user + %primary_group mapping Key: YARN-9841 URL: https://issues.apache.org/jira/browse/YARN-9841 Project: Hadoop YARN Issue Type: Improvement Components: capacity scheduler Reporter: Peter Bacsko Assignee: Peter Bacsko Right now in CS, using {{%primary_group}} with a parent queue is only possible this way: {{u:%user:parentqueue.%primary_group}} Looking at https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, we cannot do something like: {{u:%user:%primary_group.%user}} Fair Scheduler supports a nested rule where such a placement/mapping rule is possible. This improvement would reduce this feature gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
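In capacity-scheduler.xml terms, the mapping that works today versus the nested one this issue asks for might look like this (the commented-out value is the proposed syntax, not something current releases accept):

```xml
<!-- Works today: every user lands under a fixed parent queue,
     in the leaf named after their primary group -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:%user:parentqueue.%primary_group</value>
</property>

<!-- Proposed by this issue: a per-user leaf nested under the
     primary-group parent, i.e.
     <value>u:%user:%primary_group.%user</value> -->
```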
[jira] [Created] (YARN-9840) Capacity scheduler: add support for Secondary Group user mapping
Peter Bacsko created YARN-9840: -- Summary: Capacity scheduler: add support for Secondary Group user mapping Key: YARN-9840 URL: https://issues.apache.org/jira/browse/YARN-9840 Project: Hadoop YARN Issue Type: Improvement Reporter: Peter Bacsko Assignee: Peter Bacsko Currently, Capacity Scheduler only supports primary group rule mapping like this: {{u:%user:%primary_group}} Fair scheduler already supports secondary group placement rule. Let's add this to CS to reduce the feature gap. Class of interest: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
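Sketched in capacity-scheduler.xml terms, the first mapping below is what CS supports today, while the second mirrors Fair Scheduler's secondary-group placement rule; the {{%secondary_group}} syntax is a guess at how the proposal could look, not an existing option:

```xml
<!-- Supported today -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:%user:%primary_group</value>
</property>

<!-- Proposed: place by the user's secondary group when it matches an
     existing queue, e.g.
     <value>u:%user:%secondary_group</value> -->
```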
Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0
Hi Rohith Thanks for putting this together, appreciate the same. +1 (binding) - verified signature - brought up a cluster from the tar ball - Ran some basic MR jobs - RM UI seems fine (old and new) Thanks Sunil On Wed, Sep 11, 2019 at 12:56 PM Rohith Sharma K S < rohithsharm...@apache.org> wrote: > Hi folks, > > I have put together a release candidate (RC0) for Apache Hadoop 3.2.1. > > The RC is available at: > http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/ > > The RC tag in git is release-3.2.1-RC0: > https://github.com/apache/hadoop/tree/release-3.2.1-RC0 > > > The maven artifacts are staged at > https://repository.apache.org/content/repositories/orgapachehadoop-1226/ > > You can find my public key at: > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS > > This vote will run for 7 days(5 weekdays), ending on 18th Sept at 11:59 pm > PST. > > I have done testing with a pseudo cluster and distributed shell job. My +1 > to start. > > Thanks & Regards > Rohith Sharma K S >