[jira] [Commented] (YARN-8849) DynoYARN: A simulation and testing infrastructure for YARN clusters
[ https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060563#comment-17060563 ] Jonathan Hung commented on YARN-8849: - Hey [~brahmareddy], we're working on the open sourcing process, we'll post here when there's an update. > DynoYARN: A simulation and testing infrastructure for YARN clusters > --- > > Key: YARN-8849 > URL: https://issues.apache.org/jira/browse/YARN-8849 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun Suresh >Assignee: Jonathan Hung >Priority: Major > > Traditionally, YARN workload simulation is performed using SLS (Scheduler > Load Simulator) which is packaged with YARN. It Essentially, starts a full > fledged *ResourceManager*, but runs simulators for the *NodeManager* and the > *ApplicationMaster* Containers. These simulators are lightweight and run in a > threadpool. The NM simulators do not open any external ports and send > (in-process) heartbeats to the ResourceManager. > There are a couple of drawbacks with using the SLS: > * It might be difficult to simulate really large clusters without having > access to a very beefy box - since the NMs are launched as tasks in a > threadpool, and each NM has to send periodic heartbeats to the RM. > * Certain features (like YARN-1011) requires changes to the NodeManager - > aspects such as queuing and selectively killing containers have to be > incorporated into the existing NM Simulator which might make the simulator a > bit heavy weight - there is a need for locking and synchronization. > * Since the NM and AM are simulations, only the Scheduler is faithfully > tested - it does not really perform an end-2-end test of a cluster. > Therefore, drawing inspiration from > [Dynamometer|https://github.com/linkedin/dynamometer], we propose a framework > for YARN deployable YARN cluster - *DynoYARN* - for testing, with the > following features: > * The NM already has hooks to plug-in custom *ContainerExecutor* and > *NodeResourceMonitor*. If we can plug-in a custom *ContainersMonitorImpl*'s > Monitoring thread (and other modules like the LocalizationService), We can > probably inject an Executor that does not actually launch containers and a > Node and Container resource monitor that reports synthetic pre-specified > Utilization metrics back to the RM. > * Since we are launching fake containers, we cannot run normal AM > containers. We can therefore, use *Unmanaged AM*'s to launch synthetic jobs. > Essentially, a test workflow would look like this: > * Launch a DynoYARN cluster. > * Use the Unmanaged AM feature to directly negotiate with the DynaYARN > Resource Manager for container tokens. > * Use the container tokens from the RM to directly ask the DynoYARN Node > Managers to start fake containers. > * The DynoYARN NodeManagers will start the fake containers and report to the > DynoYARN Resource Manager synthetically generated resource utilization for > the containers (which will be injected via the *ContainerLaunchContext* and > parsed by the plugged-in Container Executor). > * The Scheduler will use the utilization report to schedule containers - we > will be able to test allocation of *Opportunistic* containers based on > resource utilization. > * Since the DynoYARN Node Managers run the actual code paths, all preemption > and queuing logic will be faithfully executed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10200) Add number of containers to RMAppManager summary
[ https://issues.apache.org/jira/browse/YARN-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-10200: - Description: It would be useful to persist this so we can track containers processed by RM. (was: We track number of containers per app, it would be useful to persist this so we can track long-term containers processed by RM.) > Add number of containers to RMAppManager summary > > > Key: YARN-10200 > URL: https://issues.apache.org/jira/browse/YARN-10200 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Priority: Major > > It would be useful to persist this so we can track containers processed by RM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10200) Add number of containers to RMAppManager summary
Jonathan Hung created YARN-10200: Summary: Add number of containers to RMAppManager summary Key: YARN-10200 URL: https://issues.apache.org/jira/browse/YARN-10200 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung We track number of containers per app, it would be useful to persist this so we can track long-term containers processed by RM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2710) RM HA tests failed intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060529#comment-17060529 ] Hudson commented on YARN-2710: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18059 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18059/]) YARN-2710. RM HA tests failed intermittently on trunk. Contributed by (ebadger: rev 1975479285c2bb295ee74fb6ba93cce4da188f4d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolForTimelineV2.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java > RM HA tests failed intermittently on trunk > -- > > Key: YARN-2710 > URL: https://issues.apache.org/jira/browse/YARN-2710 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Environment: Java 8, jenkins >Reporter: Wangda Tan >Assignee: Ahmed Hussein >Priority: Major > Fix For: 2.10.1, 3.4.0 > > Attachments: TestResourceTrackerOnHA-output.2.txt, > YARN-2710-branch-2.10.001.patch, YARN-2710-branch-2.10.002.patch, > YARN-2710-branch-2.10.003.patch, YARN-2710.001.patch, YARN-2710.002.patch, > YARN-2710.003.patch, > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt > > > Failure like, it can be happened in TestApplicationClientProtocolOnHA, > TestResourceTrackerOnHA, etc. > {code} > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA > testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA) > Time elapsed: 9.491 sec <<< ERROR! > java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 > to asf905.gq1.ygridcore.net:28032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) > at org.apache.hadoop.ipc.Client.call(Client.java:1438) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) > at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583) > at > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2710) RM HA tests failed intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-2710: -- Fix Version/s: 3.4.0 2.10.1 I committed this to trunk and branch-2.10. Idk why I did it in that order, but I then realized that the cherry-pick to branch-3.2 from trunk doesn't compile even though it's a clean pick. [~ahussein], could you put up a patch for branch-3.2? I also don't see a branch-3.3, but 3.4 is the latest fix version available. [~brahma], can you advise on this? > RM HA tests failed intermittently on trunk > -- > > Key: YARN-2710 > URL: https://issues.apache.org/jira/browse/YARN-2710 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Environment: Java 8, jenkins >Reporter: Wangda Tan >Assignee: Ahmed Hussein >Priority: Major > Fix For: 2.10.1, 3.4.0 > > Attachments: TestResourceTrackerOnHA-output.2.txt, > YARN-2710-branch-2.10.001.patch, YARN-2710-branch-2.10.002.patch, > YARN-2710-branch-2.10.003.patch, YARN-2710.001.patch, YARN-2710.002.patch, > YARN-2710.003.patch, > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt > > > Failure like, it can be happened in TestApplicationClientProtocolOnHA, > TestResourceTrackerOnHA, etc. > {code} > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA > testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA) > Time elapsed: 9.491 sec <<< ERROR! > java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 > to asf905.gq1.ygridcore.net:28032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) > at org.apache.hadoop.ipc.Client.call(Client.java:1438) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) > at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583) > at > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2710) RM HA tests failed intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060488#comment-17060488 ] Hadoop QA commented on YARN-2710: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 42s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_242 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_242 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed with JDK v1.8.0_242 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 2 new + 23 unchanged - 2 fixed = 25 total (was 25) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed with JDK v1.8.0_242 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 41s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:a969cad0a12 | | JIRA Issue | YARN-2710 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12996864/YARN-2710-branch-2.10.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ffc67b2d66ff 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2.10 / 5854aa4 | | maven |
[jira] [Updated] (YARN-2710) RM HA tests failed intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-2710: Attachment: YARN-2710-branch-2.10.003.patch > RM HA tests failed intermittently on trunk > -- > > Key: YARN-2710 > URL: https://issues.apache.org/jira/browse/YARN-2710 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Environment: Java 8, jenkins >Reporter: Wangda Tan >Assignee: Ahmed Hussein >Priority: Major > Attachments: TestResourceTrackerOnHA-output.2.txt, > YARN-2710-branch-2.10.001.patch, YARN-2710-branch-2.10.002.patch, > YARN-2710-branch-2.10.003.patch, YARN-2710.001.patch, YARN-2710.002.patch, > YARN-2710.003.patch, > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt > > > Failure like, it can be happened in TestApplicationClientProtocolOnHA, > TestResourceTrackerOnHA, etc. > {code} > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA > testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA) > Time elapsed: 9.491 sec <<< ERROR! > java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 > to asf905.gq1.ygridcore.net:28032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) > at org.apache.hadoop.ipc.Client.call(Client.java:1438) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) > at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583) > at > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2710) RM HA tests failed intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-2710: Attachment: YARN-2710.003.patch > RM HA tests failed intermittently on trunk > -- > > Key: YARN-2710 > URL: https://issues.apache.org/jira/browse/YARN-2710 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Environment: Java 8, jenkins >Reporter: Wangda Tan >Assignee: Ahmed Hussein >Priority: Major > Attachments: TestResourceTrackerOnHA-output.2.txt, > YARN-2710-branch-2.10.001.patch, YARN-2710-branch-2.10.002.patch, > YARN-2710.001.patch, YARN-2710.002.patch, YARN-2710.003.patch, > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt > > > Failure like, it can be happened in TestApplicationClientProtocolOnHA, > TestResourceTrackerOnHA, etc. > {code} > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA > testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA) > Time elapsed: 9.491 sec <<< ERROR! > java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 > to asf905.gq1.ygridcore.net:28032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) > at org.apache.hadoop.ipc.Client.call(Client.java:1438) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) > at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583) > at > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868
[ https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060409#comment-17060409 ] Peter Bacsko commented on YARN-10198: - [~maniraj...@gmail.com] I think so. YARN-9868 introduced a regression. With the new changes, a leaf queue named {{%primary_group}} has to exist which was not the case before - if {{managedQueue}} exists with {{auto-create-child-queue=true}}, it was dynamically created. Since an existing behavior changed, that part of the patch should be reverted. > [managedParent].%primary_group mapping rule doesn't work after YARN-9868 > > > Key: YARN-10198 > URL: https://issues.apache.org/jira/browse/YARN-10198 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > YARN-9868 introduced an unnecessary check if we have the following placement > rule: > [managedParentQueue].%primary_group > Here, {{%primary_group}} is expected to be created if it doesn't exist. > However, there is this validation code which is not necessary: > {noformat} > } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) { > if (this.queueManager > .getQueue(groups.getGroups(user).get(0)) != null) { > return getPlacementContext(mapping, > groups.getGroups(user).get(0)); > } else { > return null; > } > {noformat} > We should revert this part to the original version: > {noformat} > } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) { > return getPlacementContext(mapping, > groups.getGroups(user).get(0)); > } > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10199) Simplify UserGroupMappingPlacementRule#getPlacementForUser
Andras Gyori created YARN-10199: --- Summary: Simplify UserGroupMappingPlacementRule#getPlacementForUser Key: YARN-10199 URL: https://issues.apache.org/jira/browse/YARN-10199 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Andras Gyori Assignee: Andras Gyori The UserGroupMappingPlacementRule#getPlacementForUser method, which is mainly responsible for queue naming, contains deeply nested branches. In order to provide an extendable mapping logic, the branches could be flattened and simplified. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value
[ https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060322#comment-17060322 ] Hadoop QA commented on YARN-10197: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 25m 0s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 59s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}165m 57s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10197 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12996826/YARN-10197-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 990f803519a3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ea68863 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/25697/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25697/testReport/ | | Max. process+thread count | 865 (vs. ulimit of
[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value
[ https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10197: Attachment: YARN-10197-003.patch > FS-CS converter: fix emitted ordering policy string and max-am-resource > percent value > - > > Key: YARN-10197 > URL: https://issues.apache.org/jira/browse/YARN-10197 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-10197-001.patch, YARN-10197-002.patch, > YARN-10197-003.patch > > > There are three problems that have to be addressed in the converter: > 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value > should be "fifo", not "FIFO". The reason we generate "FIFO" is because > {{FifoPolicy.NAME}} consists of uppercase letters. > 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in > FS, you can globally disable it with > {{-1.0}}. However this case > is not handled properly in > {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max > AM check is disabled, therefore we have to generate the value "1.0" to allow > as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should > also check if the current value differs from the global setting and only then > output "1.0" for a given queue. > 3) The multi-leaf queue check is no longer necessary and it doesn't work > anyway. The CS instance catches this kind of error during the verification > step. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-9879: -- Target Version/s: 3.3.0 > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Labels: fs2cs > Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, > YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, > YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, > YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, > YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, > YARN-9879.POC013.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10005) Code improvements in MutableCSConfigurationProvider
[ https://issues.apache.org/jira/browse/YARN-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T reassigned YARN-10005: Assignee: Bilwa S T > Code improvements in MutableCSConfigurationProvider > --- > > Key: YARN-10005 > URL: https://issues.apache.org/jira/browse/YARN-10005 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Bilwa S T >Priority: Minor > > * Important: constructKeyValueConfUpdate and all related methods seems a > separate responsibility: how to convert incoming SchedConfUpdateInfo to > Configuration changes (Configuration object) > * Duplicated code block (9 lines) in init / formatConfigurationInStore methods > * Method "getConfStore" could be package-private -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9995) Code cleanup in TestSchedConfCLI
[ https://issues.apache.org/jira/browse/YARN-9995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060040#comment-17060040 ] Szilard Nemeth commented on YARN-9995: -- Hi [~BilwaST], Yes, you are right. Rephrased the description message. Feel free to come up with any way that improves the code quality regarding SchedConfUpdateInfo. For example, I don't like that we have pieces of code like this: {code:java} schedUpdateInfo = new SchedConfUpdateInfo(); cli.addQueues("root.b:b1=bVal1;root.c:c1=cVal1", schedUpdateInfo); assertEquals(2, schedUpdateInfo.getAddQueueInfo().size()); QueueConfigInfo bAddInfo = schedUpdateInfo.getAddQueueInfo().get(0); assertEquals("root.b", bAddInfo.getQueue()); Map bParams = bAddInfo.getParams(); assertEquals(1, bParams.size()); assertEquals("bVal1", bParams.get("b1")); QueueConfigInfo cAddInfo = schedUpdateInfo.getAddQueueInfo().get(1); assertEquals("root.c", cAddInfo.getQueue()); Map cParams = cAddInfo.getParams(); assertEquals(1, cParams.size()); assertEquals("cVal1", cParams.get("c1")); {code} There are many occurrences of schedUpdateInfo.getAddQueueInfo, for example. My intention was: It's better to hide these calls behind helper methods so when something changes with SchedConfUpdateInfo, tests should not be rewritten completely. Moreover, you can also hide these calls + do an assertion in one helper method. Does this make sense? > Code cleanup in TestSchedConfCLI > > > Key: YARN-9995 > URL: https://issues.apache.org/jira/browse/YARN-9995 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Bilwa S T >Priority: Minor > > Some tests are too verbose: > - add / delete / remove queues testcases: Creating SchedConfUpdateInfo > instances could be simplified with a helper method or something like that. > - Some fields can be converted to local variables: sysOutStream, sysOut, > sysErr, csConf > - Any additional cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9995) Code cleanup in TestSchedConfCLI
[ https://issues.apache.org/jira/browse/YARN-9995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9995: - Description: Some tests are too verbose: - add / delete / remove queues testcases: Creating SchedConfUpdateInfo instances could be simplified with a helper method or something like that. - Some fields can be converted to local variables: sysOutStream, sysOut, sysErr, csConf - Any additional cleanup was: Some tests are too verbose: - add / delete / remove queues testcases: A Builder for SchedConfUpdateInfo could be created as this object is frequently created. - Some fields can be converted to local variables: sysOutStream, sysOut, sysErr, csConf - Any additional cleanup > Code cleanup in TestSchedConfCLI > > > Key: YARN-9995 > URL: https://issues.apache.org/jira/browse/YARN-9995 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Bilwa S T >Priority: Minor > > Some tests are too verbose: > - add / delete / remove queues testcases: Creating SchedConfUpdateInfo > instances could be simplified with a helper method or something like that. > - Some fields can be converted to local variables: sysOutStream, sysOut, > sysErr, csConf > - Any additional cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9995) Code cleanup in TestSchedConfCLI
[ https://issues.apache.org/jira/browse/YARN-9995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060026#comment-17060026 ] Bilwa S T commented on YARN-9995: - Hi [~snemeth] I have a small doubt. Why do we need a Builder for SchedConfUpdateInfo?? I can see that there are no set operations done for SchedConfUpdateInfo object. All are get operations. Builder is needed only when many values need to be set to create an object right?? > Code cleanup in TestSchedConfCLI > > > Key: YARN-9995 > URL: https://issues.apache.org/jira/browse/YARN-9995 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Bilwa S T >Priority: Minor > > Some tests are too verbose: > - add / delete / remove queues testcases: A Builder for SchedConfUpdateInfo > could be created as this object is frequently created. > - Some fields can be converted to local variables: sysOutStream, sysOut, > sysErr, csConf > - Any additional cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9998) Code cleanup in LeveldbConfigurationStore
[ https://issues.apache.org/jira/browse/YARN-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060023#comment-17060023 ] Adam Antal commented on YARN-9998: -- Checked the patch, thanks [~bteke], +1 (non-binding). > Code cleanup in LeveldbConfigurationStore > - > > Key: YARN-9998 > URL: https://issues.apache.org/jira/browse/YARN-9998 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Minor > Attachments: YARN-9998.001.patch, YARN-9998.002.patch > > > Many things can be improved: > * Field compactionTimer could be a local variable > * Field versiondb should be camelcase > * initDatabase is a very long method: Initialize db / versionDb should be in > separate methods, split this method into smaller chunks > * Remove TODOs > * Remove duplicated code block in > LeveldbConfigurationStore.CompactionTimerTask > * Any other cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9999) TestFSSchedulerConfigurationStore: Extend from ConfigurationStoreBaseTest, general code cleanup
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060021#comment-17060021 ] Adam Antal commented on YARN-: -- +1 (non-binding) > TestFSSchedulerConfigurationStore: Extend from ConfigurationStoreBaseTest, > general code cleanup > --- > > Key: YARN- > URL: https://issues.apache.org/jira/browse/YARN- > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Minor > Attachments: YARN-.001.patch, YARN-.002.patch, > YARN-.003.patch > > > All config store tests are extended from ConfigurationStoreBaseTest: > * TestInMemoryConfigurationStore > * TestLeveldbConfigurationStore > * TestZKConfigurationStore > TestFSSchedulerConfigurationStore should also extend from it. > Additionally, some general code cleanup can be applied as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org