[jira] [Commented] (YARN-10067) Add dry-run feature to FS-CS converter tool
[ https://issues.apache.org/jira/browse/YARN-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012538#comment-17012538 ] Szilard Nemeth commented on YARN-10067: --- Sorry [~pbacsko], the patch I uploaded did not compile; I messed something up while generating the patch. Looking into this now. > Add dry-run feature to FS-CS converter tool > --- > > Key: YARN-10067 > URL: https://issues.apache.org/jira/browse/YARN-10067 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-10067-001.patch, YARN-10067-002.patch, > YARN-10067-003.patch, YARN-10067-004.patch, YARN-10067-005.patch > > > Add a "-d" / "--dry-run" switch to the tool. The purpose of this would be to > inform the user whether a conversion is possible and, if it is, whether there are > any warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7007) NPE in RM while using YarnClient.getApplications()
[ https://issues.apache.org/jira/browse/YARN-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012529#comment-17012529 ] Yang Wang commented on YARN-7007: - [~cheersyang] [~Tao Yang] We came across the same problem in FLINK-15534, and I think many users are using Flink with bundled hadoop-2.8.x. It would be very good if we could backport this fix to 2.8 and release it in 2.8.6. Could you help with this? > NPE in RM while using YarnClient.getApplications() > -- > > Key: YARN-7007 > URL: https://issues.apache.org/jira/browse/YARN-7007 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Lingfeng Su >Assignee: Lingfeng Su >Priority: Major > Labels: patch > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: YARN-7007.001.patch > > > {code:java} > java.lang.NullPointerException: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getApplicationResourceUsageReport(RMAppAttemptImpl.java:857) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:629) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.verifyAndCreateAppReport(ClientRMService.java:972) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:898) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:734) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplications(ApplicationClientProtocolPBServiceImpl.java:239) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:441) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2198) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2196) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:254) > at sun.reflect.GeneratedMethodAccessor731.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy161.getApplications(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:479) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:456) > {code} > When I use YarnClient.getApplications() to get all applications from the RM, > it occasionally throws an NPE. > {code:java} > RMAppAttempt currentAttempt = rmContext.getRMApps() >.get(attemptId.getApplicationId()).getCurrentAppAttempt(); > {code} > If the application id is not in the ConcurrentMap returned by > getRMApps(), an NPE may be thrown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
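For context, a minimal sketch of the null guard the description calls for; the variable names follow the quoted snippet, and the surrounding method shape is an assumption rather than the committed patch:

{code:java}
// Guard against apps that have already been removed from the RM's app map:
RMApp rmApp = rmContext.getRMApps().get(attemptId.getApplicationId());
if (rmApp == null) {
  // The application is unknown to the RM (e.g. already completed and evicted), so skip it.
  return null;
}
RMAppAttempt currentAttempt = rmApp.getCurrentAppAttempt();
{code}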
[jira] [Commented] (YARN-10071) Sync Mockito version with other modules
[ https://issues.apache.org/jira/browse/YARN-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012378#comment-17012378 ] Akira Ajisaka commented on YARN-10071: -- LGTM, +1. Thanks [~adam.antal] for the cleanup. > Sync Mockito version with other modules > --- > > Key: YARN-10071 > URL: https://issues.apache.org/jira/browse/YARN-10071 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build, test >Reporter: Akira Ajisaka >Assignee: Adam Antal >Priority: Major > Attachments: YARN-10071.001.patch > > > YARN-8551 introduced a Mockito 1.x dependency; update it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012300#comment-17012300 ] Eric Yang commented on YARN-9292: - [~ebadger] Thanks for the review. Here is my feedback: {quote}Doesn't the container know what image it was started with in its environment?{quote} ServiceScheduler runs as part of the application master for a YARN service. The YARN AM is not containerized. The docker command to resolve the image digest id runs before any docker container is launched. The lookup for the docker image happens on the node where the AM is running. We use the sha256 digest from the AM node as the authoritative signature, so the application resolves the same docker digest id on every node manager. {quote}If we don't care about the container and just want to know what the sha of the image:tag is, then I agree with Chandni Singh that we don't need to use the containerId.{quote} The container ID is used by the container executor to set proper permissions on the working directory and to generate the .cmd file for the container-executor binary, and all output and exit codes are stored in the container id directory. Without the container ID, we would need to craft a completely separate path to acquire privileges for launching docker commands, which adds code duplication and does not follow the security practice that was put in place to prevent parameter hijacking. I chose to follow the existing process to avoid code bloat. {quote}But if there are many, couldn't that not be the correct one?{quote} The output given from docker image [image-id] -f "{{.RepoDigests}}" may contain similar names, like local/centos and centos at the same time, due to fuzzy matching. The for loop matches the exact name instead of doing prefix matching. Hence, the matched entry is always the correct one. {quote}I think we should import this instead of including the full path{quote} Sorry, can't do. There is another import that references org.apache.hadoop.yarn.service.component.Component, which prevents use of the same name. {quote}Spacing issues on the operators.{quote} Checkstyle did not find a spacing issue with the existing patch, and the issue is not clear to me. Care to elaborate? {quote}The first part of both of these regexes is identical. I think we should create a subregex and append to it to avoid having to make changes in multiple places in the future. One is the image followed by a tag and the other is an image followed by a sha. Should be easy to do.{quote} Sure, I will compact this when rebasing the patch to trunk. {quote}The else clause syntax doesn't seem to work for me. Did I do something wrong?{quote} Yes, unlike exec from C, when running the docker command on the CLI it needs to be quoted to prevent shell expansion: {code}docker images --format="{{json .}}" --filter="dangling=false"{code}. For clarity, we are using {code}docker image [image-id] -f "{{.RepoDigests}}"{code} to find the real digest hash, because of bugs in the docker images output. {quote}Another possible solution is to have the AM get the sha256 hash of the image that it is running in and then passing that sha to all of the containers that it starts. This would move the query into the Hadoop cluster itself.{quote} I think the patch implements what you are suggesting: Hadoop queries the cluster itself via a node manager REST endpoint. 
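For reference, a sketch of the regex compaction agreed to above, factoring the shared image-name prefix out of the two patterns quoted in the review (the constant names are illustrative, not the ones in any posted patch):

{code:java}
// Shared prefix: optional registry (host[:port]/) followed by the repository name.
private static final String DOCKER_IMAGE_BASE_REGEX =
    "^(([a-zA-Z0-9.-]+)(:\\d+)?/)?([a-z0-9_./-]+)";
// Image name optionally followed by a tag:
public static final String DOCKER_IMAGE_REGEX =
    DOCKER_IMAGE_BASE_REGEX + "(:[\\w.-]+)?$";
// Image name followed by a sha256 digest:
private static final String DOCKER_IMAGE_DIGEST_REGEX =
    DOCKER_IMAGE_BASE_REGEX + "(@sha256:)([a-f0-9]{6,64})";
{code}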
> Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in the node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development of YARN-9184, which keeps the docker image consistent > within a job. One of the ideas to keep the :latest tag consistent for a job is > to use the docker image command to figure out the image id and propagate that image id to > the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image is coming from a trusted source. Both image > name and ID must be
[jira] [Commented] (YARN-9511) TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012284#comment-17012284 ] Siyao Meng commented on YARN-9511: -- Thanks for looking into this [~aajisaka]. I just removed the [JDK11] header. > TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The > remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Assignee: Szilard Nemeth >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
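For context on the error above: 436 is the decimal form of octal 0664, i.e. the jar is group-writable, which the AuxServices permission check rejects. A hedged sketch of the fix direction for the test fixture, assuming a POSIX file system and using a hypothetical jar path (this is not the committed patch):

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class FixTestJarPerms {
  public static void main(String[] args) throws Exception {
    // 436 decimal == 0664 octal (group-writable); tighten to 0644 (rw-r--r--):
    Files.setPosixFilePermissions(
        Paths.get("/tmp/test-aux-service.jar"), // hypothetical test jar path
        PosixFilePermissions.fromString("rw-r--r--"));
  }
}
{code}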
[jira] [Updated] (YARN-9511) TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated YARN-9511: - Summary: TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436 (was: [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436) > TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The > remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Assignee: Szilard Nemeth >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012282#comment-17012282 ] Wangda Tan commented on YARN-9879: -- Thanks [~shuzirra], I think adding a flag (suggestion from [~adam.antal]) will prevent admins from changing it accidentally, but it is hard to understand (think about a regular Hadoop user), and we would need to maintain it in the long run. So instead, I would like to allow users to make changes but fail the application submission with a clear message (like: you cannot submit the application because there are multiple queues with the name XYZ; you can switch to the fully qualified queue name or remove/rename the duplicated queues, etc.). If admins regret the change, they can easily revert it. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > A design doc and first proposal are being made; I'll attach them as soon as they're > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
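A rough sketch of the submission-time check proposed above; the lookup method is a hypothetical helper and the exception choice is illustrative, not an actual patch:

{code:java}
// Reject ambiguous short queue names at submission time with an actionable message:
List<CSQueue> matches = queueManager.getQueuesByShortName(queueName); // hypothetical lookup
if (matches.size() > 1) {
  throw new YarnException("Cannot submit application: multiple queues are named '"
      + queueName + "'. Use a fully qualified queue name, or remove/rename the duplicates.");
}
{code}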
[jira] [Commented] (YARN-10067) Add dry-run feature to FS-CS converter tool
[ https://issues.apache.org/jira/browse/YARN-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012197#comment-17012197 ] Hadoop QA commented on YARN-10067: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 48s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 43s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 43s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 43s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 3m 53s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 39s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10067 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990451/YARN-10067-005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7e10af8f6159 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 93233a7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/25361/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/25361/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | javac |
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012185#comment-17012185 ] Eric Badger commented on YARN-9292: --- Hey [~eyang], thanks for the patch! It looks like this patch only applies to native services and that any client that wants to solve this issue will have to solve it themselves. I don't think we can get around this issue unless we want the RM to do the image sha256 hash query. And that sounds like a bad idea. But I think it makes sense to do this for native services at least. {noformat} + + @GET + @Path("/container/{id}/docker/images/{name}") + @Produces({ MediaType.APPLICATION_JSON + "; " + JettyUtils.UTF_8, + MediaType.APPLICATION_XML + "; " + JettyUtils.UTF_8 }) + public String getImageId(@PathParam("id") String id, + @PathParam("name") String name) { +DockerImagesCommand dockerImagesCommand = new DockerImagesCommand(); +dockerImagesCommand = dockerImagesCommand.getSingleImageStatus(name); +PrivilegedOperationExecutor privOpExecutor = +PrivilegedOperationExecutor.getInstance(this.nmContext.getConf()); +try { + String output = DockerCommandExecutor.executeDockerCommand( + dockerImagesCommand, id, null, privOpExecutor, false, nmContext); + String[] ids = output.substring(1, output.length()-1).split(" "); + String result = name; + for (String image : ids) { +String[] parts = image.split("@"); +if (parts[0].equals(name.substring(0, parts[0].length()))) { + result = image; +} + } + return result; +} catch (ContainerExecutionException e) { + return "latest"; +} + } } {noformat} Doesn't the container know what image it was started with in its environment? Why do we need to run a docker command here? If we don't care about the container and just want to know what the sha of the image:tag is, then I agree with [~csingh] that we don't need to use the containerId. And if we do need to run a docker command, the for loop will give us the last sha256 associated with that image name. But if there are many, couldn't that not be the correct one? {noformat} +Collection {noformat} I think we should import this instead of including the full path {noformat} + if (compSpec.getArtifact()!=null && compSpec.getArtifact() + .getType()==TypeEnum.DOCKER) { {noformat} Spacing issues on the operators. {noformat} + public static final String DOCKER_IMAGE_REGEX = "^(([a-zA-Z0-9.-]+)(:\\d+)?/)?([a-z0-9_./-]+)(:[\\w.-]+)?$"; + private static final String DOCKER_IMAGE_DIGEST_REGEX = + "^(([a-zA-Z0-9.-]+)(:\\d+)?/)?([a-z0-9_./-]+)(@sha256:)([a-f0-9]{6,64})"; {noformat} The first part of both of these regexes is identical. I think we should create a subregex and append to it to avoid having to make changes in multiple places in the future. One is the image followed by a tag and the other is an image followed by a sha. Should be easy to do. 
{noformat} @@ -1771,11 +1779,29 @@ int get_docker_images_command(const char *command_file, const struct configurati if (ret != 0) { goto free_and_exit; } +ret = add_to_args(args, "-f"); +if (ret != 0) { + goto free_and_exit; +} +ret = add_to_args(args, "{{.RepoDigests}}"); +if (ret != 0) { + goto free_and_exit; +} + } else { +ret = add_to_args(args, DOCKER_IMAGES_COMMAND); +if (ret != 0) { + goto free_and_exit; +} +ret = add_to_args(args, "--format={{json .}}"); +if (ret != 0) { + goto free_and_exit; +} +ret = add_to_args(args, "--filter=dangling=false"); +if (ret != 0) { + goto free_and_exit; +} {noformat} {noformat} [ebadger@foo ~]$ sudo docker images --format={{json .}} --filter=dangling=false Template parsing error: template: :1: unclosed action [ebadger@foo ~]$ docker --version Docker version 1.13.1, build 4ef4b30/1.13.1 {noformat} The else clause syntax doesn't seem to work for me. Did I do something wrong? This patch assumes that the client can access the Docker Registry. I'm not super familiar with native services, but I imagine this client runs on a gateway node somewhere outside of the cluster itself. With that, I imagine it is possible that the cluster itself can access the Docker Registry while the client can't. Or the Registry could require credentials to access it. Should we make this feature optional to get around those error cases? Another possible solution is to have the AM get the sha256 hash of the image that it is running in and then passing that sha to all of the containers that it starts. This would move the query into the Hadoop cluster itself. > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 >
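To illustrate the quoting point from the Java side: when the NM invokes docker through exec (for example via ProcessBuilder), each argument is passed to the process as-is and no shell is involved, so the Go-template braces need no quoting; it is only an interactive shell that mangles them. A minimal standalone sketch:

{code:java}
import java.io.IOException;

public class DockerImagesExec {
  public static void main(String[] args) throws IOException, InterruptedException {
    // argv goes straight to exec; no shell expansion of {{json .}} happens here
    Process p = new ProcessBuilder(
        "docker", "images", "--format={{json .}}", "--filter=dangling=false")
        .inheritIO()
        .start();
    p.waitFor();
  }
}
{code}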
[jira] [Commented] (YARN-10067) Add dry-run feature to FS-CS converter tool
[ https://issues.apache.org/jira/browse/YARN-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012165#comment-17012165 ] Szilard Nemeth commented on YARN-10067: --- Hi [~pbacsko]! Thanks for fixing the concerns I raised. The patch looks good now. Regarding bullet point 5 of my previous comment, I uploaded a patch based on your latest patch so you can see what I meant. You can probably adjust the naming here and there, but I think you will get the basic idea: no scattered dryRun checks everywhere in the code; all of them are handled in a single place, ConversionOptions. Also, if we ever want more conversion options like dry-run, we can add them more easily. Please share your thoughts. Thanks. > Add dry-run feature to FS-CS converter tool > --- > > Key: YARN-10067 > URL: https://issues.apache.org/jira/browse/YARN-10067 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-10067-001.patch, YARN-10067-002.patch, > YARN-10067-003.patch, YARN-10067-004.patch, YARN-10067-005.patch > > > Add a "-d" / "--dry-run" switch to the tool. The purpose of this would be to > inform the user whether a conversion is possible and, if it is, whether there are > any warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
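A sketch of the single-point handling described above; the class shape and method names are guesses at the idea, not the uploaded patch:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Centralize dry-run behaviour so converter code never branches on the flag itself:
public class ConversionOptions {
  private final boolean dryRun;
  private final List<String> warnings = new ArrayList<>();

  public ConversionOptions(boolean dryRun) {
    this.dryRun = dryRun;
  }

  // In dry-run mode, collect the problem and keep going; otherwise fail fast.
  public void handleConversionError(String message) {
    if (dryRun) {
      warnings.add(message);
    } else {
      throw new IllegalStateException(message);
    }
  }

  public List<String> getWarnings() {
    return warnings;
  }
}
{code}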
[jira] [Updated] (YARN-10067) Add dry-run feature to FS-CS converter tool
[ https://issues.apache.org/jira/browse/YARN-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10067: -- Attachment: YARN-10067-005.patch > Add dry-run feature to FS-CS converter tool > --- > > Key: YARN-10067 > URL: https://issues.apache.org/jira/browse/YARN-10067 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-10067-001.patch, YARN-10067-002.patch, > YARN-10067-003.patch, YARN-10067-004.patch, YARN-10067-005.patch > > > Add a "-d" / "--dry-run" switch to the tool. The purpose of this would be to > inform the user whether a conversion is possible and, if it is, whether there are > any warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10028) Integrate the new abstract log servlet to the JobHistory server
[ https://issues.apache.org/jira/browse/YARN-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012142#comment-17012142 ] Hadoop QA commented on YARN-10028: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 5s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 50s{color} | {color:orange} root: The patch generated 9 new + 118 unchanged - 1 fixed = 127 total (was 119) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 49s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 57s{color} | {color:red} hadoop-mapreduce-client-hs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobConf | | | hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesAttempts | | | hadoop.mapreduce.v2.hs.webapp.TestHsWebServices | | | hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery | | | hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs | | | hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesTasks | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10028 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990441/YARN-10028.002.patch | |
[jira] [Commented] (YARN-10070) NPE if no rule is defined and application-tag-based-placement is enabled
[ https://issues.apache.org/jira/browse/YARN-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012122#comment-17012122 ] Prabhu Joseph commented on YARN-10070: -- [~kmarton] There are two issues: 1. RMAppManager#getUserNameForPlacement() checks the queue mapping for the proxy user and then validates whether the end user has SUBMIT_APPLICATIONS access to the mapped queue of the proxy user. If yes, it returns the end user for placement. {code:java} String userNameFromAppTag = getUserNameFromApplicationTag(applicationTags); if (userNameFromAppTag != null) { LOG.debug("Found 'userid' '{}' in application tag", userNameFromAppTag); UserGroupInformation callerUGI = UserGroupInformation .createRemoteUser(userNameFromAppTag); // check if the actual user has rights to submit application to the // user's queue from the application tag String queue = placementManager .placeApplication(context, usernameUsedForPlacement).getQueue(); if (callerUGI != null && scheduler .checkAccess(callerUGI, QueueACL.SUBMIT_APPLICATIONS, queue)) { usernameUsedForPlacement = userNameFromAppTag; } else { LOG.warn("User '{}' from application tag does not have access to " + " queue '{}'. " + "The placement is done for user '{}'", userNameFromAppTag, queue, user); } } else { LOG.warn("'userid' was not found in application tags"); } {code} For example: yarn.scheduler.capacity.queue-mappings=u:hive:default,u:ambari-qa:tezq Assume hive is the proxy user and ambari-qa is the end user. The above logic gets the queue mapping of the proxy user hive, which is the default queue, then validates whether the end user ambari-qa has access to the default queue, and if so returns the user ambari-qa for placement. The placement for ambari-qa maps to the tezq queue. This logic of checking access on the default queue and then placing into tezq does not look correct. 2. It expects queue mappings to be configured for both the proxy user and the end user. If none is configured for the proxy user, an NPE occurs. {code:java} String queue = placementManager .placeApplication(context, usernameUsedForPlacement).getQueue(); {code} > NPE if no rule is defined and application-tag-based-placement is enabled > > > Key: YARN-10070 > URL: https://issues.apache.org/jira/browse/YARN-10070 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > > If there is no rule defined for a user, an NPE is thrown by the following line. > {code:java} > String queue = placementManager > .placeApplication(context, usernameUsedForPlacement).getQueue();{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
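A minimal sketch of guarding the placement lookup quoted above; the surrounding method shape is assumed from the snippet rather than taken from a posted patch:

{code:java}
// placeApplication() can return null when no rule matches the user; guard before getQueue():
ApplicationPlacementContext placementContext =
    placementManager.placeApplication(context, usernameUsedForPlacement);
if (placementContext == null) {
  LOG.warn("No placement rule matched user '{}'; skipping tag-based placement",
      usernameUsedForPlacement);
  return usernameUsedForPlacement;
}
String queue = placementContext.getQueue();
{code}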
[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder
[ https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012080#comment-17012080 ] Eric Yang commented on YARN-9052: - Code cleanup and performance optimization usually go hand in hand, to ensure the net gain is positive. Code needs to retain a basic level of human comprehensibility: if code is hard for a human to understand, it will run poorly on the machine as well. Although machines can juggle a much larger set of variables, poorly understood code can result in bugs. [~snemeth] has been doing code rewrites for Hadoop for many years. There have been a few hiccups, but I think there are positive net gains from his help, however slight they may seem. It will put people on edge prior to release time, because some code gets well baked during the development cycle. It would be helpful to show some performance numbers for the net result to boost confidence for the rest of the community. In this case I think Sunil's pain point about submitApp covers this issue. Other issues should be discussed separately. We are near the 3.3.0 release; unless we have good solid data points for a performance gain, I would suggest slowing down on code rewrites for now. > Replace all MockRM submit method definitions with a builder > --- > > Key: YARN-9052 > URL: https://issues.apache.org/jira/browse/YARN-9052 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Fix For: 3.3.0 > > Attachments: > YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs003-justfailed.txt, > YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, > YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, > YARN-9052.004.withlogs.patch, YARN-9052.005.patch, YARN-9052.006.patch, > YARN-9052.007.patch, YARN-9052.008.patch, YARN-9052.009.patch, > YARN-9052.009.patch, YARN-9052.testlogs.002.patch, > YARN-9052.testlogs.002.patch, YARN-9052.testlogs.003.patch, > YARN-9052.testlogs.patch > > > MockRM has 31 definitions of submitApp, most of them having more than an > acceptable number of parameters, ranging from 2 to even 22 parameters, which > makes the code completely unreadable. > On top of unreadability, it's very hard to follow what RMApp will be produced > for tests as they often pass a lot of empty / null values as parameters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
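For illustration, the kind of builder-style call the issue argues for, replacing a 20-odd-parameter submitApp overload; the builder class and method names here are hypothetical:

{code:java}
// One readable call instead of positional nulls and magic values:
RMApp app = new MockRMAppSubmissionBuilder(rm) // hypothetical builder
    .withMemory(1024)
    .withAppName("test-app")
    .withUser("alice")
    .withQueue("default")
    .withUnmanagedAM(false)
    .submit();
{code}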
[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012075#comment-17012075 ] Weiwei Yang commented on YARN-9567: --- > Not yet, I think it's not a strong requirement which only used for >debugging, we can rarely got a long table about that, and even if we have, it >may have a minor impact for the UI, right? It may cause big usability issues when there are lots of requests. Can we add this support? > Add diagnostics for outstanding resource requests on app attempts page > -- > > Key: YARN-9567 > URL: https://issues.apache.org/jira/browse/YARN-9567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9567.001.patch, YARN-9567.002.patch, > image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, > image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, > no_diagnostic_at_first.png, > show_diagnostics_after_requesting_app_activities_REST_API.png > > > Currently on app attempt page we can see outstanding resource requests, it > will be helpful for users to know why if we can join diagnostics of this app > with these. > Discussed with [~cheersyang], we can passively load diagnostics from cache of > completed app activities instead of actively triggering which may bring > uncontrollable risks. > For example: > (1) At first we can see no diagnostic in cache if app activities not > triggered below the outstanding requests. > !no_diagnostic_at_first.png|width=793,height=248! > (2) After requesting the application activities REST API, we can see > diagnostics now. > !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012075#comment-17012075 ] Weiwei Yang edited comment on YARN-9567 at 1/9/20 5:50 PM: --- > Not yet, I think it's not a strong requirement which only used for >debugging, we can rarely got a long table about that, and even if we have, it >may have a minor impact for the UI, right? It may cause big usability issues when there are lots of requests. Can we add this support? was (Author: cheersyang): > Not yet, I think it's not a strong requirement which only used for >debugging, we can rarely got a long table about that, and even if we have, it >may have a minor impact for the UI, right? It may cause big usability issues when there are lots of requests. Can we add this support? [|https://issues.apache.org/jira/secure/AddComment!default.jspa?id=13234295] > Add diagnostics for outstanding resource requests on app attempts page > -- > > Key: YARN-9567 > URL: https://issues.apache.org/jira/browse/YARN-9567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9567.001.patch, YARN-9567.002.patch, > image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, > image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, > no_diagnostic_at_first.png, > show_diagnostics_after_requesting_app_activities_REST_API.png > > > Currently on app attempt page we can see outstanding resource requests, it > will be helpful for users to know why if we can join diagnostics of this app > with these. > Discussed with [~cheersyang], we can passively load diagnostics from cache of > completed app activities instead of actively triggering which may bring > uncontrollable risks. > For example: > (1) At first we can see no diagnostic in cache if app activities not > triggered below the outstanding requests. > !no_diagnostic_at_first.png|width=793,height=248! > (2) After requesting the application activities REST API, we can see > diagnostics now. > !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012069#comment-17012069 ] Weiwei Yang commented on YARN-9538: --- Hi [~Tao Yang] Thanks for the updates. Could you please also check the failures in Jenkins # trailing spaces # hadoop-yarn-site in the patch failed > Document scheduler/app activities and REST APIs > --- > > Key: YARN-9538 > URL: https://issues.apache.org/jira/browse/YARN-9538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9538.001.patch, YARN-9538.002.patch, > YARN-9538.003.patch > > > Add documentation for scheduler/app activities in CapacityScheduler.md and > ResourceManagerRest.md. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9018) Add functionality to AuxiliaryLocalPathHandler to return all locations to read for a given path
[ https://issues.apache.org/jira/browse/YARN-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012067#comment-17012067 ] Hudson commented on YARN-9018: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17839 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17839/]) YARN-9018. Add functionality to AuxiliaryLocalPathHandler to return all (ericp: rev 93233a7d6e4d6b8098622a1aa830355cc18d9589) * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/AuxiliaryLocalPathHandler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java > Add functionality to AuxiliaryLocalPathHandler to return all locations to > read for a given path > --- > > Key: YARN-9018 > URL: https://issues.apache.org/jira/browse/YARN-9018 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.3, 2.8.5 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-9018.001.patch > > > Analogous to LocalDirAllocator#getAllLocalPathsToRead, this will allow aux > services (and other components) to use this function, which they rely on when > using objects of the former class. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
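A sketch of the interface addition the summary describes, mirroring LocalDirAllocator#getAllLocalPathsToRead; the exact signature here is an assumption:

{code:java}
public interface AuxiliaryLocalPathHandler {
  Path getLocalPathForRead(String path) throws IOException;
  Path getLocalPathForWrite(String path) throws IOException;
  // New: return every local location a reader may need to probe for the given path,
  // analogous to LocalDirAllocator#getAllLocalPathsToRead.
  Iterable<Path> getAllLocalPathsForRead(String path) throws IOException;
}
{code}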
[jira] [Commented] (YARN-10034) Allocation tags are not removed when node decommission
[ https://issues.apache.org/jira/browse/YARN-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012051#comment-17012051 ] Adam Antal commented on YARN-10034: --- [~kyungwan nam], thanks for creating this patch and for looking into it. +1 (non-binding) > Allocation tags are not removed when node decommission > -- > > Key: YARN-10034 > URL: https://issues.apache.org/jira/browse/YARN-10034 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-10034.001.patch, YARN-10034.002.patch, > YARN-10034.003.patch > > > When a node is decommissioned, allocation tags that are attached to the node > are not removed. > I could see that allocation tags are revived when recommissioning the node. > The RM removes allocation tags only when the NM confirms the container releases > (YARN-8511), but a decommissioned NM does not connect to the RM anymore. > Once a node is decommissioned, allocation tags that are attached to the node > should be removed immediately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10071) Sync Mockito version with other modules
[ https://issues.apache.org/jira/browse/YARN-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012046#comment-17012046 ] Adam Antal commented on YARN-10071: --- A test is not needed since this is a pom.xml version harmonization. Could you spare some time to review this, [~aajisaka]? > Sync Mockito version with other modules > --- > > Key: YARN-10071 > URL: https://issues.apache.org/jira/browse/YARN-10071 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build, test >Reporter: Akira Ajisaka >Assignee: Adam Antal >Priority: Major > Attachments: YARN-10071.001.patch > > > YARN-8551 introduced a Mockito 1.x dependency; update it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10028) Integrate the new abstract log servlet to the JobHistory server
[ https://issues.apache.org/jira/browse/YARN-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012044#comment-17012044 ] Adam Antal commented on YARN-10028: --- Notes for the patch: the goal was to integrate a {{LogServlet}} instance into {{HsWebApp}}. A {{LogServlet}} needs an {{AppInfoProvider}} implementation that it can use to ask for application-related information. An already used implementation is {{WebServices}}, which was plugged into {{HsWebServices}}. The major flaw is that it still needs an {{ApplicationClientProtocol}} - that is, a protocol instance to communicate with the {{ResourceManager}}. To provide this dependency, a new {{Provider}} is introduced, following Guice's dependency injection framework, in {{WebApps}}'s {{Builder}} class. Now it can handle {{ApplicationClientProtocol}} as a parameter. That parameter is injected in the constructor of {{HsWebServices}}, and in that way {{HsWebApp}} is integrated with the {{LogServlet}}. > Integrate the new abstract log servlet to the JobHistory server > --- > > Key: YARN-10028 > URL: https://issues.apache.org/jira/browse/YARN-10028 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-10028.001.patch, YARN-10028.002.patch > > > Currently the JHS already incorporates a log servlet, but it is incapable of > serving REST calls. We can integrate the new common log servlet into the JHS in > order to have a REST interface. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
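A condensed sketch of the Provider wiring described above; the provider class name is illustrative and the error handling is simplified:

{code:java}
import java.io.IOException;
import com.google.inject.Provider;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.client.ClientRMProxy;

public class AppClientProtocolProvider implements Provider<ApplicationClientProtocol> {
  private final Configuration conf;

  public AppClientProtocolProvider(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public ApplicationClientProtocol get() {
    try {
      // Proxy used by the log servlet to ask the RM about applications:
      return ClientRMProxy.createRMProxy(conf, ApplicationClientProtocol.class);
    } catch (IOException e) {
      throw new IllegalStateException("Could not create RM proxy", e);
    }
  }
}
// Bound in the WebApps builder roughly as:
//   bind(ApplicationClientProtocol.class).toProvider(new AppClientProtocolProvider(conf));
{code}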
[jira] [Commented] (YARN-9014) runC container runtime
[ https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012042#comment-17012042 ] Eric Badger commented on YARN-9014: --- [~brahmareddy], I have updated the release notes on this JIRA so that it will show up in the 3.3.0 release notes. > runC container runtime > -- > > Key: YARN-9014 > URL: https://issues.apache.org/jira/browse/YARN-9014 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jason Darrell Lowe >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: OciSquashfsRuntime.v001.pdf, > RuncContainerRuntime.v002.pdf > > > This JIRA tracks a YARN container runtime that supports running containers in > images built by Docker but the runtime does not use Docker directly, and > Docker does not have to be installed on the nodes. The runtime leverages the > [OCI runtime standard|https://github.com/opencontainers/runtime-spec] to > launch containers, so an OCI-compliant runtime like {{runc}} is required. > {{runc}} has the benefit of not requiring a daemon like {{dockerd}} to be > running in order to launch/control containers. > The layers comprising the Docker image are uploaded to HDFS as > [squashfs|http://tldp.org/HOWTO/SquashFS-HOWTO/whatis.html] images, enabling > the runtime to efficiently download and execute directly on the compressed > layers. This saves image unpack time and space on the local disk. The image > layers, like other entries in the YARN distributed cache, can be spread > across the YARN local disks, increasing the available space for storing > container images on each node. > A design document will be posted shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9014) runC container runtime
[ https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9014: -- Release Note: The new RuncContainerRuntime is an experimental feature that allows Hadoop tasks to run inside runC containers. This feature is similar to the DockerLinuxContainerRuntime, but is tailored more to Hadoop. Notably, the RuncContainerRuntime leverages squashfs for its image format and spreads the images across all disks accessible to Hadoop using the distributed cache. Additionally, unlike Docker, the RuncContainerRuntime does not rely on any daemons. > runC container runtime > -- > > Key: YARN-9014 > URL: https://issues.apache.org/jira/browse/YARN-9014 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jason Darrell Lowe >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: OciSquashfsRuntime.v001.pdf, > RuncContainerRuntime.v002.pdf > > > This JIRA tracks a YARN container runtime that supports running containers in > images built by Docker but the runtime does not use Docker directly, and > Docker does not have to be installed on the nodes. The runtime leverages the > [OCI runtime standard|https://github.com/opencontainers/runtime-spec] to > launch containers, so an OCI-compliant runtime like {{runc}} is required. > {{runc}} has the benefit of not requiring a daemon like {{dockerd}} to be > running in order to launch/control containers. > The layers comprising the Docker image are uploaded to HDFS as > [squashfs|http://tldp.org/HOWTO/SquashFS-HOWTO/whatis.html] images, enabling > the runtime to efficiently download and execute directly on the compressed > layers. This saves image unpack time and space on the local disk. The image > layers, like other entries in the YARN distributed cache, can be spread > across the YARN local disks, increasing the available space for storing > container images on each node. > A design document will be posted shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
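A rough sketch of how a NodeManager might be pointed at the new runtime. The {{runc}} value is an assumption based on the release note above, and the exact property names should be verified against the shipped runC runtime documentation:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class RuncRuntimeConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Assumption: runc is enabled like the Docker runtime, by adding it
    // to the NodeManager's list of allowed Linux container runtimes.
    conf.set("yarn.nodemanager.runtime.linux.allowed-runtimes", "default,runc");

    System.out.println(conf.get("yarn.nodemanager.runtime.linux.allowed-runtimes"));
  }
}
{code}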
[jira] [Updated] (YARN-10028) Integrate the new abstract log servlet to the JobHistory server
[ https://issues.apache.org/jira/browse/YARN-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-10028: -- Attachment: YARN-10028.002.patch > Integrate the new abstract log servlet to the JobHistory server > --- > > Key: YARN-10028 > URL: https://issues.apache.org/jira/browse/YARN-10028 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-10028.001.patch, YARN-10028.002.patch > > > Currently the JHS already incorporates a log servlet, but it is incapable of > serving REST calls. We can integrate the new common log servlet into the JHS in > order to have a REST interface. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10080) Support show app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012022#comment-17012022 ] Hadoop QA commented on YARN-10080: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 52 unchanged - 0 fixed = 53 total (was 52) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10080 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990369/YARN-10080-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ed1752b6750f 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a40dc9e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25359/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25359/testReport/ | | Max. process+thread count | 335 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Created] (YARN-10081) Exception message from ClientRMProxy#getRMAddress is misleading
Adam Antal created YARN-10081: - Summary: Exception message from ClientRMProxy#getRMAddress is misleading Key: YARN-10081 URL: https://issues.apache.org/jira/browse/YARN-10081 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.3.0 Reporter: Adam Antal In {{ClientRMProxy#getRMAddress}}, the else branch contains the following piece of code:
{code:java}
    } else {
      String message = "Unsupported protocol found when creating the proxy " +
          "connection to ResourceManager: " +
          ((protocol != null) ? protocol.getClass().getName() : "null");
      LOG.error(message);
      throw new IllegalStateException(message);
    }
{code}
This is wrong, because the {{protocol}} variable is already of type {{Class}}, so {{protocol.getClass().getName()}} will always be {{java.lang.Class}}. It should be {{protocol.getName()}}. An example of the error message when {{RMProxy}} is misused and this exception is thrown:
{noformat}
java.lang.IllegalStateException: Unsupported protocol found when creating the proxy connection to ResourceManager: java.lang.Class
at org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:109)
at org.apache.hadoop.yarn.client.RMProxy.newProxyInstance(RMProxy.java:133)
...
{noformat}
where the protocol parameter actually provided to this method was obviously not {{java.lang.Class}} itself. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
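A self-contained demonstration of the difference; the nested interface is a hypothetical stand-in, and only the {{getClass()}} vs {{getName()}} contrast matters:
{code:java}
public class ClassNameDemo {

  // Hypothetical stand-in for a real protocol interface.
  interface DemoProtocol { }

  public static void main(String[] args) {
    Class<?> protocol = DemoProtocol.class;

    // Buggy variant: getClass() on a Class reference returns the Class
    // object's own class, so this always prints "java.lang.Class".
    System.out.println(protocol.getClass().getName());

    // Fixed variant: getName() prints the class we actually meant,
    // here "ClassNameDemo$DemoProtocol".
    System.out.println(protocol.getName());
  }
}
{code}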
[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012004#comment-17012004 ] Manikandan R commented on YARN-9767: Sorry for the delay. Rebased the patch; it should work now. It also requires YARN-6492.007.WIP.patch. Regarding {{LeafQueue#getHeadroom}}: I made those changes for a reason I will need to recollect, I think to address negative metrics values. I did a quick revisit and ran the JUnit tests without those changes, expecting them to pass without failures. Only one assert fails: {{assertEquals(2 * GB, queueAUserMetrics.getAvailableMB())}} in {{TestNodeLabelContainerAllocation#testQueueMetricsWithLabelsOnDefaultLabelNode}}, where the value comes out as -1. Since the userLimitResource value is derived from the partition exclusivity property, the user is allowed to go beyond the max capacity, which finally leads to this situation. Will dig deeper and update. > PartitionQueueMetrics Issues > > > Key: YARN-9767 > URL: https://issues.apache.org/jira/browse/YARN-9767 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9767.001.patch > > > The intent of this Jira is to capture the issues/observations encountered as > part of YARN-6492 development separately, for ease of tracking. > Observations: > Please refer to > https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027 > 1. Since partition info is extracted from both the request and the node, there is a > problem. For example: > > Node N has been mapped to label X (non-exclusive). Queue A has been > configured with the ANY node label. App A requested resources from queue A and > its containers ran on node N for some reason. During the > AbstractCSQueue#allocateResource call, the node partition (from the SchedulerNode) > is used for the calculation. Let's say the allocate call has been fired for 3 > containers of 1 GB each; then > a. PartitionDefault * queue A -> pending MB is 3 GB > b. PartitionX * queue A -> pending MB is -3 GB > > is the outcome. Because the app request was fired without any label > specification, metrics #a were derived. After allocation is over, pending resources > get decreased; since this uses the node partition info, metrics #b were derived. > > Given this kind of situation, we will need to put some thought into getting > the metrics right. > > 2. Though the intent of this Jira is to do Partition Queue Metrics, we would > like to retain the existing Queue Metrics for backward compatibility (as you > can see from the Jira's discussion). > With this patch and the YARN-9596 patch, queue metrics (per queue) would be > overridden either with specific partition values or with default partition > values. It could be vice versa as well. For example, after a queue (say > queue A) has been initialised with some min and max capacity and also with a node > label's min and max capacity, QueueMetrics (availableMB) for queue A returns values > based on the node label's capacity config. > I've been working on these observations to provide a fix and attached > .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and > availableVcores are correct (please refer to observation #2 above). Added more > asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix for > #2 is working properly. 
> Also, one more thing to note: user metrics for availableMB and availableVcores > at the root queue were not there even before; I retained the same behaviour. User > metrics for availableMB and availableVcores are available only at the child queue > level, and also per partition. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
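To make observation #1 above concrete, a toy sketch (plain Java maps, not YARN code) of how incrementing pending resources under the request's partition but decrementing them under the node's partition corrupts both counters:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class PartitionPendingSketch {
  public static void main(String[] args) {
    // Pending MB tracked per (partition, queue) -- toy stand-in for QueueMetrics.
    Map<String, Long> pendingMB = new HashMap<>();

    // Request arrives with no label: pending is recorded under the
    // default partition ("") for queue A.
    pendingMB.merge("|queueA", 3L * 1024, Long::sum);

    // The containers run on node N (partition X): the decrement uses the
    // node's partition instead of the request's.
    pendingMB.merge("X|queueA", -3L * 1024, Long::sum);

    // Outcome: {|queueA=3072, X|queueA=-3072} -- 3 GB stuck pending on the
    // default partition and -3 GB on partition X, as described above.
    System.out.println(pendingMB);
  }
}
{code}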
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011993#comment-17011993 ] Gergely Pollak commented on YARN-9879: -- Thank you for your feedback [~wilfreds], [~pbacsko], [~leftnoteasy]! {quote}The important part is applying a new configuration. If the configuration adds a leaf queue that is not unique the configuration update currently is rejected. With this change we would allow that config to become active. This *could* break existing applications when they try to submit to the leaf queue that is no longer unique. {quote} I think this can be a major issue. Consider a scenario where a user has, let's say, a job that runs daily and has been working like that for years; it can simply break because another user, totally unrelated to their team or even department, creates a queue with the same name. Suddenly this application will start failing, and even if we add logs that explicitly state the reason behind the application rejection, the user might not even notice it started failing. (We shouldn't assume everyone has proper monitoring and warning systems.) So technically any user who can create a queue can disable another user's application if that application is started using a single queue-name reference. And that concerns me a bit. However, [~adam.antal] had an idea to fix even this issue, and then we can move forward with Wilfred's suggestion: we should make it possible to flag queues as requiring full queue-name references, and this flag should mean ALL queues under the flagged queue can ONLY be referenced by full queue name. For regular (non-flagged) queues we would still enforce the unique leaf queue policy, while newer or migrating users could stick to the full queue reference. This proposal also helps gradual migration for older CS users, by slowly flagging the queues in which applications are already started using the full queue reference. What do you think of this idea? > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal are being made; I'll attach them as soon as they're > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
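To illustrate the ambiguity being discussed, a minimal sketch of a Capacity Scheduler hierarchy in which two leaf queues share the name {{dev}}. The queue names are made up; the property keys follow the standard {{yarn.scheduler.capacity.<queue-path>.queues}} pattern:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class DuplicateLeafQueueSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Two hypothetical parents, each with a leaf called "dev".
    conf.set("yarn.scheduler.capacity.root.queues", "marketing,engineering");
    conf.set("yarn.scheduler.capacity.root.marketing.queues", "dev");
    conf.set("yarn.scheduler.capacity.root.engineering.queues", "dev");

    // A submission using the short name "dev" is now ambiguous;
    // only the full paths identify a queue uniquely.
    System.out.println("root.marketing.dev vs root.engineering.dev");
  }
}
{code}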
[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011828#comment-17011828 ] Hadoop QA commented on YARN-9538: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 32m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 15s{color} | {color:red} hadoop-yarn-site in the patch failed. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 48m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-9538 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990420/YARN-9538.003.patch | | Optional Tests | dupname asflicense mvnsite | | uname | Linux 2b0f5547b22c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a40dc9e | | maven | version: Apache Maven 3.3.9 | | mvnsite | https://builds.apache.org/job/PreCommit-YARN-Build/25358/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-site.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/25358/artifact/out/whitespace-eol.txt | | Max. process+thread count | 307 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25358/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. 
> Document scheduler/app activities and REST APIs > --- > > Key: YARN-9538 > URL: https://issues.apache.org/jira/browse/YARN-9538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9538.001.patch, YARN-9538.002.patch, > YARN-9538.003.patch > > > Add documentation for scheduler/app activities in CapacityScheduler.md and > ResourceManagerRest.md. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011829#comment-17011829 ] Konstantinos Karanasos commented on YARN-5542: -- Cool, thanks [~brahmareddy] and [~abmodi]. > Scheduling of opportunistic containers > -- > > Key: YARN-5542 > URL: https://issues.apache.org/jira/browse/YARN-5542 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > Fix For: 3.3.0 > > > This JIRA groups all efforts related to the scheduling of opportunistic > containers. > It includes the scheduling of opportunistic container through the central RM > (YARN-5220), through distributed scheduling (YARN-2877), as well as the > scheduling of containers based on actual node utilization (YARN-1011) and the > container promotion/demotion (YARN-5085). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2886) Estimating waiting time in NM container queues
[ https://issues.apache.org/jira/browse/YARN-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011800#comment-17011800 ] Sriram Rao commented on YARN-2886: -- > Estimating waiting time in NM container queues > -- > > Key: YARN-2886 > URL: https://issues.apache.org/jira/browse/YARN-2886 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos >Priority: Major > > This JIRA is about estimating the waiting time of each NM queue. > Having these estimates is crucial for the distributed scheduling of container > requests, as it allows the LocalRM to decide in which NMs to queue the > queuable container requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011789#comment-17011789 ] Tao Yang commented on YARN-9567: Thanks [~cheersyang] for the review. {quote} 1. since this is a CS only feature, pls make sure nothing breaks when FS is enabled {quote} Yes, it should show this table only when CS is enabled; I will update this in the next patch. {quote} 2. does the table support paging? {quote} Not yet. I think that's not a strong requirement, since the table is only used for debugging: we will rarely get a long one, and even if we do, it should have only a minor impact on the UI, right? > Add diagnostics for outstanding resource requests on app attempts page > -- > > Key: YARN-9567 > URL: https://issues.apache.org/jira/browse/YARN-9567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9567.001.patch, YARN-9567.002.patch, > image-2019-06-04-17-29-29-368.png, image-2019-06-04-17-31-31-820.png, > image-2019-06-04-17-58-11-886.png, image-2019-06-14-11-21-41-066.png, > no_diagnostic_at_first.png, > show_diagnostics_after_requesting_app_activities_REST_API.png > > > Currently on the app attempt page we can see outstanding resource requests; it > will be helpful for users to know why, if we can join the diagnostics of this app > with these requests. > Discussed with [~cheersyang]: we can passively load diagnostics from the cache of > completed app activities instead of actively triggering recording, which may bring > uncontrollable risks. > For example: > (1) At first, no diagnostics appear below the outstanding requests, since app > activities have not been triggered and nothing is in the cache. > !no_diagnostic_at_first.png|width=793,height=248! > (2) After requesting the application activities REST API, we can see the > diagnostics. > !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9538: --- Attachment: YARN-9538.003.patch > Document scheduler/app activities and REST APIs > --- > > Key: YARN-9538 > URL: https://issues.apache.org/jira/browse/YARN-9538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9538.001.patch, YARN-9538.002.patch, > YARN-9538.003.patch > > > Add documentation for scheduler/app activities in CapacityScheduler.md and > ResourceManagerRest.md. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9538) Document scheduler/app activities and REST APIs
[ https://issues.apache.org/jira/browse/YARN-9538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011781#comment-17011781 ] Tao Yang commented on YARN-9538: Attached v3 patch in which most comments are addressed; updates that need more discussion are as follows: CS 1. the table of contents can be auto-generated by Doxia Macros via defining "MACRO\{toc|fromDepth=0|toDepth=3}", so there's nothing we can do for this. I have updated the other modifications, please help to review them as well, thanks: // Activities Scheduling activities are activity messages used for debugging on some critical scheduling paths; they can be recorded and exposed via a RESTful API with minor impact on scheduler performance. // Scheduler Activities Scheduler activities include useful scheduling info in a scheduling cycle, which illustrates how the scheduler allocates a container. The scheduler activities REST API (`http://rm-http-address:port/ws/v1/cluster/scheduler/activities`) provides a way to enable recording scheduler activities and fetch them from the cache. To eliminate the performance impact, the scheduler automatically disables recording activities at the end of a scheduling cycle; you can query the RESTful API again to get the latest scheduler activities. // Application Activities Application activities include useful scheduling info for a specified application, which illustrates how the requirements are satisfied or just skipped. The application activities REST API (`http://rm-http-address:port/ws/v1/cluster/scheduler/app-activities/\{appid}`) provides a way to enable recording application activities for a specified application within a few seconds, or to fetch historical application activities from the cache; the available actions, which include "refresh" and "get", can be specified by the "actions" parameter: RM 1. +The scheduler activities API currently supports Capacity Scheduler and provides a way to get scheduler activities in a single scheduling process, it will trigger recording scheduler activities in next scheduling process and then take last required scheduler activities from cache as the response. The response have hierarchical structure with multiple levels and important scheduling details which are organized by the sequence of scheduling process: -> The scheduler activities RESTful API {color:#FF}is available if you are using the capacity scheduler and{color} can fetch scheduler activities info recorded in a scheduling cycle. The API returns a message that includes important scheduling activities info, {color:#FF}which has a hierarchical layout with the following fields:{color} 7. + Application activities include useful scheduling info for a specified application, the response have hierarchical structure with multiple levels: -> The application activities RESTful API {color:#FF}is available if you are using the capacity scheduler and can fetch useful scheduling info for a specified application{color}; the response has a hierarchical layout with the following fields: 8. * *AppActivities* - AppActivities are root structure of application activities within basic information. -> is the root element? Yes, updated: AppActivities are the root {color:#FF}element{color} ... 9. +* *Applications* - Allocations are allocation attempts at app level queried from the cache. -> shouldn't this be applications? Right, updated: +* {color:#FF}*Allocations*{color} - Allocations ... 
> Document scheduler/app activities and REST APIs > --- > > Key: YARN-9538 > URL: https://issues.apache.org/jira/browse/YARN-9538 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9538.001.patch, YARN-9538.002.patch, > YARN-9538.003.patch > > > Add documentation for scheduler/app activities in CapacityScheduler.md and > ResourceManagerRest.md. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
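As a usage illustration for the endpoints documented above, a minimal sketch that fetches scheduler activities over HTTP. The host and port are placeholders (8088 is the default RM web port), the HTTP client choice is incidental, and no query parameters beyond the documented URL are implied:
{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ActivitiesQuerySketch {
  public static void main(String[] args) throws Exception {
    // Placeholder RM address -- substitute your rm-http-address:port.
    String base = "http://rm-http-address:8088/ws/v1/cluster";

    HttpClient client = HttpClient.newHttpClient();

    // Per the docs above, the first call triggers recording for the next
    // scheduling cycle; a later call returns the recorded activities
    // from the cache.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(base + "/scheduler/activities"))
        .header("Accept", "application/json")
        .GET()
        .build();

    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
{code}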
[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page
[ https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011540#comment-17011540 ] Hadoop QA commented on YARN-9567: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 29s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-9567 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973773/YARN-9567.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 98ab3aec6f3e 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8fe01db | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25357/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25357/testReport/ | | Max. process+thread count | 821