[jira] [Commented] (YARN-9554) TimelineEntity DAO has java.util.Set interface which JAXB can't handle
[ https://issues.apache.org/jira/browse/YARN-9554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458788#comment-17458788 ]

László Bodor commented on YARN-9554:
------------------------------------

While I'm trying to upgrade Tez to Hadoop 3.3.1, a [unit test|https://github.com/apache/tez/blob/master/tez-plugins/tez-yarn-timeline-history-with-acls/src/test/java/org/apache/tez/dag/history/ats/acls/TestATSHistoryWithACLs.java] throws an exception introduced by this patch:

{code}
SEVERE: Failed to generate the schema for the JAX-B elements
javax.xml.bind.JAXBException: TimelineEntity and TimelineEntities has IllegalAnnotation
	at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ContextFactory.createContext(ContextFactory.java)
{code}

I'm a bit confused: what is the expected way of making the unit test work again? The exception is thrown when the test tries to fetch a timeline entity from AHS (1.0):

{code}
private <K> K getTimelineData(String url, Class<K> clazz) {
  Client client = new Client();
  WebResource resource = client.resource(url);
  ClientResponse response = resource.accept(MediaType.APPLICATION_JSON)
      .get(ClientResponse.class);
  assertEquals(200, response.getStatus());
  assertTrue(MediaType.APPLICATION_JSON_TYPE.isCompatible(response.getType()));
  K entity = response.getEntity(clazz); // <--- fails at this point, clazz is TimelineEntity.class
  assertNotNull(entity);
  return entity;
}
{code}

> TimelineEntity DAO has java.util.Set interface which JAXB can't handle
> ----------------------------------------------------------------------
>
> Key: YARN-9554
> URL: https://issues.apache.org/jira/browse/YARN-9554
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineservice
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9554-001.patch, YARN-9554-002.patch
>
> TimelineEntity DAO has a java.util.Set interface which JAXB can't handle. This
> breaks the fix of YARN-7266.
> {code}
> Caused by: com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException:
> 1 counts of IllegalAnnotationExceptions
> java.util.Set is an interface, and JAXB can't handle interfaces.
> 	this problem is related to the following location:
> 		at java.util.Set
> 		at public java.util.HashMap org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getPrimaryFiltersJAXB()
> 		at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
> 		at public java.util.List org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
> 		at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> 	at com.sun.xml.internal.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:91)
> 	at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:445)
> 	at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:277)
> 	at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:124)
> 	at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1123)
> 	at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:147)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
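The exception above reflects a general JAXB rule: where JAXB needs to instantiate a type on its own (here, the Set used as a map value in the primary-filters property), an interface type like java.util.Set cannot be bound, while a concrete class such as java.util.HashSet can. A minimal pure-JDK sketch of a check in the same spirit — the class and method below are illustrative and not JAXB's actual implementation:

```java
import java.util.HashSet;
import java.util.Set;

public class JaxbTypeCheck {
    // JAXB can only instantiate concrete classes with a no-arg
    // constructor; an interface like java.util.Set is rejected
    // ("java.util.Set is an interface, and JAXB can't handle interfaces.")
    static boolean jaxbCanInstantiate(Class<?> type) {
        if (type.isInterface()) {
            return false;
        }
        try {
            type.getDeclaredConstructor(); // requires a no-arg constructor
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(jaxbCanInstantiate(Set.class));     // false
        System.out.println(jaxbCanInstantiate(HashSet.class)); // true
    }
}
```

This is why the YARN-9554 patch direction is to expose a concrete type (or an adapter) to JAXB instead of the bare Set interface.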
[jira] [Updated] (YARN-10907) Minimize usages of AbstractCSQueue#csContext
[ https://issues.apache.org/jira/browse/YARN-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10907:
----------------------------------
Fix Version/s: 3.4.0

> Minimize usages of AbstractCSQueue#csContext
> --------------------------------------------
>
> Key: YARN-10907
> URL: https://issues.apache.org/jira/browse/YARN-10907
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Benjamin Teke
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> Context objects can be a sign of a code smell, as they can contain many,
> possibly loosely related references to other objects.
> CapacitySchedulerContext seems like such a class.
> This task is to investigate how the field AbstractCSQueue#csContext is being
> used from this class, and to possibly keep the usage of this context class to
> the bare minimum.
> Related article: https://wiki.c2.com/?ContextObjectsAreEvil
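The refactoring direction described above — depending only on the few collaborators a queue actually needs instead of storing the whole context — can be sketched as follows. The class and method names here are hypothetical stand-ins, not the actual CapacityScheduler types:

```java
// Before: the queue would pull everything through one wide context
// object, hiding which parts it actually depends on.
interface SchedulerContext {
    String getQueuePrefix();
    int getClusterNodeCount();
    // ... many more loosely related accessors
}

// After: the queue declares only what it uses, so its real
// dependencies are visible in the constructor signature.
final class LeafQueueExample {
    private final String queuePrefix;

    LeafQueueExample(String queuePrefix) {
        this.queuePrefix = queuePrefix;
    }

    String fullPath(String name) {
        return queuePrefix + "." + name;
    }
}

public class ContextNarrowing {
    public static void main(String[] args) {
        // The context is consulted once at construction time instead of
        // being kept as a field and dereferenced throughout the class.
        LeafQueueExample q = new LeafQueueExample("root");
        System.out.println(q.fullPath("default")); // root.default
    }
}
```

The payoff is testability: a unit test can construct the queue from plain values without mocking a large context object.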
[jira] [Assigned] (YARN-9450) TestCapacityOverTimePolicy#testAllocation fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-9450:
------------------------------------
Assignee: Szilard Nemeth (was: Prabhu Joseph)

> TestCapacityOverTimePolicy#testAllocation fails sporadically
> ------------------------------------------------------------
>
> Key: YARN-9450
> URL: https://issues.apache.org/jira/browse/YARN-9450
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, test
> Affects Versions: 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Szilard Nemeth
> Priority: Major
>
> TestCapacityOverTimePolicy#testAllocation fails sporadically. Observed in
> multiple builds run for YARN-9447, YARN-8193, YARN-8051.
> {code}
> Failed
> org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration 90,000,000, height 0.25, numSubmission 1, periodic 8640)]
> Failing for the past 1 build (Since Failed#23900 )
> Took 34 ms.
> Stacktrace
> junit.framework.AssertionFailedError
> 	at junit.framework.Assert.fail(Assert.java:55)
> 	at junit.framework.Assert.fail(Assert.java:64)
> 	at junit.framework.TestCase.fail(TestCase.java:235)
> 	at org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146)
> 	at org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136)
> 	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runners.Suite.runChild(Suite.java:128)
> 	at org.junit.runners.Suite.runChild(Suite.java:27)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Standard Output
> 2019-04-05 23:46:19,022 INFO [main] recovery.RMStateStore (RMStateStore.java:transition(591)) - Storing reservation allocation.reservation_-4277767163553399219_8391370105871519867
> 2019-04-05 23:46:19,022 INFO [main] recovery.RMStateStore (MemoryRMStateStore.java:storeReservationState(258)) - Storing reservationallocation for reservation_-4277767163553399219_8391370105871519867 for plan dedicated
> 2019-04-05 23:46:19,023 INFO [main] reservation.InMemoryPlan (InMemoryPlan.java:addReservation(373)) - Successfully added reservation:
[jira] [Assigned] (YARN-7548) TestCapacityOverTimePolicy.testAllocation is flaky
[ https://issues.apache.org/jira/browse/YARN-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-7548:
------------------------------------
Assignee: Szilard Nemeth

> TestCapacityOverTimePolicy.testAllocation is flaky
> --------------------------------------------------
>
> Key: YARN-7548
> URL: https://issues.apache.org/jira/browse/YARN-7548
> Project: Hadoop YARN
> Issue Type: Bug
> Components: reservation system
> Affects Versions: 3.0.0-beta1
> Reporter: Haibo Chen
> Assignee: Szilard Nemeth
> Priority: Major
>
> It failed in both YARN-7337 and YARN-6921 jenkins jobs.
> org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration 90,000,000, height 0.25, numSubmission 1, periodic 8640)]
> *Stacktrace*
> {code:java}
> junit.framework.AssertionFailedError: null
> 	at junit.framework.Assert.fail(Assert.java:55)
> 	at junit.framework.Assert.fail(Assert.java:64)
> 	at junit.framework.TestCase.fail(TestCase.java:235)
> 	at org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146)
> 	at org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136){code}
> *Standard Output*
> {code:java}
> 2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore (RMStateStore.java:transition(538)) - Storing reservation allocation.reservation_-9026698577416205920_6337917439559340517
> 2017-11-20 23:57:03,759 INFO [main] recovery.RMStateStore (MemoryRMStateStore.java:storeReservationState(247)) - Storing reservationallocation for reservation_-9026698577416205920_6337917439559340517 for plan dedicated
> 2017-11-20 23:57:03,760 INFO [main] reservation.InMemoryPlan (InMemoryPlan.java:addReservation(373)) - Successfully added reservation: reservation_-9026698577416205920_6337917439559340517 to plan.
> In-memory Plan: Parent Queue: dedicated Total Capacity: vCores:1000 Step: 1000 reservation_-9026698577416205920_6337917439559340517 user:u1 startTime: 0 endTime: 8640 Periodiciy: 8640 alloc:
> [Period: 8640
> 0:
> 3423748:
> 86223748:
> 8640:
> 9223372036854775807: null
> ]{code}
[jira] [Updated] (YARN-11045) ATSv2 storage monitor fails to read from hbase cluster
[ https://issues.apache.org/jira/browse/YARN-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11045:
----------------------------------
Labels: pull-request-available (was: )

> ATSv2 storage monitor fails to read from hbase cluster
> ------------------------------------------------------
>
> Key: YARN-11045
> URL: https://issues.apache.org/jira/browse/YARN-11045
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The HBase-compatible guava dependency is a bit messed up, i.e. the
> timelineservice-hbase modules are still being built with Hadoop's guava
> version (defined in hadoop-project), and this creates issues with
> HBaseStorageMonitor reading records from the hbase cluster:
> {code:java}
> java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
> 	at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95)
> 	at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
> 	at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77)
> 	at org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:260)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:233)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:394)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:368)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:143)
> 	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
> 	... 3 more
> Caused by: java.lang.NoSuchMethodError: com.google.common.net.HostAndPort.getHostText()Ljava/lang/String;
> 	at org.apache.hadoop.hbase.net.Address.getHostName(Address.java:72)
> 	at org.apache.hadoop.hbase.net.Address.toSocketAddress(Address.java:57)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:576)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37250)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:231)
> {code}
[jira] [Updated] (YARN-11045) ATSv2 storage monitor fails to read from hbase cluster
[ https://issues.apache.org/jira/browse/YARN-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani updated YARN-11045:
--------------------------------
Summary: ATSv2 storage monitor fails to read from hbase cluster (was: ATSv2 storage monitor fails to read from HBase)

> ATSv2 storage monitor fails to read from hbase cluster
> ------------------------------------------------------
>
> Key: YARN-11045
> URL: https://issues.apache.org/jira/browse/YARN-11045
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
[jira] [Created] (YARN-11045) ATSv2 storage monitor fails to read from HBase
Viraj Jasani created YARN-11045:
-----------------------------------

Summary: ATSv2 storage monitor fails to read from HBase
Key: YARN-11045
URL: https://issues.apache.org/jira/browse/YARN-11045
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Viraj Jasani
Assignee: Viraj Jasani

The HBase-compatible guava dependency is a bit messed up, i.e. the timelineservice-hbase modules are still being built with Hadoop's guava version (defined in hadoop-project), and this creates issues with HBaseStorageMonitor reading records from the hbase cluster (see the NoSuchMethodError stack trace quoted above).
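This class of NoSuchMethodError arises because HostAndPort.getHostText() was removed in later Guava releases in favor of getHost(), so code compiled against an old Guava breaks when Hadoop's newer Guava wins on the classpath. One common way to address it is to pin the Guava version for just the HBase-facing modules. The fragment below is an illustrative sketch only, not the actual YARN-11045 patch; the property name and version are placeholders for whatever Guava release the target HBase line was built against:

```xml
<!-- Illustrative pom fragment: override the inherited guava version in the
     timelineservice-hbase modules so the runtime classpath matches what the
     HBase client was compiled against. -->
<properties>
  <!-- placeholder: substitute the guava version of your HBase release line -->
  <hbase-compatible-guava.version>11.0.2</hbase-compatible-guava.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>${hbase-compatible-guava.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

The key design point is that the override lives in the HBase-facing modules only, so the rest of Hadoop keeps its own Guava.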
[jira] [Updated] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts
[ https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11044:
----------------------------------
Labels: newbie pull-request-available (was: newbie)

> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --------------------------------------------------------------------------
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Labels: newbie, pull-request-available
>
> TestApplicationLimits.testLimitsComputation() has the following two asserts:
> {code:java}
> // should default to global setting if per queue setting not set
> assertEquals((long) CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
>     (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
>         queue.getQueuePath()));
> {code}
> and
> {code:java}
> assertEquals((long) 0.5,
>     (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
>         queue.getQueuePath()));
> {code}
> In the current form neither of them make too much sense because
> getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0
> and 1.0), so the only way this will fail is when the configuration is below 0
> or above 1, but we're not testing invalid configurations here. This should be
> corrected.
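Why the casts make these asserts ineffective can be shown with two lines of plain Java: casting any float in (0, 1) to long truncates it to 0, so the assertion compares 0 with 0 regardless of the configured percentage. A standalone demonstration (not the actual test code):

```java
public class TruncatedAssertDemo {
    public static void main(String[] args) {
        float expected = 0.5f;  // the value the test means to check
        float actual   = 0.25f; // a clearly different configuration

        // Both casts truncate to 0, so an assertEquals on the long
        // values passes even though the floats differ.
        System.out.println((long) expected); // 0
        System.out.println((long) actual);   // 0
        System.out.println((long) expected == (long) actual); // true

        // Comparing the floats directly (with a delta) catches the bug.
        System.out.println(Math.abs(expected - actual) < 1e-6f); // false
    }
}
```

The fix is simply to assert on the float values with a delta instead of casting to long.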
[jira] [Updated] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929
[ https://issues.apache.org/jira/browse/YARN-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11043:
----------------------------------
Labels: pull-request-available (was: )

> Clean up checkstyle warnings from YARN-11024/10907/10929
> --------------------------------------------------------
>
> Key: YARN-11043
> URL: https://issues.apache.org/jira/browse/YARN-11043
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Labels: pull-request-available
> Attachments: checkstyle_warnings.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> YARN-11024, YARN-10907 and YARN-10929 are consecutive changes built on top of
> each other. This jira is a followup to clean up the checkstyle warnings
> present in the modified files.
[jira] [Assigned] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts
[ https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke reassigned YARN-11044:
------------------------------------
Assignee: Benjamin Teke

> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --------------------------------------------------------------------------
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Labels: newbie
[jira] [Updated] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts
[ https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-11044:
---------------------------------
Labels: newbie (was: )

> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --------------------------------------------------------------------------
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Benjamin Teke
> Priority: Major
> Labels: newbie
[jira] [Updated] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some uneffective asserts
[ https://issues.apache.org/jira/browse/YARN-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-11044:
---------------------------------
Description:

TestApplicationLimits.testLimitsComputation() has the following two asserts:

{code:java}
// should default to global setting if per queue setting not set
assertEquals((long) CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
    (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
        queue.getQueuePath()));
{code}

and

{code:java}
assertEquals((long) 0.5,
    (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
        queue.getQueuePath()));
{code}

In the current form neither of them make too much sense because getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 and 1.0), so the only way this will fail is when the configuration is below 0 or above 1, but we're not testing invalid configurations here. This should be corrected.

was:

TestApplicationLimits.testLimitsComputation() has the following two asserts:

{code:java}
// should default to global setting if per queue setting not set
assertEquals((long) CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT,
    (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
        queue.getQueuePath()));
{code}

and

{code:java}
assertEquals((long) 0.5,
    (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent(
        queue.getQueuePath()));
{code}

In the current form neither of them make too much sense because getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 and 1.0), so the only way this will fail, if the configuration is below 0 or above 1, but we're not testing that here. This should be corrected.
> TestApplicationLimits.testLimitsComputation() has some uneffective asserts
> --------------------------------------------------------------------------
>
> Key: YARN-11044
> URL: https://issues.apache.org/jira/browse/YARN-11044
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Benjamin Teke
> Priority: Major
[jira] [Created] (YARN-11044) TestApplicationLimits.testLimitsComputation() has some ineffective asserts
Benjamin Teke created YARN-11044: Summary: TestApplicationLimits.testLimitsComputation() has some ineffective asserts Key: YARN-11044 URL: https://issues.apache.org/jira/browse/YARN-11044 Project: Hadoop YARN Issue Type: Bug Reporter: Benjamin Teke TestApplicationLimits.testLimitsComputation() has the following two asserts: {code:java} // should default to global setting if per queue setting not set assertEquals((long)CapacitySchedulerConfiguration.DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT, (long)csConf.getMaximumApplicationMasterResourcePerQueuePercent( queue.getQueuePath())); {code} and {code:java} assertEquals((long) 0.5, (long) csConf.getMaximumApplicationMasterResourcePerQueuePercent( queue.getQueuePath())); {code} In the current form neither of them makes much sense, because getMaximumApplicationMasterResourcePerQueuePercent returns a float (between 0 and 1.0), so the only way this will fail is if the configuration is below 0 or above 1, but we're not testing that here. This should be corrected.
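The root cause can be shown in a few lines: casting any float percent in [0, 1) to long yields 0 on both sides of the assertEquals, so the comparison is always 0 == 0. Below is a self-contained sketch; the real default (DEFAULT_MAXIMUM_APPLICATIONMASTERS_RESOURCE_PERCENT) is 0.1f, and the other values are illustrative:

```java
// Why the asserts are vacuous: narrowing a float percent in [0, 1) to long
// always yields 0, so both sides of the assertEquals compare 0 == 0.
public class CastDemo {
    // Returns true when the long-cast erases the configured value entirely.
    static boolean erasedByCast(float percent) {
        return (long) percent == 0L;
    }

    public static void main(String[] args) {
        System.out.println(erasedByCast(0.1f)); // default-like value -> true
        System.out.println(erasedByCast(0.5f)); // per-queue value -> true
        System.out.println(erasedByCast(1.5f)); // only out-of-range values survive the cast -> false
    }
}
```

A meaningful replacement would keep the float precision, e.g. JUnit's three-argument overload `assertEquals(0.5f, actual, delta)`.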
[jira] [Updated] (YARN-11024) Create an AbstractLeafQueue to store the common LeafQueue + AutoCreatedLeafQueue functionality
[ https://issues.apache.org/jira/browse/YARN-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-11024: -- Fix Version/s: 3.4.0 > Create an AbstractLeafQueue to store the common LeafQueue + > AutoCreatedLeafQueue functionality > -- > > Key: YARN-11024 > URL: https://issues.apache.org/jira/browse/YARN-11024 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > AbstractAutoCreatedLeafQueue extends the LeafQueue class, which is an > instantiable class, so every time an AutoCreatedLeafQueue is created a normal > LeafQueue is configured as well. This setup results in some strange behaviour, > like having to pass the template configs of an auto created queue to a leaf > queue. To make the whole structure more flexible, an AbstractLeafQueue should > be created that stores the common methods.
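The proposed shape of the refactor can be sketched as follows; the class names come from the jira, but the bodies are placeholders, not the real Hadoop implementation:

```java
// Sketch of the proposed hierarchy: shared behaviour moves into an abstract
// base, so AutoCreatedLeafQueue no longer inherits the instantiable
// LeafQueue's constructor/configuration path it does not want.
abstract class AbstractLeafQueue {
    // common LeafQueue + AutoCreatedLeafQueue functionality lives here
    String describe() { return "common leaf-queue logic"; }
}

class LeafQueue extends AbstractLeafQueue {
    // statically configured leaf queue specifics
}

class AutoCreatedLeafQueue extends AbstractLeafQueue {
    // template-driven, dynamically created queue specifics; no longer
    // forced through LeafQueue's setup
}

public class QueueHierarchyDemo {
    public static void main(String[] args) {
        AbstractLeafQueue q = new AutoCreatedLeafQueue();
        System.out.println(q.describe());
    }
}
```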
[jira] [Updated] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929
[ https://issues.apache.org/jira/browse/YARN-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-11043: - Attachment: checkstyle_warnings.txt > Clean up checkstyle warnings from YARN-11024/10907/10929 > > > Key: YARN-11043 > URL: https://issues.apache.org/jira/browse/YARN-11043 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Attachments: checkstyle_warnings.txt > > > YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of > each other. This jira is a follow-up to clean up the checkstyle warnings > present in the modified files.
[jira] [Created] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929
Benjamin Teke created YARN-11043: Summary: Clean up checkstyle warnings from YARN-11024/10907/10929 Key: YARN-11043 URL: https://issues.apache.org/jira/browse/YARN-11043 Project: Hadoop YARN Issue Type: Sub-task Reporter: Benjamin Teke Assignee: Benjamin Teke Attachments: checkstyle_warnings.txt YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of each other. This jira is a follow-up to clean up the checkstyle warnings present in the modified files.
[jira] [Assigned] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10870: - Assignee: Gergely Pollák (was: Siddharth Ahuja) > Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM > Scheduler page > > > Key: YARN-10870 > URL: https://issues.apache.org/jira/browse/YARN-10870 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Siddharth Ahuja >Assignee: Gergely Pollák >Priority: Major > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: YARN-10870.001.patch, YARN-10870.002.patch, > YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, > YARN-10870.branch-3.3.002.patch > > > Non-permissible users are (incorrectly) able to view applications submitted by > another user on the RM's Scheduler UI (not the Applications UI), where > _non-permissible users_ are non-application-owners who are present neither in the > application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL > as an admin of the queue to which this job was submitted (see [1], where both the > filter setting introduced by YARN-8319 and the ACL checks are performed). > The issue can be reproduced easily by having the setting > {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml. > The above disallows non-permissible users from viewing another user's > applications in the Applications page, but not in the Scheduler's page. > The filter setting seems to be getting checked only on the getApps() call but > not while rendering the apps information on the Scheduler page. This seems to > be a "missed" feature from YARN-8319. > The following pre-requisites are needed to reproduce the issue: > * Kerberized cluster, > * SPNEGO enabled for HDFS & YARN, > * Add test users - systest and user1 on all nodes. > * Add Kerberos principals for the above users. > * Create HDFS user dirs for the above users and chown them appropriately. > * Run a sample MR Sleep job and test. 
> Steps to reproduce the issue: > * kinit as "systest" user and run a sample MR sleep job from one of the nodes > in the cluster: > {code} > yarn jar sleep -m 1 -mt > 360 > {code} > * kinit as "user1" from a Mac as an example (this assumes you've copied the > /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for > SPNEGO auth). > * Open the Applications page. user1 cannot view the job being run by systest. > This is correct. > * Open the Scheduler page. user1 *CAN* view the job being run by systest. > This is *INCORRECT*. > [1] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
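The fix idea, reduced to a self-contained sketch: the same filter/ACL predicate that already guards getApps() should also guard the Scheduler page renderer. All names below (App, hasAccess, visibleApps) are hypothetical stand-ins for the real RMWebServices/ACL machinery, not Hadoop APIs:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: apply one shared visibility predicate in BOTH the REST getApps()
// path and the Scheduler page rendering, instead of only in the former.
public class FilterDemo {
    record App(String owner) {}

    // stand-in for yarn.webapp.filter-entity-list-by-user
    static boolean filterByUser = true;

    // Stand-in for hasAccess(app, hsr): owner match only here; the real
    // check also consults view ACLs and queue admin ACLs.
    static boolean hasAccess(App app, String caller) {
        return !filterByUser || app.owner().equals(caller);
    }

    // Both rendering paths should funnel through the same predicate.
    static List<App> visibleApps(List<App> all, String caller) {
        return all.stream()
                  .filter(a -> hasAccess(a, caller))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<App> apps = List.of(new App("systest"), new App("user1"));
        // user1 sees only their own app, regardless of which page renders it
        System.out.println(visibleApps(apps, "user1").size()); // 1
    }
}
```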
[jira] [Updated] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch
[ https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-8234: - Labels: pull-request-available (was: ) > Improve RM system metrics publisher's performance by pushing events to > timeline server in batch > --- > > Key: YARN-8234 > URL: https://issues.apache.org/jira/browse/YARN-8234 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, timelineserver >Affects Versions: 2.8.3 >Reporter: Hu Ziqian >Assignee: Ashutosh Gupta >Priority: Critical > Labels: pull-request-available > Attachments: YARN-8234-branch-2.8.3.001.patch, > YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch, > YARN-8234-branch-2.8.3.004.patch, YARN-8234.001.patch, YARN-8234.002.patch, > YARN-8234.003.patch, YARN-8234.004.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When the system metrics publisher is enabled, RM pushes events to the timeline > server via a RESTful API. If the cluster load is heavy, many events are sent to > the timeline server and the timeline server's event handler thread gets locked. > YARN-7266 describes the details of this problem. Because of the lock, the > timeline server can't receive events as fast as they are generated in RM, and lots of > timeline events stay in RM's memory. Eventually, those events consume all of > RM's memory and RM starts a full GC (which causes a JVM stop-the-world pause and > a timeout from RM to ZooKeeper) or even hits an OOM. > The main problem here is that the timeline server can't receive events > as fast as RM generates them. Currently, the RM system metrics publisher puts only one event > in a request, and most of the time is spent handling the HTTP headers and the > network connection on the timeline side. Only a small share of the time is spent dealing with > the timeline event itself, which is the truly valuable part. > In this issue, we add a buffer to the system metrics publisher and let the publisher > send events to the timeline server in batches via one request. 
When the batch > size is set to 1000, in our experiment the speed at which the timeline server receives > events improves 100x. We have implemented this function in our production > environment, which accepts 2 apps in one hour, and it works fine. > We add the following configuration: > * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of events the > system metrics publisher sends in one request. The default value is 1000. > * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of the > event buffer in the system metrics publisher. > * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when > batch publishing is enabled, we must avoid the publisher waiting for a batch > to fill up and holding events in the buffer for a long time. So we add another > thread which sends the events in the buffer periodically. This config sets the > interval of that periodic sending thread. The default value is 60s. >
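A minimal, dependency-free sketch of the batching pattern described above (not the actual Hadoop patch): a bounded buffer is flushed either when the batch-size threshold is hit or when the periodic interval fires, mirroring the batch-size, buffer-size, and interval-seconds configs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Sketch of a batching publisher: buffer events, emit one "request" per
// batch. The sink stands in for a single REST call carrying many entities.
public class BatchingPublisher<E> implements AutoCloseable {
    private final int batchSize;                       // batch-size
    private final BlockingQueue<E> buffer;             // buffer-size
    private final Consumer<List<E>> sink;              // one call per batch
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    public BatchingPublisher(int batchSize, int bufferSize,
                             long intervalMillis, Consumer<List<E>> sink) {
        this.batchSize = batchSize;
        this.buffer = new LinkedBlockingQueue<>(bufferSize);
        this.sink = sink;
        // periodic flush so a half-full batch never waits indefinitely
        timer.scheduleAtFixedRate(this::flush,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    public void publish(E event) {
        buffer.offer(event);          // silently drops when full (a policy choice)
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public synchronized void flush() {
        List<E> batch = new ArrayList<>();
        buffer.drainTo(batch);
        if (!batch.isEmpty()) {
            sink.accept(batch);       // e.g. one PUT with many timeline entities
        }
    }

    @Override public void close() { timer.shutdownNow(); flush(); }
}
```

Dropping on a full buffer is one possible back-pressure policy; blocking or spilling are alternatives the real patch would have to choose between.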
[jira] [Assigned] (YARN-10823) Expose all node labels for root without explicit configurations
[ https://issues.apache.org/jira/browse/YARN-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10823: --- Assignee: Andras Gyori (was: Tibor Kovács) > Expose all node labels for root without explicit configurations > --- > > Key: YARN-10823 > URL: https://issues.apache.org/jira/browse/YARN-10823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > By definition root capacity should be set for all node labels that are > configured for its descendants. The current proposal is to set a default 100 > capacity for every node label that is configured for any of its descendants > and not for root.
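The assumed semantics can be sketched with a hypothetical helper (not the real CapacityScheduler code): every label configured on some descendant queue but not explicitly on root defaults to a capacity of 100 at root:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: labels configured on descendants but missing from root's own
// configuration get a default capacity of 100 on root.
public class RootLabelDefaultsDemo {
    static Map<String, Float> rootLabelCapacities(Set<String> descendantLabels,
                                                  Map<String, Float> explicitRootCapacities) {
        Map<String, Float> result = new HashMap<>(explicitRootCapacities);
        for (String label : descendantLabels) {
            result.putIfAbsent(label, 100f); // default when root has no explicit value
        }
        return result;
    }

    public static void main(String[] args) {
        // "gpu" keeps its explicit 50; "fpga" defaults to 100
        System.out.println(rootLabelCapacities(
                Set.of("gpu", "fpga"), Map.of("gpu", 50f)));
    }
}
```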
[jira] [Assigned] (YARN-10555) Missing access check before getAppAttempts
[ https://issues.apache.org/jira/browse/YARN-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10555: --- Assignee: lujie (was: Tibor Kovács) > Missing access check before getAppAttempts > --- > > Key: YARN-10555 > URL: https://issues.apache.org/jira/browse/YARN-10555 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: lujie >Assignee: lujie >Priority: Critical > Labels: pull-request-available, security > Fix For: 3.4.0, 3.3.1, 2.10.2, 3.2.3 > > Attachments: YARN-10555_1.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > It seems that we are missing a security check before getAppAttempts, see > [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1127] > thus we can get some sensitive information, like the logs link. > {code:java} > application_1609318368700_0002 belongs to user2 > user1@hadoop11$ curl --negotiate -u : > http://hadoop11:8088/ws/v1/cluster/apps/application_1609318368700_0002/appattempts/|jq > { > "appAttempts": { > "appAttempt": [ > { > "id": 1, > "startTime": 1609318411566, > "containerId": "container_1609318368700_0002_01_01", > "nodeHttpAddress": "hadoop12:8044", > "nodeId": "hadoop12:36831", > "logsLink": > "http://hadoop12:8044/node/containerlogs/container_1609318368700_0002_01_01/user2", > "blacklistedNodes": "", > "nodesBlacklistedBySystem": "" > } > ] > } > } > {code} > Other APIs, like getApps and getApp, have an access check like "hasAccess(app, > hsr)"; they would hide the logs link if the appid does not belong to the query > user, see > 
[https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1098] > We need to add hasAccess(app, hsr) to getAppAttempts. > > Besides, at > [https://github.com/apache/hadoop/blob/580a6a75a3e3d3b7918edeffd6e93fc211166884/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java#L145] > it seems that we have an access check in its caller, so for now I pass "true" to > AppAttemptInfo in the patch. >
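A self-contained sketch of the proposed guard (types and names are stand-ins, not the real RMWebServices signatures): mirror the hasAccess(app, hsr) check before building AppAttemptInfo, hiding sensitive fields such as logsLink from non-permitted callers:

```java
// Sketch: apply the same access check getApps()/getApp() use before
// exposing per-attempt details. "N/A" as the redacted value is an
// illustrative choice, not Hadoop's actual behaviour.
public class AttemptAccessDemo {
    record AppAttemptInfo(String id, String logsLink) {}

    // Stand-in for the real ACL check (view ACLs + queue admin ACLs).
    static boolean hasAccess(String appOwner, String caller) {
        return appOwner.equals(caller);
    }

    static AppAttemptInfo getAppAttempt(String appOwner, String caller,
                                        String attemptId, String logsLink) {
        if (!hasAccess(appOwner, caller)) {
            return new AppAttemptInfo(attemptId, "N/A"); // hide the sensitive link
        }
        return new AppAttemptInfo(attemptId, logsLink);
    }

    public static void main(String[] args) {
        // user1 querying user2's attempt no longer sees the logs link
        System.out.println(getAppAttempt("user2", "user1",
                "attempt-1", "http://hadoop12:8044/...").logsLink());
    }
}
```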
[jira] [Assigned] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10870: --- Assignee: Siddharth Ahuja (was: Tibor Kovács) > Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM > Scheduler page > > > Key: YARN-10870 > URL: https://issues.apache.org/jira/browse/YARN-10870 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Major > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: YARN-10870.001.patch, YARN-10870.002.patch, > YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, > YARN-10870.branch-3.3.002.patch > > > Non-permissible users are (incorrectly) able to view applications submitted by > another user on the RM's Scheduler UI (not the Applications UI), where > _non-permissible users_ are non-application-owners who are present neither in the > application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL > as an admin of the queue to which this job was submitted (see [1], where both the > filter setting introduced by YARN-8319 and the ACL checks are performed). > The issue can be reproduced easily by having the setting > {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml. > The above disallows non-permissible users from viewing another user's > applications in the Applications page, but not in the Scheduler's page. > The filter setting seems to be getting checked only on the getApps() call but > not while rendering the apps information on the Scheduler page. This seems to > be a "missed" feature from YARN-8319. > The following pre-requisites are needed to reproduce the issue: > * Kerberized cluster, > * SPNEGO enabled for HDFS & YARN, > * Add test users - systest and user1 on all nodes. > * Add Kerberos principals for the above users. > * Create HDFS user dirs for the above users and chown them appropriately. > * Run a sample MR Sleep job and test. 
> Steps to reproduce the issue: > * kinit as "systest" user and run a sample MR sleep job from one of the nodes > in the cluster: > {code} > yarn jar sleep -m 1 -mt > 360 > {code} > * kinit as "user1" from a Mac as an example (this assumes you've copied the > /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for > SPNEGO auth). > * Open the Applications page. user1 cannot view the job being run by systest. > This is correct. > * Open the Scheduler page. user1 *CAN* view the job being run by systest. > This is *INCORRECT*. > [1] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
[jira] [Assigned] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging
[ https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10720: --- Assignee: Qi Zhu (was: Tibor Kovács) > YARN WebAppProxyServlet should support connection timeout to prevent proxy > server from hanging > -- > > Key: YARN-10720 > URL: https://issues.apache.org/jira/browse/YARN-10720 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Fix For: 3.4.0 > > Attachments: YARN-10720.001.patch, YARN-10720.002.patch, > YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, > YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, > image-2021-03-29-14-05-32-708.png > > > The following shows the proxy server with {color:#de350b}too many connections from one > client{color}; this caused the proxy server to hang, and the YARN web UI can't redirect > to the web proxy. > !image-2021-03-29-14-04-33-776.png|width=632,height=57! > The following shows the abnormal AM, but the proxy server doesn't know it is > abnormal yet, so the connections can't be closed; we should add timeout > support to the proxy server to prevent this. One abnormal AM may cause > hundreds or even thousands of connections, which is very heavy. > !image-2021-03-29-14-05-32-708.png|width=669,height=101! > > After I kill the abnormal AM, the proxy server becomes healthy. This case has > happened many times in our production clusters; our clusters are huge, and > abnormal AMs show up regularly. > > I will add timeout support to the web proxy server in this jira. > > cc [~pbacsko] [~ebadger] [~Jim_Brennan] [~ztang] [~epayne] [~gandras] > [~bteke] >
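The proposed guard can be illustrated with plain java.net APIs (a sketch, not the actual WebAppProxyServlet patch; the timeout values are illustrative): setting connect and read timeouts on the proxy's outbound connection so a hung AM cannot pin proxy threads forever.

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: fail fast instead of hanging when the proxied AM is unreachable
// or accepts the connection but never responds.
public class ProxyTimeoutDemo {
    static HttpURLConnection openWithTimeouts(URL target, int connectMillis,
                                              int readMillis) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) target.openConnection();
        conn.setConnectTimeout(connectMillis); // abort slow connection attempts
        conn.setReadTimeout(readMillis);       // abort stalled responses
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // openConnection() does not contact the host yet, so this runs offline.
        HttpURLConnection conn =
                openWithTimeouts(new URL("http://example.invalid:8088/"), 5000, 60000);
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```

With a zero (default) timeout, a dead peer can hold the thread indefinitely; any positive value turns that into a recoverable SocketTimeoutException.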
[jira] [Assigned] (YARN-10701) yarn.resource-types should support multiple types without requiring trimmed values.
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10701: --- Assignee: Qi Zhu (was: Tibor Kovács) > yarn.resource-types should support multiple types without requiring trimmed values. > --- > > Key: YARN-10701 > URL: https://issues.apache.org/jira/browse/YARN-10701 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, > YARN-10701.002.patch > > > {code:java} > <property> > <name>yarn.resource-types</name> > <value>yarn.io/gpu, yarn.io/fpga</value> > </property> > {code} > When I configured the resource types above with gpu and fpga, this error > happened: > > {code:java} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is > not a valid resource name. A valid resource name must begin with a letter and > contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource > name may also be optionally preceded by a name space followed by a slash. A > valid name space consists of period-separated groups of letters, numbers, and > dashes.{code} > > The resource type parsing should trim each value.
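The fix idea in isolation: trim each comma-separated token before validation, so the leading space in " yarn.io/fpga" no longer trips the resource-name check. The helper below is a simplified stand-in for YARN's real parsing, not the actual patch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: split the configured list and trim every token before validating.
public class ResourceTypesTrimDemo {
    static List<String> parseResourceTypes(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim)            // the missing step in the report
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // both tokens come out clean despite the space after the comma
        System.out.println(parseResourceTypes("yarn.io/gpu, yarn.io/fpga"));
    }
}
```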
[jira] [Assigned] (YARN-10555) Missing access check before getAppAttempts
[ https://issues.apache.org/jira/browse/YARN-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10555: --- Assignee: Tibor Kovács (was: lujie) > Missing access check before getAppAttempts > --- > > Key: YARN-10555 > URL: https://issues.apache.org/jira/browse/YARN-10555 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: lujie >Assignee: Tibor Kovács >Priority: Critical > Labels: pull-request-available, security > Fix For: 3.4.0, 3.3.1, 2.10.2, 3.2.3 > > Attachments: YARN-10555_1.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > It seems that we are missing a security check before getAppAttempts, see > [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1127] > thus we can get some sensitive information, like the logs link. 
> {code:java} > application_1609318368700_0002 belongs to user2 > user1@hadoop11$ curl --negotiate -u : > http://hadoop11:8088/ws/v1/cluster/apps/application_1609318368700_0002/appattempts/|jq > { > "appAttempts": { > "appAttempt": [ > { > "id": 1, > "startTime": 1609318411566, > "containerId": "container_1609318368700_0002_01_01", > "nodeHttpAddress": "hadoop12:8044", > "nodeId": "hadoop12:36831", > "logsLink": > "http://hadoop12:8044/node/containerlogs/container_1609318368700_0002_01_01/user2", > "blacklistedNodes": "", > "nodesBlacklistedBySystem": "" > } > ] > } > } > {code} > Other APIs, like getApps and getApp, have an access check like "hasAccess(app, > hsr)"; they would hide the logs link if the appid does not belong to the query > user, see > [https://github.com/apache/hadoop/blob/513f1995adc9b73f9c7f4c7beb89725b51b313ac/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L1098] > We need to add hasAccess(app, hsr) to getAppAttempts. > > Besides, at > [https://github.com/apache/hadoop/blob/580a6a75a3e3d3b7918edeffd6e93fc211166884/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java#L145] > it seems that we have an access check in its caller, so for now I pass "true" to > AppAttemptInfo in the patch. >
[jira] [Assigned] (YARN-10823) Expose all node labels for root without explicit configurations
[ https://issues.apache.org/jira/browse/YARN-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10823: --- Assignee: Tibor Kovács (was: Andras Gyori) > Expose all node labels for root without explicit configurations > --- > > Key: YARN-10823 > URL: https://issues.apache.org/jira/browse/YARN-10823 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Andras Gyori >Assignee: Tibor Kovács >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > By definition root capacity should be set for all node labels that are > configured for its descendants. The current proposal is to set a default 100 > capacity for every node label that is configured for any of its descendants > and not for root.
[jira] [Assigned] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10870: --- Assignee: Tibor Kovács (was: Gergely Pollák) > Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM > Scheduler page > > > Key: YARN-10870 > URL: https://issues.apache.org/jira/browse/YARN-10870 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Siddharth Ahuja >Assignee: Tibor Kovács >Priority: Major > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: YARN-10870.001.patch, YARN-10870.002.patch, > YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, > YARN-10870.branch-3.3.002.patch > > > Non-permissible users are (incorrectly) able to view applications submitted by > another user on the RM's Scheduler UI (not the Applications UI), where > _non-permissible users_ are non-application-owners who are present neither in the > application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL > as an admin of the queue to which this job was submitted (see [1], where both the > filter setting introduced by YARN-8319 and the ACL checks are performed). > The issue can be reproduced easily by having the setting > {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml. > The above disallows non-permissible users from viewing another user's > applications in the Applications page, but not in the Scheduler's page. > The filter setting seems to be getting checked only on the getApps() call but > not while rendering the apps information on the Scheduler page. This seems to > be a "missed" feature from YARN-8319. > The following pre-requisites are needed to reproduce the issue: > * Kerberized cluster, > * SPNEGO enabled for HDFS & YARN, > * Add test users - systest and user1 on all nodes. > * Add Kerberos principals for the above users. > * Create HDFS user dirs for the above users and chown them appropriately. > * Run a sample MR Sleep job and test. 
> Steps to reproduce the issue: > * kinit as "systest" user and run a sample MR sleep job from one of the nodes > in the cluster: > {code} > yarn jar sleep -m 1 -mt > 360 > {code} > * kinit as "user1" from a Mac as an example (this assumes you've copied the > /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for > SPNEGO auth). > * Open the Applications page. user1 cannot view the job being run by systest. > This is correct. > * Open the Scheduler page. user1 *CAN* view the job being run by systest. > This is *INCORRECT*. > [1] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
[jira] [Assigned] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging
[ https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10720: --- Assignee: Tibor Kovács (was: Qi Zhu) > YARN WebAppProxyServlet should support connection timeout to prevent proxy > server from hanging > -- > > Key: YARN-10720 > URL: https://issues.apache.org/jira/browse/YARN-10720 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Tibor Kovács >Priority: Critical > Fix For: 3.4.0 > > Attachments: YARN-10720.001.patch, YARN-10720.002.patch, > YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, > YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, > image-2021-03-29-14-05-32-708.png > > > The following shows the proxy server with {color:#de350b}too many connections from one > client{color}; this caused the proxy server to hang, and the YARN web UI can't redirect > to the web proxy. > !image-2021-03-29-14-04-33-776.png|width=632,height=57! > The following shows the abnormal AM, but the proxy server doesn't know it is > abnormal yet, so the connections can't be closed; we should add timeout > support to the proxy server to prevent this. One abnormal AM may cause > hundreds or even thousands of connections, which is very heavy. > !image-2021-03-29-14-05-32-708.png|width=669,height=101! > > After I kill the abnormal AM, the proxy server becomes healthy. This case has > happened many times in our production clusters; our clusters are huge, and > abnormal AMs show up regularly. > > I will add timeout support to the web proxy server in this jira. > > cc [~pbacsko] [~ebadger] [~Jim_Brennan] [~ztang] [~epayne] [~gandras] > [~bteke] >
[jira] [Assigned] (YARN-10701) yarn.resource-types should support multiple types without requiring trimmed values.
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kovács reassigned YARN-10701: --- Assignee: Tibor Kovács (was: Qi Zhu) > yarn.resource-types should support multiple types without requiring trimmed values. > --- > > Key: YARN-10701 > URL: https://issues.apache.org/jira/browse/YARN-10701 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Tibor Kovács >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, > YARN-10701.002.patch > > > {code:java} > <property> > <name>yarn.resource-types</name> > <value>yarn.io/gpu, yarn.io/fpga</value> > </property> > {code} > When I configured the resource types above with gpu and fpga, this error > happened: > > {code:java} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is > not a valid resource name. A valid resource name must begin with a letter and > contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource > name may also be optionally preceded by a name space followed by a slash. A > valid name space consists of period-separated groups of letters, numbers, and > dashes.{code} > > The resource type parsing should trim each value.