[jira] [Updated] (YARN-11644) LogAggregationService can't upload log in time when application finished
[ https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xie YiFan updated YARN-11644:
-----------------------------
    Affects Version/s: 3.3.6

> LogAggregationService can't upload log in time when application finished
> -------------------------------------------------------------------------
>
>                 Key: YARN-11644
>                 URL: https://issues.apache.org/jira/browse/YARN-11644
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>    Affects Versions: 3.3.6
>            Reporter: Xie YiFan
>            Assignee: Xie YiFan
>            Priority: Minor
>         Attachments: image-2024-01-10-11-03-57-553.png
>
> LogAggregationService is responsible for uploading logs to HDFS. It uses a thread pool to execute the upload tasks.
> The upload workflow is as follows:
> # The NM constructs an Application object when the first container of an application launches, then notifies LogAggregationService to initialize an AppLogAggregationImpl.
> # LogAggregationService submits the AppLogAggregationImpl to the task queue.
> # An idle worker of the thread pool pulls the AppLogAggregationImpl from the task queue.
> # The AppLogAggregationImpl spins in a while loop checking the application state and uploads the logs once the application has finished.
> Suppose the following scenario:
> * LogAggregationService initializes its thread pool with 4 threads.
> * 4 long-running applications start on this NM, so all threads are occupied by their aggregators.
> * The next, short application starts on this NM and quickly finishes, but there is no idle thread left to upload its logs.
> As a result, subsequent applications have to wait for the previous applications to finish before their logs can be uploaded.
> !image-2024-01-10-11-03-57-553.png|width=599,height=195!
> h4. Solution
> Change the spin behavior of AppLogAggregationImpl: if the application has not finished yet, return immediately to yield the current thread and resubmit itself to the executor service. LogAggregationService can then roll through the task queue, and the logs of finished applications are uploaded immediately.
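The proposed change essentially turns each per-application aggregator from a thread-holding spin loop into a short task that reschedules itself. A minimal sketch of that pattern, with invented names (the real AppLogAggregationImpl and LogAggregationService APIs differ) and a ScheduledExecutorService standing in for the NM upload pool:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of the proposed behaviour: instead of holding a worker thread in a
 * while-loop until the application finishes, the task returns immediately and
 * reschedules itself, so the pool can serve other (already finished) apps.
 * Names are illustrative, not the actual AppLogAggregationImpl API.
 */
public class ResubmittingAggregator implements Runnable {

  private final ScheduledExecutorService pool;
  private final AtomicBoolean appFinished;
  private final String appId;

  ResubmittingAggregator(ScheduledExecutorService pool,
      AtomicBoolean appFinished, String appId) {
    this.pool = pool;
    this.appFinished = appFinished;
    this.appId = appId;
  }

  @Override
  public void run() {
    if (!appFinished.get()) {
      // Yield the worker thread and try again later instead of blocking it.
      pool.schedule(this, 1, TimeUnit.SECONDS);
      return;
    }
    System.out.println("uploading logs for " + appId);
    pool.shutdown();
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);
    AtomicBoolean finished = new AtomicBoolean(false);
    pool.submit(new ResubmittingAggregator(pool, finished, "application_1"));
    Thread.sleep(3000);   // simulate the application running for a while
    finished.set(true);   // the next poll sees the finished state and uploads
  }
}
{code}

With this shape a worker thread is held only for the duration of one state check or one upload, so a finished application's upload no longer queues behind aggregators of still-running applications.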
[jira] [Updated] (YARN-11644) LogAggregationService can't upload log in time when application finished
[ https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xie YiFan updated YARN-11644:
-----------------------------
    Description:

LogAggregationService is responsible for uploading logs to HDFS. It uses a thread pool to execute the upload tasks.
The upload workflow is as follows:
# The NM constructs an Application object when the first container of an application launches, then notifies LogAggregationService to initialize an AppLogAggregationImpl.
# LogAggregationService submits the AppLogAggregationImpl to the task queue.
# An idle worker of the thread pool pulls the AppLogAggregationImpl from the task queue.
# The AppLogAggregationImpl spins in a while loop checking the application state and uploads the logs once the application has finished.
Suppose the following scenario:
* LogAggregationService initializes its thread pool with 4 threads.
* 4 long-running applications start on this NM, so all threads are occupied by their aggregators.
* The next, short application starts on this NM and quickly finishes, but there is no idle thread left to upload its logs.
As a result, subsequent applications have to wait for the previous applications to finish before their logs can be uploaded.
!image-2024-01-10-11-03-57-553.png|width=599,height=195!
h4. Solution
Change the spin behavior of AppLogAggregationImpl: if the application has not finished yet, return immediately to yield the current thread and resubmit itself to the executor service. LogAggregationService can then roll through the task queue, and the logs of finished applications are uploaded immediately.
[jira] [Created] (YARN-11644) LogAggregationService can't upload log in time when application finished
Xie YiFan created YARN-11644:
--------------------------------

             Summary: LogAggregationService can't upload log in time when application finished
                 Key: YARN-11644
                 URL: https://issues.apache.org/jira/browse/YARN-11644
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation
            Reporter: Xie YiFan
            Assignee: Xie YiFan
         Attachments: image-2024-01-10-11-03-57-553.png

LogAggregationService is responsible for uploading logs to HDFS. It uses a thread pool to execute the upload tasks.
The upload workflow is as follows:
# The NM constructs an Application object when the first container of an application launches, then notifies LogAggregationService to initialize an AppLogAggregationImpl.
# LogAggregationService submits the AppLogAggregationImpl to the task queue.
# An idle worker of the thread pool pulls the AppLogAggregationImpl from the task queue.
# The AppLogAggregationImpl spins in a while loop checking the application state and uploads the logs once the application has finished.
Suppose the following scenario:
* LogAggregationService initializes its thread pool with 4 threads.
* 4 long-running applications start on this NM, so all threads are occupied by their aggregators.
* The next, short application starts on this NM and quickly finishes, but there is no idle thread left to upload its logs.
As a result, subsequent applications have to wait for the previous applications to finish before their logs can be uploaded.
!image-2024-01-10-11-03-57-553.png|width=599,height=195!
h4. Solution
Change the spin behavior of AppLogAggregationImpl: if the application has not finished yet, return immediately to yield the current thread and resubmit itself to the executor service. LogAggregationService can then roll through the task queue, and the logs of finished applications are uploaded immediately.
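The starvation scenario in the description can be reduced to a few lines of plain java.util.concurrent code. The pool size, latch, and messages below are illustrative only and not taken from the NodeManager implementation:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Illustrates the reported starvation with a plain fixed pool (sizes and
 * names are made up for the demo, not read from the NM configuration).
 */
public class AggregatorStarvationDemo {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    CountDownLatch longAppsFinished = new CountDownLatch(1);

    // Four aggregators for long-running apps occupy every worker thread,
    // mirroring the while-loop that waits for the application to finish.
    for (int i = 0; i < 4; i++) {
      final int appId = i;
      pool.submit(() -> {
        System.out.println("aggregator " + appId + " waiting for its app");
        longAppsFinished.await();   // blocks the worker, like the spin loop
        return null;
      });
    }

    // The short application has already finished, but its upload task can
    // only start once one of the four blocked aggregators returns.
    pool.submit(() -> System.out.println("uploading logs of the short app"));

    Thread.sleep(2000);             // the short app's logs are still waiting
    longAppsFinished.countDown();   // only now does its upload get a thread
    pool.shutdown();
  }
}
{code}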
[jira] [Commented] (YARN-11634) Speed-up TestTimelineClient
[ https://issues.apache.org/jira/browse/YARN-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804933#comment-17804933 ]

ASF GitHub Bot commented on YARN-11634:
---------------------------------------

slfan1989 commented on code in PR #6419:
URL: https://github.com/apache/hadoop/pull/6419#discussion_r1446781464

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineConnector.java:
##
@@ -145,14 +145,14 @@ protected void serviceInit(Configuration conf) throws Exception {
   @Override
   public HttpURLConnection configure(HttpURLConnection conn) throws IOException {
-    setTimeouts(conn, DEFAULT_SOCKET_TIMEOUT);
+    setTimeouts(conn, 60_000);

Review Comment:
   @brumi1024 Thanks for reviewing the code, I will improve it.

> Speed-up TestTimelineClient
> ---------------------------
>
>                 Key: YARN-11634
>                 URL: https://issues.apache.org/jira/browse/YARN-11634
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Bence Kosztolnik
>            Assignee: Bence Kosztolnik
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> The TimelineConnector.class has a hard-coded 1-minute connection timeout, which makes TestTimelineClient a long-running test (~15:30 min). Decreasing the timeout to 10 ms speeds the test run up to ~56 sec.
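As a hedged illustration of the direction the review comment points in (avoiding a bare magic number and letting tests shrink the timeout), here is a small self-contained sketch; ConfigurableTimeoutConnector and its test-only setter are invented names, not the actual TimelineConnector API:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Sketch of one way to avoid a hard-coded socket timeout: keep the production
 * default in a named constant but let tests inject a much smaller value.
 * This is an illustration, not the actual TimelineConnector change.
 */
public class ConfigurableTimeoutConnector {

  public static final int DEFAULT_SOCKET_TIMEOUT_MS = 60_000; // production default

  private int socketTimeoutMs = DEFAULT_SOCKET_TIMEOUT_MS;

  /** Tests can drop this to e.g. 10 ms so connection failures surface fast. */
  void setSocketTimeoutMsForTesting(int timeoutMs) {
    this.socketTimeoutMs = timeoutMs;
  }

  public HttpURLConnection configure(HttpURLConnection conn) throws IOException {
    conn.setConnectTimeout(socketTimeoutMs);
    conn.setReadTimeout(socketTimeoutMs);
    return conn;
  }

  public static void main(String[] args) throws IOException {
    ConfigurableTimeoutConnector connector = new ConfigurableTimeoutConnector();
    connector.setSocketTimeoutMsForTesting(10); // what the test speed-up relies on
    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:1/").openConnection();
    connector.configure(conn);
    System.out.println("connect timeout = " + conn.getConnectTimeout() + " ms");
  }
}
{code}

TestTimelineClient-style tests could then call the setter with a value around 10 ms, which is where the reported ~15:30 min to ~56 sec improvement comes from.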
[jira] [Commented] (YARN-11643) Skip unnecessary pre-check in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804825#comment-17804825 ]

ASF GitHub Bot commented on YARN-11643:
---------------------------------------

hadoop-yetus commented on PR #6426:
URL: https://github.com/apache/hadoop/pull/6426#issuecomment-1883508019

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 20s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 18s | | trunk passed |
| +1 :green_heart: | compile | 0m 31s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 26s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 34s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 31s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 14s | | trunk passed |
| +1 :green_heart: | shadedclient | 19m 54s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 25s | | the patch passed |
| +1 :green_heart: | compile | 0m 26s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 26s | | the patch passed |
| +1 :green_heart: | compile | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 24s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 20s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 24s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 4s | | the patch passed |
| +1 :green_heart: | shadedclient | 19m 53s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 577m 3s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6426/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicense | 0m 22s | | The patch does not generate ASF License warnings. |
| | | | 658m 39s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerWithMultiResourceTypes |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestAMAllocatedToNonExclusivePartition |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerMultiNodes |
| | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
| | hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestAbsoluteResourceWithAutoQueue |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled |
| | hadoop.yarn.server.resourcemanager.scheduler
[jira] [Assigned] (YARN-11639) ConcurrentModificationException and NPE in PriorityUtilizationQueueOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferenc Erdelyi reassigned YARN-11639:
-------------------------------------

    Assignee: Ferenc Erdelyi

> ConcurrentModificationException and NPE in PriorityUtilizationQueueOrderingPolicy
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-11639
>                 URL: https://issues.apache.org/jira/browse/YARN-11639
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Ferenc Erdelyi
>            Assignee: Ferenc Erdelyi
>            Priority: Major
>
> When dynamic queue creation is enabled in weight mode and the queue deletion policy runs while PriorityQueueResourcesForSorting is being built, the RM stops assigning resources because of either a ConcurrentModificationException or an NPE in PriorityUtilizationQueueOrderingPolicy.
> Reproduced the NPE issue in Java 8 and Java 11 environments:
> {code:java}
> ... INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Removing queue: root.dyn.PmvkMgrEBQppu
> 2024-01-02 17:00:59,399 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-11,5,main] threw an Exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy$PriorityQueueResourcesForSorting.<init>(PriorityUtilizationQueueOrderingPolicy.java:225)
>         at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>         at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:260)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:1100)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:942)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:1124)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:942)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1724)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1659)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1816)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1562)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:558)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:605)
> {code}
> Observed the ConcurrentModificationException in a Java 8 environment, but could not reproduce it yet:
> {code:java}
> 2023-10-27 02:50:37,584 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-15,5,main] threw an Exception.
> java.util.ConcurrentModificationException
>         at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1388)
>         at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>         at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>         at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>         at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterat
> {code}
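The failure mode described above, a stream sorting the child queues while another thread removes a dynamic queue, can be illustrated outside YARN. The sketch below uses a hypothetical ChildQueue class and shows one common mitigation, letting the sorting stream work on a stable snapshot (here a CopyOnWriteArrayList); it is not the actual CapacityScheduler fix:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

public class QueueSortRaceDemo {

  /** Hypothetical stand-in for a child queue; not a YARN class. */
  static class ChildQueue {
    final String name;
    final float usedCapacity;
    ChildQueue(String name, float usedCapacity) {
      this.name = name;
      this.usedCapacity = usedCapacity;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // A CopyOnWriteArrayList (or a snapshot copy taken under the scheduler's
    // read lock) gives the sorting stream a stable view of the children.
    List<ChildQueue> childQueues = new CopyOnWriteArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      childQueues.add(new ChildQueue("q" + i, i / 10_000f));
    }

    // Simulates the auto-deletion policy removing dynamic queues concurrently.
    Thread deletionPolicy = new Thread(() -> {
      for (int i = 0; i < 5_000; i++) {
        childQueues.remove(0);
      }
    });
    deletionPolicy.start();

    // Plays the role of getAssignmentIterator(): sort children by utilization.
    // Streaming a plain ArrayList here while the other thread removes from it
    // is the kind of access that surfaces the ConcurrentModificationException.
    List<ChildQueue> sorted = childQueues.stream()
        .sorted(Comparator.comparingDouble(q -> q.usedCapacity))
        .collect(Collectors.toList());

    deletionPolicy.join();
    System.out.println("Sorted a stable snapshot of " + sorted.size() + " queues");
  }
}
{code}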