Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/206/

[Jul 16, 2020 4:29:37 PM] (noreply) HADOOP-17129. Validating storage keys in ABFS correctly (#2141)
[Jul 16, 2020 5:09:59 PM] (noreply) HADOOP-17130. Configuration.getValByRegex() shouldn't be updating the results while fetching. (#2142)
[Jul 16, 2020 6:06:49 PM] (pjoseph) YARN-10339. Fix TimelineClient in NodeManager failing when Simple Http Auth used in Secure Cluster

[Error replacing 'FILE' - Workspace is not accessible]

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-10354) deadlock in ContainerMetrics and MetricsSystemImpl
Lee young gon created YARN-10354:
Summary: deadlock in ContainerMetrics and MetricsSystemImpl
Key: YARN-10354
URL: https://issues.apache.org/jira/browse/YARN-10354
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Environment: hadoop 3.1.2
Reporter: Lee young gon
Attachments: full_thread_dump.txt

We could not get JMX information from the NodeManager, and a thread dump revealed a deadlock. The deadlocked threads are shown below.

{code:java}
"Timer for 'NodeManager' metrics system" - Thread t@42
   java.lang.Thread.State: BLOCKED
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.getMetrics(ContainerMetrics.java:235)
        - waiting to lock <7668d6f0> (a org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics) owned by "NM ContainerManager dispatcher" t@299
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
        - locked <3b956878> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
        - locked <3b956878> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

   Locked ownable synchronizers:
        - None

"NM ContainerManager dispatcher" - Thread t@299
   java.lang.Thread.State: BLOCKED
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.unregisterSource(MetricsSystemImpl.java:247)
        - waiting to lock <3b956878> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl) owned by "Timer for 'NodeManager' metrics system" t@42
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.unregisterContainerMetrics(ContainerMetrics.java:228)
        - locked <4e31c3ec> (a java.lang.Class)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.finished(ContainerMetrics.java:255)
        - locked <7668d6f0> (a org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.updateContainerMetrics(ContainersMonitorImpl.java:813)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.onStopMonitoringContainer(ContainersMonitorImpl.java:935)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.handle(ContainersMonitorImpl.java:900)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.handle(ContainersMonitorImpl.java:57)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
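The dump reduces to a classic lock-ordering cycle: the timer thread holds the MetricsSystemImpl monitor and waits on a ContainerMetrics monitor, while the dispatcher thread holds the ContainerMetrics monitor and waits on the MetricsSystemImpl monitor. The sketch below is a minimal, hypothetical illustration of the fix pattern (acquire both monitors in one global order); the class and method names are illustrative stand-ins, not the actual Hadoop code.

```java
// Minimal sketch of the cycle seen in the dump, with the locks taken in a
// single consistent order (system first, then source) so no cycle can form.
public class DeadlockSketch {
    private final Object systemLock = new Object();  // stands in for the MetricsSystemImpl monitor
    private final Object sourceLock = new Object();  // stands in for a ContainerMetrics monitor

    // Timer path: snapshot sources while holding the system lock.
    void sampleMetrics() {
        synchronized (systemLock) {
            synchronized (sourceLock) {
                // snapshot the source's metrics here
            }
        }
    }

    // Dispatcher path, fixed to take the SAME order (system, then source)
    // instead of the inverted source-then-system order from the dump.
    void unregisterSource() {
        synchronized (systemLock) {
            synchronized (sourceLock) {
                // remove the source from the registry here
            }
        }
    }

    // Run both paths concurrently; with a consistent order they always finish.
    public boolean runBoth() {
        Thread timer = new Thread(() -> { for (int i = 0; i < 1000; i++) sampleMetrics(); });
        Thread dispatcher = new Thread(() -> { for (int i = 0; i < 1000; i++) unregisterSource(); });
        timer.start();
        dispatcher.start();
        try {
            timer.join(5000);
            dispatcher.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !timer.isAlive() && !dispatcher.isAlive();
    }
}
```

With the original inverted order, the same two threads can each grab their first monitor and block forever on the second; a consistent global order is the standard remedy.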
[jira] [Created] (YARN-10353) Log vcores used and cumulative cpu in containers monitor
Jim Brennan created YARN-10353:
----------------------------------
Summary: Log vcores used and cumulative cpu in containers monitor
Key: YARN-10353
URL: https://issues.apache.org/jira/browse/YARN-10353
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Affects Versions: 3.4.0
Reporter: Jim Brennan
Assignee: Jim Brennan

We currently log the percentage/cpu and percentage/cpus-used-by-yarn in the Containers Monitor log. It would be useful to also log vcores used vs. vcores assigned, and total accumulated CPU time.

For example, currently we have an audit log that looks like this:
{noformat}
2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit (ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree 809 for container-id container_1594931466123_0002_01_07: 309.5 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 CPU/core:35.772625
{noformat}

The proposal is to add two more fields to show vCores and Cumulative CPU ms:
{noformat}
2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit (ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree 809 for container-id container_1594931466123_0002_01_07: 309.5 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 CPU/core:35.772625 vCores:2/1 CPU-ms:4180
{noformat}

This is a snippet of a log from one of our clusters running branch-2.8 with a similar change:
{noformat}
2020-07-16 21:00:02,240 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 5267 for container-id container_e04_1594079801456_1397450_01_001992: 1.6 GB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 18 of 10 CPU vCores used. Cumulative CPU time: 157410
2020-07-16 21:00:02,269 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 18801 for container-id container_e04_1594079801456_1390375_01_19: 413.2 MB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 of 10 CPU vCores used. Cumulative CPU time: 113830
2020-07-16 21:00:02,298 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 5279 for container-id container_e04_1594079801456_1397450_01_001991: 2.2 GB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 17 of 10 CPU vCores used. Cumulative CPU time: 128630
2020-07-16 21:00:02,339 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 24189 for container-id container_e04_1594079801456_1390430_01_000415: 392.7 MB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 of 10 CPU vCores used. Cumulative CPU time: 96060
2020-07-16 21:00:02,367 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 6751 for container-id container_e04_1594079801456_1397923_01_003248: 1.3 GB of 3 GB physical memory used; 4.3 GB of 6.3 GB virtual memory used. CPU usage: 12 of 10 CPU vCores used. Cumulative CPU time: 116820
2020-07-16 21:00:02,396 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 12138 for container-id container_e04_1594079801456_1397760_01_44: 4.4 GB of 6 GB physical memory used; 6.9 GB of 12.6 GB virtual memory used. CPU usage: 15 of 10 CPU vCores used. Cumulative CPU time: 45900
2020-07-16 21:00:02,424 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 101918 for container-id container_e04_1594079801456_1391130_01_002378: 2.4 GB of 4 GB physical memory used; 5.8 GB of 8.4 GB virtual memory used. CPU usage: 13 of 10 CPU vCores used. Cumulative CPU time: 2572390
2020-07-16 21:00:02,456 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 26596 for container-id container_e04_1594079801456_1390446_01_000665: 418.6 MB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 of 10 CPU vCores used. Cumulative CPU time: 101210
{noformat}
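The two proposed fields can be derived from values the containers monitor already tracks. The following is a hedged sketch of that derivation; the method and parameter names are hypothetical, not the actual ContainersMonitorImpl code, and it assumes "vCores used" is the summed per-core CPU percentage rounded up to whole vcores.

```java
// Hypothetical helper that formats the two proposed audit-log fields.
public class AuditLineSketch {
    /**
     * @param cpuUsageTotalPercent summed CPU% across all cores
     *                             (e.g. 143.09 means ~1.43 cores busy)
     * @param vcoresAssigned       vcores granted to the container
     * @param cumulativeCpuMs      total CPU time of the process tree, in ms
     */
    public static String extraFields(float cpuUsageTotalPercent,
                                     int vcoresAssigned,
                                     long cumulativeCpuMs) {
        // Round CPU% up to whole vcores used, mirroring the "vCores:2/1"
        // form in the proposal above (used/assigned).
        int vcoresUsed = (int) Math.ceil(cpuUsageTotalPercent / 100f);
        return String.format("vCores:%d/%d CPU-ms:%d",
                vcoresUsed, vcoresAssigned, cumulativeCpuMs);
    }
}
```

Under these assumptions, the values from the first example (CPU:143.0905, 1 vcore assigned, 4180 ms) would format as `vCores:2/1 CPU-ms:4180`, matching the proposed log line.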
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/205/

[Jul 15, 2020 4:39:48 AM] (noreply) HDFS-15385 Upgrade boost library to 1.72 (#2051)
[Jul 15, 2020 4:46:20 AM] (noreply) MAPREDUCE-7284. TestCombineFileInputFormat#testMissingBlocks fails (#2136)
[Jul 15, 2020 5:02:25 AM] (Shashikant Banerjee) HDFS-15319. Fix INode#isInLatestSnapshot() API. Contributed by Shashikant Banerjee.
[Jul 15, 2020 5:31:34 AM] (Akira Ajisaka) YARN-10350. TestUserGroupMappingPlacementRule fails
[Jul 15, 2020 6:24:34 AM] (noreply) MAPREDUCE-7285. Junit class missing from hadoop-mapreduce-client-jobclient-*-tests jar. (#2139)
[Jul 15, 2020 2:53:18 PM] (Jonathan Turner Eagles) HADOOP-17101. Replace Guava Function with Java8+ Function
[Jul 15, 2020 4:39:06 PM] (Jonathan Turner Eagles) HADOOP-17099. Replace Guava Predicate with Java8+ Predicate

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML : Parsing Error(s):
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

findbugs : module:hadoop-yarn-project/hadoop-yarn
    Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
    Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:[line 190]

findbugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server
    Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
    Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:[line 190]

findbugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
    Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
    Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:[line 190]

findbugs : module:hadoop-yarn-project
    Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
    Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBas
[jira] [Created] (YARN-10352) MultiNode Placement assigns containers on stopped NodeManagers
Prabhu Joseph created YARN-10352:
Summary: MultiNode Placement assigns containers on stopped NodeManagers
Key: YARN-10352
URL: https://issues.apache.org/jira/browse/YARN-10352
Project: Hadoop YARN
Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph

When Node Recovery is enabled, stopping a NM does not unregister it from the RM, so the RM's active node list still contains the stopped nodes until the NM Liveliness Monitor expires them after the configured timeout (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During those 10 minutes, Multi Node Placement assigns containers to those nodes. It needs to exclude nodes that have not heartbeated within the configured heartbeat interval (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms = 1000 ms), similar to what the asynchronous Capacity Scheduler threads do (CapacityScheduler#shouldSkipNodeSchedule).

*Repro:*
1. Enable Multi Node Placement (yarn.scheduler.capacity.multi-node-placement-enabled) and Node Recovery (yarn.node.recovery.enabled).
2. Have only one NM running, say worker0.
3. Stop worker0 and start any other NM, say worker1.
4. Submit a sleep job. The containers will time out because they are assigned to the stopped NM worker0.
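The staleness check the issue proposes can be sketched in a few lines. This is a hedged illustration modeled on the idea behind CapacityScheduler#shouldSkipNodeSchedule, not the actual implementation; the class name, signature, and the choice of exactly one heartbeat interval as the threshold are assumptions.

```java
// Hypothetical sketch: skip a node for placement when its last heartbeat
// is older than the configured heartbeat interval, rather than waiting
// for the 10-minute liveliness-monitor expiry.
public class NodeStalenessSketch {
    /**
     * @param lastHeartbeatMs     timestamp of the node's last heartbeat
     * @param nowMs               current time
     * @param heartbeatIntervalMs yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
     *                            (default 1000 ms)
     */
    public static boolean shouldSkipNode(long lastHeartbeatMs,
                                         long nowMs,
                                         long heartbeatIntervalMs) {
        // A healthy NM heartbeats every interval; anything older is suspect
        // and should be excluded from multi-node placement.
        return nowMs - lastHeartbeatMs > heartbeatIntervalMs;
    }
}
```

In practice a small grace multiplier on the interval may be preferable to avoid skipping nodes over a single delayed heartbeat; that tuning is outside this sketch.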
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/749/

[Jul 15, 2020 3:33:42 PM] (ekrogen) HADOOP-17127. Use RpcMetrics.TIMEUNIT to initialize rpc queueTime and

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint jshint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML : Parsing Error(s):
    hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
    hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

findbugs : module:hadoop-yarn-project/hadoop-yarn
    Useless object stored in variable removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:[line 664]
    org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeVeryOldStoppedContainersFromCache() makes inefficient use of keySet iterator instead of entrySet iterator At NodeStatusUpdaterImpl.java:[line 741]
    org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.createStatus() makes inefficient use of keySet iterator instead of entrySet iterator At ContainerLocalizer.java:[line 359]
    org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.usageMetrics is a mutable collection which should be package protected At ContainerMetrics.java:[line 134]
    Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]
    org.apache.hadoop.yarn.state.StateMachineFactory.generateStateGraph(String) makes inefficient use of keySet iterator instead of entrySet iterator At StateMachineFactory.java:[line 505]

findbugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
    org.apache.hadoop.yarn.state.StateMachineFactory.generateStateGraph(String) makes inefficient use of keySet iterator instead of entrySet iterator At StateMachineFactory.java:[line 505]

findbugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server
    Useless object stored in variable removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:[line 664]
    org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeVeryOldStoppedContainersFromCache() makes inefficient use of keySet iterator instead of entrySet iterator At NodeStatusUpdaterImpl.java:[line 741]
    org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.createStatus() makes inefficient use of keySet iterator instead of entrySet iterator At ContainerLocalizer.java:[line 359]
    org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.usageMetrics is a mutable collection which should be package protected At ContainerMetrics.java:[line 134]
    Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At