[ https://issues.apache.org/jira/browse/HDFS-17102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDFS-17102:
----------------------------------
    Labels: pull-request-available  (was: )

> Timeout encountered when running TestDataNodeOutlierDetectionViaMetrics
> -----------------------------------------------------------------------
>
>                 Key: HDFS-17102
>                 URL: https://issues.apache.org/jira/browse/HDFS-17102
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> {{TestDataNodeOutlierDetectionViaMetrics}} times out when
> {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a
> negative value.
> h2. Where's the bug:
> In {{TestDataNodeOutlierDetectionViaMetrics.injectFastNodesSamples}} the test
> injects several packets into the nodes:
> {noformat}
> for (int i = 0;
>      i < 2 * peerMetrics.getMinOutlierDetectionSamples();
>      ++i) {
>   peerMetrics.addSendPacketDownstream(
>       nodeName, random.nextInt(FAST_NODE_MAX_LATENCY_MS));
> }{noformat}
> Similar logic appears in {{injectSlowNodesSamples}}. The problem is that the
> loop bound is {{2 * getMinOutlierDetectionSamples()}}: if
> {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} is set to 0 or a
> negative value, the loop body never runs and no packet is injected. With no
> samples, {{peerMetrics.getOutliers()}} stays empty, so the later {{waitFor}}:
> {noformat}
> GenericTestUtils.waitFor(new Supplier<Boolean>() {
>   @Override
>   public Boolean get() {
>     return peerMetrics.getOutliers().size() > 0;
>   }
> }, 500, 100_000);{noformat}
> keeps polling until it times out.
> h2. How to reproduce:
> (1) Set {{dfs.datanode.peer.metrics.min.outlier.detection.samples}} to {{0}}
> (2) Run test:
> {{org.apache.hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics#testOutlierIsDetected}}
> h2. Stacktrace:
>
> {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition.
> Thread diagnostics:
> Timestamp: 2023-07-04 04:08:54,535
> "Reference Handler" daemon prio=10 tid=2 runnable
> java.lang.Thread.State: RUNNABLE
>         at java.base@11.0.18/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
>         at java.base@11.0.18/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
>         at java.base@11.0.18/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
> "surefire-forkedjvm-command-thread" daemon prio=5 tid=23 runnable
> java.lang.Thread.State: RUNNABLE
> ...
> {noformat}
> To reproduce easily, run the attached reproduce.sh.
> We are happy to provide a patch if this issue is confirmed.
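
The loop bound described in the report can be checked in isolation, without a Hadoop build. The following is a minimal sketch, not the test code or the eventual patch: the class name {{SampleInjectionSketch}}, both method names, and the value of {{FAST_NODE_MAX_LATENCY_MS}} are hypothetical, and a plain list stands in for {{peerMetrics}}. It shows that a bound of {{2 * minSamples}} yields zero injected samples for 0 or negative values, and illustrates one assumed mitigation (clamping to at least 1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SampleInjectionSketch {
    // Hypothetical placeholder value; the real constant lives in the Hadoop test class.
    static final int FAST_NODE_MAX_LATENCY_MS = 30;

    /** Mirrors the loop bound quoted in the report: 2 * minSamples iterations. */
    static List<Integer> injectAsReported(int minSamples, Random random) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 2 * minSamples; ++i) {
            // Stand-in for peerMetrics.addSendPacketDownstream(nodeName, latency).
            samples.add(random.nextInt(FAST_NODE_MAX_LATENCY_MS));
        }
        return samples;
    }

    /** Assumed mitigation (not the actual patch): clamp the count to at least 1. */
    static List<Integer> injectWithGuard(int minSamples, Random random) {
        return injectAsReported(Math.max(1, minSamples), random);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        System.out.println(injectAsReported(10, random).size()); // 20
        System.out.println(injectAsReported(0, random).size());  // 0: nothing injected
        System.out.println(injectAsReported(-3, random).size()); // 0: nothing injected
        System.out.println(injectWithGuard(0, random).size());   // 2
    }
}
```

Clamping is only one option. The test could instead fail fast on a non-positive value, which surfaces the misconfiguration immediately rather than after the 100-second {{waitFor}} timeout.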