Luke Chen created KAFKA-14242:
---------------------------------

             Summary: Hanging logManager in testReloadUpdatedFilesWithoutConfigChange test
                 Key: KAFKA-14242
                 URL: https://issues.apache.org/jira/browse/KAFKA-14242
             Project: Kafka
          Issue Type: Test
            Reporter: Luke Chen
            Assignee: Luke Chen

Recently, many builds have failed (and been terminated) with a core:unitTest failure. The failure message looks like this:

    FAILURE: Build failed with an exception.
    [2022-09-14T09:51:52.190Z]
    [2022-09-14T09:51:52.190Z] * What went wrong:
    [2022-09-14T09:51:52.190Z] Execution failed for task ':core:unitTest'.
    [2022-09-14T09:51:52.190Z] > Process 'Gradle Test Executor 128' finished with non-zero exit value 1

After investigating, I found one cause (there may be others). In {{BrokerMetadataPublisherTest#testReloadUpdatedFilesWithoutConfigChange}}, we create a logManager twice, but during cleanup we close only one of them. As a result, a log cleaner keeps running. Meanwhile, the temp log dirs are deleted, so the process calls {{Exit.halt(1)}}, which produces the error we see in Gradle. This is the code that runs when we encounter an IOException in all of our log dirs:

    fatal(s"Shutdown broker because all log dirs in ${logDirs.mkString(", ")} have failed")
    Exit.halt(1)

Why does the test sometimes pass and sometimes fail? During test cluster close, we shut down the broker first and then the other components, and the log cleaner is triggered on an interval. So if the cluster closes quickly enough for the test to finish before the cleaner runs, the test passes. Otherwise, the process exits with 1.
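
The leak described above can be illustrated with a minimal, self-contained sketch. This is not the actual Kafka test code: {{FakeLogManager}}, {{ManagerTracker}} and {{LeakDemo}} are hypothetical names, and the assumption that the first instance is the one left open is made only for illustration. The sketch shows one way a test fixture could make sure every instance it created is closed.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a log manager that owns a background cleaner.
// "Running" here only models the cleaner being alive; no thread is started.
final class FakeLogManager(val name: String) extends AutoCloseable {
  private val running = new AtomicBoolean(true)
  def isRunning: Boolean = running.get()
  override def close(): Unit = running.set(false)
}

// Hypothetical fixture that remembers every manager it hands out,
// so that cleanup can close all of them rather than only the last one.
final class ManagerTracker extends AutoCloseable {
  private val created = ArrayBuffer.empty[FakeLogManager]

  def create(name: String): FakeLogManager = {
    val manager = new FakeLogManager(name)
    created += manager
    manager
  }

  // Names of the managers whose cleaner would still be alive.
  def stillRunning: List[String] = created.filter(_.isRunning).map(_.name).toList

  override def close(): Unit = created.foreach(_.close())
}

object LeakDemo {
  // Leaky pattern (assumed shape): two instances are created,
  // but only the one still referenced is closed.
  def leakyCleanup(tracker: ManagerTracker): List[String] = {
    var logManager = tracker.create("first")
    logManager = tracker.create("second") // reference to "first" is lost
    logManager.close()
    tracker.stillRunning
  }

  // Tracked pattern: cleanup closes everything that was created.
  def trackedCleanup(tracker: ManagerTracker): List[String] = {
    tracker.create("first")
    tracker.create("second")
    tracker.close()
    tracker.stillRunning
  }
}
```

In this sketch, {{LeakDemo.leakyCleanup}} reports one manager still running after cleanup, while {{LeakDemo.trackedCleanup}} reports none. In the real test, the leftover instance is the one whose log cleaner can later find the temp log dirs deleted.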