Manoj Govindassamy created HUDI-3031:
----------------------------------------
Summary: TestHoodieDeltaStreamerWithMultiWriter times out due to async services and writer deadlock
Key: HUDI-3031
URL: https://issues.apache.org/jira/browse/HUDI-3031
Project: Apache Hudi
Issue Type: Task
Components: Writer Core
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
Fix For: 0.11.0


Of late, TestHoodieDeltaStreamerWithMultiWriter has started failing consistently for the MOR table type. The test spins off a few pool threads that ingest into the table via backfilling, alongside async compaction and clustering. After data ingestion completes, the test waits indefinitely for the following condition to pass:

{code:java}
// Condition for parallel ingestion job
Function<Boolean, Boolean> conditionForRegularIngestion = (r) -> {
  if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
    TestHoodieDeltaStreamer.TestHelpers.assertAtleastNDeltaCommitsAfterCommit(3, lastSuccessfulCommit, tableBasePath, fs());
  } else {
    TestHoodieDeltaStreamer.TestHelpers.assertAtleastNCompactionCommitsAfterCommit(3, lastSuccessfulCommit, tableBasePath, fs());
  }
  TestHoodieDeltaStreamer.TestHelpers.assertRecordCount(totalRecords, tableBasePath + "/*/*.parquet", sqlContext());
  TestHoodieDeltaStreamer.TestHelpers.assertDistanceCount(totalRecords, tableBasePath + "/*/*.parquet", sqlContext());
  return true;
};
{code}

Issue 1: The compaction thread and the writer thread are deadlocked. The writer thread (pool-22-thread-1) holds the TransactionManager monitor <0x00000006c353f528>, taken in beginTransaction, while it sleeps in FileSystemBasedLockProviderTestClass.tryLock waiting for the file-system lock. The compaction thread (async_compact_thread) is blocked trying to enter that same monitor in endTransaction, presumably while still holding the file-system lock from its own transaction, so neither thread can make progress:

{code:java}
"async_compact_thread" #188 prio=5 os_prio=31 tid=0x00007f8c26266800 nid=0x15803 waiting for monitor entry [0x0000700009d3e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:70)
    - waiting to lock <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
    at org.apache.hudi.client.SparkRDDWriteClient.completeCompaction(SparkRDDWriteClient.java:312)
    at org.apache.hudi.client.SparkRDDWriteClient.commitCompaction(SparkRDDWriteClient.java:294)
    at org.apache.hudi.client.HoodieSparkCompactor.compact(HoodieSparkCompactor.java:59)
    at org.apache.hudi.async.AsyncCompactService.lambda$null$1(AsyncCompactService.java:89)
    at org.apache.hudi.async.AsyncCompactService$$Lambda$612/2034420774.get(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"pool-22-thread-1" #143 prio=5 os_prio=31 tid=0x00007f8c0b125800 nid=0x12603 waiting on condition [0x0000700006fb7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hudi.client.transaction.FileSystemBasedLockProviderTestClass.tryLock(FileSystemBasedLockProviderTestClass.java:80)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:68)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
    - locked <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:193)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:536)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:308)
{code}

Issue 2: Even after working around the above by replacing hoodie.write.lock.provider with the local lock provider, the end condition (at least 3 delta commits after lastSuccessfulCommit) is never met, and the test still times out. This needs further investigation.
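For illustration, here is a minimal, self-contained sketch of the lock-ordering pattern the thread dump appears to show. It is not Hudi code. PollingLock, TxnManager and writerAcquires are hypothetical names: PollingLock stands in for a sleep-and-retry lock such as FileSystemBasedLockProviderTestClass, and the monitor inside TxnManager stands in for the synchronized beginTransaction/endTransaction pair. The lock wait is bounded at one second only so the sketch terminates; a writer that times out plays the role of the hung thread. The variant that waits without holding the monitor is one assumed way to break the cycle, not necessarily how HUDI-3031 was resolved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TxnDeadlockSketch {

  /** Hypothetical stand-in for a polling, non-reentrant lock (e.g. a file-based lock). */
  static final class PollingLock {
    private final AtomicBoolean held = new AtomicBoolean(false);
    final CountDownLatch firstContention = new CountDownLatch(1);

    /** Sleep-and-retry acquire: polls every 10 ms until acquired or the timeout expires. */
    boolean acquire(long timeoutMs) throws InterruptedException {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (!held.compareAndSet(false, true)) {
        firstContention.countDown(); // a caller is now waiting on this lock
        if (System.nanoTime() - deadline >= 0) {
          return false;
        }
        Thread.sleep(10); // sleeping does NOT release any monitor the caller holds
      }
      return true;
    }

    void release() {
      held.set(false);
    }
  }

  /** Hypothetical transaction manager; holdMonitorWhileWaiting mimics synchronized begin/end. */
  static final class TxnManager {
    private final PollingLock lock = new PollingLock();
    private final Object monitor = new Object();
    private final boolean holdMonitorWhileWaiting;

    TxnManager(boolean holdMonitorWhileWaiting) {
      this.holdMonitorWhileWaiting = holdMonitorWhileWaiting;
    }

    boolean begin(long timeoutMs) throws InterruptedException {
      if (holdMonitorWhileWaiting) {
        synchronized (monitor) { // monitor held for the whole wait: deadlock-prone
          return lock.acquire(timeoutMs);
        }
      }
      return lock.acquire(timeoutMs); // wait without holding the monitor
    }

    void end() {
      synchronized (monitor) { // releasing the lock also requires the monitor
        lock.release();
      }
    }
  }

  /** Returns true if the "writer" thread starts its transaction within the timeout. */
  static boolean writerAcquires(boolean holdMonitorWhileWaiting) throws Exception {
    TxnManager txn = new TxnManager(holdMonitorWhileWaiting);
    txn.begin(1_000); // the "compactor" (this thread) takes the lock first
    AtomicBoolean writerOk = new AtomicBoolean(false);
    Thread writer = new Thread(() -> {
      try {
        writerOk.set(txn.begin(1_000));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();
    txn.lock.firstContention.await(); // writer is now inside begin(), waiting on the lock
    txn.end(); // "compactor" finishes; blocks on the monitor if the writer holds it
    writer.join();
    return writerOk.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("writer acquired, monitor held while waiting: " + writerAcquires(true));
    System.out.println("writer acquired, monitor not held while waiting: " + writerAcquires(false));
  }
}
```

Running main should print false for the first case (the "compactor" cannot release the lock until the writer gives up, mirroring the BLOCKED/TIMED_WAITING pair in the dump) and true for the second, where the writer picks up the lock as soon as it is released.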