Manoj Govindassamy created HUDI-3031:
----------------------------------------

             Summary: TestHoodieDeltaStreamerWithMultiWriter time out due to async services and writer deadlock
                 Key: HUDI-3031
                 URL: https://issues.apache.org/jira/browse/HUDI-3031
             Project: Apache Hudi
          Issue Type: Task
          Components: Writer Core
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy
             Fix For: 0.11.0


Of late, TestHoodieDeltaStreamerWithMultiWriter has been failing consistently 
for the MOR table type. The test spins up a few pool threads to do table 
ingestion via backfilling, along with async compaction and clustering. After 
the data ingestion completes, the test waits endlessly for the following 
condition to pass.

 
{code:java}
// Condition for parallel ingestion job
Function<Boolean, Boolean> conditionForRegularIngestion = (r) -> {
  if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
    TestHoodieDeltaStreamer.TestHelpers.assertAtleastNDeltaCommitsAfterCommit(3, lastSuccessfulCommit, tableBasePath, fs());
  } else {
    TestHoodieDeltaStreamer.TestHelpers.assertAtleastNCompactionCommitsAfterCommit(3, lastSuccessfulCommit, tableBasePath, fs());
  }
  TestHoodieDeltaStreamer.TestHelpers.assertRecordCount(totalRecords, tableBasePath + "/*/*.parquet", sqlContext());
  TestHoodieDeltaStreamer.TestHelpers.assertDistanceCount(totalRecords, tableBasePath + "/*/*.parquet", sqlContext());
  return true;
};
{code}
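For illustration, a bounded wait around such a condition might look like the sketch below. It is a minimal stand-in, not the helper the test actually uses: the class name ConditionWaitSketch, the method waitForCondition, and its timeout and poll parameters are all hypothetical. The point is that a condition built from assertions has to be retried until it stops throwing, and that a deadline turns an endless wait into a reportable failure.

```java
import java.util.function.Function;

public class ConditionWaitSketch {

  /**
   * Hypothetical helper: re-evaluates the condition until it returns true
   * without throwing, or until timeoutMs has elapsed.
   */
  static boolean waitForCondition(Function<Boolean, Boolean> condition,
                                  long timeoutMs, long pollMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      try {
        if (condition.apply(true)) {
          return true; // all assertions inside the condition passed
        }
      } catch (AssertionError | RuntimeException e) {
        // Condition not met yet; fall through and retry until the deadline.
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up instead of waiting forever
      }
      Thread.sleep(pollMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int[] calls = {0};
    // Succeeds on the third evaluation, like a commit count that grows over time.
    boolean met = waitForCondition(r -> {
      if (++calls[0] < 3) {
        throw new AssertionError("not enough commits yet");
      }
      return true;
    }, 1000, 10);
    // Never succeeds, so the helper returns false once the deadline passes.
    boolean neverMet = waitForCondition(r -> {
      throw new AssertionError("never satisfied");
    }, 50, 10);
    System.out.println("met=" + met + " neverMet=" + neverMet);
  }
}
```

A caller would typically fail the test when the helper returns false, ideally after dumping the thread stacks so that a hang can be diagnosed.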
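Issue 1: The compaction thread and the writer thread are deadlocked. In the 
thread dump below, the writer thread holds the TransactionManager monitor while 
it polls for the table lock in beginTransaction, and the compaction thread is 
blocked on that same monitor in endTransaction, so it apparently cannot release 
the lock the writer is waiting for.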
{code:java}
"async_compact_thread" #188 prio=5 os_prio=31 tid=0x00007f8c26266800 nid=0x15803 waiting for monitor entry [0x0000700009d3e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:70)
    - waiting to lock <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
    at org.apache.hudi.client.SparkRDDWriteClient.completeCompaction(SparkRDDWriteClient.java:312)
    at org.apache.hudi.client.SparkRDDWriteClient.commitCompaction(SparkRDDWriteClient.java:294)
    at org.apache.hudi.client.HoodieSparkCompactor.compact(HoodieSparkCompactor.java:59)
    at org.apache.hudi.async.AsyncCompactService.lambda$null$1(AsyncCompactService.java:89)
    at org.apache.hudi.async.AsyncCompactService$$Lambda$612/2034420774.get(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"pool-22-thread-1" #143 prio=5 os_prio=31 tid=0x00007f8c0b125800 nid=0x12603 waiting on condition [0x0000700006fb7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hudi.client.transaction.FileSystemBasedLockProviderTestClass.tryLock(FileSystemBasedLockProviderTestClass.java:80)
    at org.apache.hudi.client.transaction.lock.LockManager.lock(LockManager.java:68)
    at org.apache.hudi.client.transaction.TransactionManager.beginTransaction(TransactionManager.java:64)
    - locked <0x00000006c353f528> (a org.apache.hudi.client.transaction.TransactionManager)
    at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:193)
    at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:536)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:308)
{code}
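The lock ordering visible in the thread dump can be reproduced in isolation. The sketch below is a simplified model and not Hudi code: PollingLock and SynchronizedManager are hypothetical stand-ins for the lock provider and for a manager whose begin and end methods share one object monitor. One thread polls for an external lock while holding the monitor, and the thread that owns the external lock needs that same monitor to release it. A timeout is added so the demo terminates; with unbounded retries the two threads would wait on each other indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class MonitorDeadlockSketch {

  /** Hypothetical stand-in for an external lock that is acquired by polling. */
  static final class PollingLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    boolean tryLock(long timeoutMs) throws InterruptedException {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (!held.compareAndSet(false, true)) {
        if (System.nanoTime() - deadline >= 0) {
          return false; // timed out without getting the lock
        }
        Thread.sleep(10); // sleep between attempts, as in the writer's stack
      }
      return true;
    }

    void unlock() {
      held.set(false);
    }
  }

  /** Hypothetical manager: begin() and end() are guarded by the same monitor. */
  static final class SynchronizedManager {
    final CountDownLatch enteredBegin = new CountDownLatch(1);
    private final PollingLock lock;

    SynchronizedManager(PollingLock lock) {
      this.lock = lock;
    }

    synchronized boolean begin(long timeoutMs) throws InterruptedException {
      enteredBegin.countDown();       // signal: monitor is now owned by this caller
      return lock.tryLock(timeoutMs); // polls while still holding the monitor
    }

    synchronized void end() {
      lock.unlock(); // cannot run while another thread is inside begin()
    }
  }

  /** Returns whether the writer obtained the lock before its timeout expired. */
  static boolean writerAcquiresLock(long writerTimeoutMs) throws InterruptedException {
    PollingLock tableLock = new PollingLock();
    SynchronizedManager manager = new SynchronizedManager(tableLock);
    tableLock.tryLock(0); // compactor role: already owns the external lock

    AtomicBoolean acquired = new AtomicBoolean(false);
    Thread writer = new Thread(() -> {
      try {
        acquired.set(manager.begin(writerTimeoutMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();
    manager.enteredBegin.await(); // writer owns the monitor and is polling
    manager.end();                // compactor role: blocked until the writer gives up
    writer.join();
    return acquired.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("writer acquired lock: " + writerAcquiresLock(200));
  }
}
```

In this model the writer can never succeed, because the release it is waiting for is queued behind the monitor it holds. One way to break the cycle in a design like this is to acquire the external lock without holding the monitor, or to let the release path bypass it.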
Issue 2: Even after fixing the above by replacing the 
hoodie.write.lock.provider with the local lock provider, the end condition of 
at least 3 delta commits after the last successful commit is not met, and the 
test still times out. This needs further investigation.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
