[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-10-22 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1287946948 I have put up a patch to auto-retry Spark data source writes in case of conflicts: https://github.com/apache/hudi/pull/6854 Hope that helps your case.
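The retry-on-conflict idea mentioned above can be illustrated with a generic wrapper. This is only a sketch of the general pattern, not the implementation in the linked pull request: `WriteConflictError`, `write_with_retry`, and `flaky_write` are hypothetical names introduced here, and the actual patch may expose the behavior differently.

```python
import time


class WriteConflictError(Exception):
    """Hypothetical stand-in for a conflict failure raised by a write attempt."""


def write_with_retry(write_fn, max_attempts=3, backoff_seconds=0.0):
    """Call write_fn, retrying only when it fails with a write conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except WriteConflictError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            time.sleep(backoff_seconds * attempt)  # simple linear backoff


# Usage: a fake write that conflicts twice, then commits.
attempts = []


def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise WriteConflictError("conflicting commit detected")
    return "committed"


result = write_with_retry(flaky_write)
```

Only conflict failures are retried here; any other exception propagates immediately, on the assumption that it would not be fixed by simply writing again.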

[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-09-20 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1253030434 Nope, that's not how it works as of today. The 2nd writer doesn't wait for the 1st writer to complete. That's not OCC at all, in my understanding. What you are suggesting is, take a global lock f

[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-09-15 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1248873001 You can read about multi-writer guarantees here: https://hudi.apache.org/docs/concurrency_control#multi-writer-guarantees

[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-09-15 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1248872759 Here is what is happening: if there are two concurrent writers writing to non-overlapping data files, Hudi will let both writes succeed. But if both are modifying the same data file,
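The file-level behavior described above can be modeled with a toy conflict check. This is a simplified illustration, not Hudi's code: it assumes all writers are concurrent, that a conflict means two writers touched the same file ID, and that (as the multi-writer guarantees in the Hudi concurrency control docs describe) only one of the overlapping writes is allowed to succeed. The function name and file IDs are made up.

```python
def resolve_commits(commits):
    """Decide which concurrent writers may commit.

    commits: list of (writer_name, set_of_file_ids), in the order the
    writers attempt to commit. A writer fails if it touched any file
    already claimed by an earlier successful writer.
    """
    touched = set()
    succeeded, failed = [], []
    for writer, files in commits:
        if touched & files:
            failed.append(writer)  # overlapping data file: conflict
        else:
            succeeded.append(writer)
            touched |= files
    return succeeded, failed


# Non-overlapping files: both writes succeed.
ok = resolve_commits([("writer1", {"f1", "f2"}), ("writer2", {"f3"})])

# Both modify file f2: only the first to commit succeeds.
clash = resolve_commits([("writer1", {"f1", "f2"}), ("writer2", {"f2", "f3"})])
```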

[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-09-10 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1242774819 @koochiswathiTR: can you check my response above and update, please?

[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-09-06 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1238882651 Oh, I thought both jobs were running concurrently. Is that not the case? Can you shed some light on the exact steps? Is it: step 1: start job1 in EMR cluster1, which consumes from source X a

[GitHub] [hudi] nsivabalan commented on issue #6606: Observing data duplication with Single Writer

2022-09-06 Thread GitBox
nsivabalan commented on issue #6606: URL: https://github.com/apache/hudi/issues/6606#issuecomment-1238880984 Unless you configure lock providers, Hudi can't guarantee this. I would suggest adding locking for both writers.
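The locking suggestion above might look like the following set of write options. The keys are Hudi's documented concurrency-control settings for a ZooKeeper-based lock provider; the host, port, lock key, base path, and the helper name `occ_lock_options` are placeholders assumed for illustration, not values from this thread, and other lock providers would need different keys.

```python
def occ_lock_options(zk_url, zk_port, lock_key, base_path):
    """Build Hudi write options enabling OCC with a ZooKeeper lock provider."""
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
    }


# Placeholder values; every writer to the table would pass the same options,
# e.g. df.write.format("hudi").options(**opts) in a Spark job.
opts = occ_lock_options("zk-host.example", 2181, "my_table", "/hudi/locks")
```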