nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1287946948
I have put up a patch to auto-retry Spark datasource writes in case of
conflicts: https://github.com/apache/hudi/pull/6854
Hope that helps your case.
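
For anyone wanting the same effect on the application side, a retry on conflict can be sketched roughly as below. This is not the code from the PR; `WriteConflictError` and `write_fn` are hypothetical stand-ins for whatever failure and write call your job actually uses.

```python
import time


class WriteConflictError(Exception):
    """Hypothetical stand-in for the conflict failure a writer would see."""


def write_with_retry(write_fn, max_attempts=3, backoff_seconds=0.0):
    """Call write_fn until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except WriteConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the last conflict
            # Linear backoff before the next attempt (assumed policy).
            time.sleep(backoff_seconds * attempt)
```

Here `write_fn` would wrap the actual datasource write, so each retry re-runs the whole write.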
--
This is an autom
nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1253030434
Nope, that's not how it works as of today. The 2nd writer doesn't wait for the 1st
writer to complete. That's not OCC at all, in my understanding. What you are
suggesting is, take a global lock f
nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1248873001
You can read about multi-writer guarantees here:
https://hudi.apache.org/docs/concurrency_control#multi-writer-guarantees
nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1248872759
Here is what is happening.
If there are two concurrent writers writing to non-overlapping data files,
Hudi will let both writes succeed. But if both are modifying the same data file,
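
The file-level check described here can be pictured with a toy model. This is only an illustration, not Hudi's implementation; the function names and file IDs are made up, and the "overlap means only one writer succeeds" rule is taken from the Hudi concurrency control docs.

```python
# Toy model only, not Hudi's implementation: two writers conflict when the
# sets of data file IDs they modified overlap.
def has_conflict(files_written_a, files_written_b):
    return bool(set(files_written_a) & set(files_written_b))


def try_commit(committed_file_sets, my_files):
    """Commit my_files unless an already committed writer touched the same file."""
    for other in committed_file_sets:
        if has_conflict(other, my_files):
            return False  # this writer's commit is rejected
    committed_file_sets.append(set(my_files))
    return True


committed = []
ok1 = try_commit(committed, {"file-1", "file-2"})  # first writer commits
ok2 = try_commit(committed, {"file-3"})            # disjoint files, commits
ok3 = try_commit(committed, {"file-2"})            # overlaps with first writer
```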
nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1242774819
@koochiswathiTR: can you check my response above and update, please?
nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1238882651
Oh, I thought both jobs were running concurrently? Are they not? Can you shed
some light on the exact steps?
Is it:
step1: start job1 in EMR cluster1, which consumes from source X a
nsivabalan commented on issue #6606:
URL: https://github.com/apache/hudi/issues/6606#issuecomment-1238880984
Unless you configure lock providers, Hudi can't guarantee this. I would
suggest adding locking for both writers.
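
As a rough sketch of what that could look like with the Spark datasource, the options below follow the Hudi concurrency control docs for a ZooKeeper-based lock provider. The host, port, lock key and base path are placeholder assumptions, not values from this issue; other lock providers would use different keys.

```python
# Sketch only: Hudi write options enabling optimistic concurrency control
# with a ZooKeeper-based lock provider. Host, port, lock key and base path
# are placeholder assumptions; substitute your own.
def occ_lock_options(zk_host, zk_port, lock_key, base_path):
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_host,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
    }


# Both writers should pass identical lock settings so they contend on the same lock.
opts = occ_lock_options("zk-host.example", 2181, "my_table", "/hudi/locks")
# Usage (not run here):
# df.write.format("hudi").options(**opts).mode("append").save(table_path)
```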