Thanks Balaji,

That makes a lot of sense. I haven't seen any issues in my testing; I am
just trying to understand all the edge cases.

I suppose the only theoretical issue is that a reader may not see the most
recent update from the writer, but that would be a rare and transient
occurrence in real life.
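
To check my understanding, here is a minimal sketch of how a reader could
pick the latest commit when metadata file names are immutable and encode the
commit state. The "<instant>.<action>[.<state>]" naming, the sample file
names, and the latest_completed_instant helper are my own assumptions for
illustration, not Hudi's actual code.

```python
from typing import Iterable, Optional

# Assumed naming scheme (illustration only): "<instant>.<action>[.<state>]".
# A name with no state suffix marks a completed action.
COMPLETED_ACTION = "commit"


def latest_completed_instant(listing: Iterable[str]) -> Optional[str]:
    """Return the newest instant whose completed marker file is visible."""
    completed = [
        name.split(".")[0]
        for name in listing
        if name.split(".")[1:] == [COMPLETED_ACTION]
    ]
    # Instants are fixed-width timestamps, so string order is time order.
    return max(completed, default=None)


# A listing that lags behind the writer: the completed marker for the second
# instant is not visible yet, so the reader stays on the first instant.
stale_listing = [
    "20200812103500.commit",
    "20200813173800.commit.requested",
    "20200813173800.inflight",
]
# Once the new marker becomes visible, the reader moves forward.
fresh_listing = stale_listing + ["20200813173800.commit"]
```

In this sketch a lagging listing only makes the reader fall back to an older
completed commit; since no file is ever renamed, there is no half-renamed
state for it to observe.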

Best,
Ryan

On Thu, Aug 13, 2020 at 5:38 PM Balaji Varadarajan
<[email protected]> wrote:

> Hey Ryan,
>
> Thanks for the detailed writeup and great job explaining the question and
> the links :)
>
> W.r.t. renaming, Hudi avoids renaming metadata files altogether and creates
> immutable metadata filenames encoded with the state of the commit.
>
> Generally, we believe some of the consistency solutions out there were
> written in the early days of S3, when the guarantees were not well
> established/understood.
>
> The S3 consistency guard in Hudi has been fairly battle-tested for a while
> now by the community in their production clusters. Are you seeing any
> specific issues in your setup?
>
> Once again, thanks for your interest in Hudi.
>
> Balaji.V
> On Wednesday, August 12, 2020, 10:35:05 AM PDT, Ryan Murray <
> [email protected]> wrote:
>
>
> Hey all,
>
> I've been playing around with Hudi for a little while now. Really like it!
> Thanks for all the work :-)
>
> I do have a question about S3 and consistency: How does Hudi get around
> eventual consistency in S3? Particularly in the case of metadata files.
>
> I can see there is a ConsistencyGuard[1] which ensures that the JVM thread
> it's run in can see a path; however, it isn't clear to me that this would be
> valid across a system.
>
> If a writer 'A' performs an action which requires a rename, for example, how
> can we ensure that readers B and C see the newly renamed file? Or even that
> nodes across reader B (e.g. a Spark cluster) see the same file content?
>
> To me, this is checking whether an object is visible from a particular thread
> rather than checking the eventual consistency restrictions of S3[2]. People
> have gone to great lengths to get around S3's consistency issues as well
> [3][4].
>
> Apologies if this is a naive question, I am still grappling with the Hudi
> commit model.
>
> Best,
> Ryan
>
> [1]
> https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuard.java
> [2]
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel
> [3] https://github.com/Netflix/s3mper
> [4] https://docs.delta.io/latest/delta-storage.html#amazon-s3
>
