subject:"Observer Namenode and Committer Algorithm V1"

Re: Observer Namenode and Committer Algorithm V1

2021-09-20 Thread Venkatakrishnan Sowrirajan

I have created a JIRA (https://issues.apache.org/jira/browse/SPARK-36810) to track this issue and will look into it further in the coming days. Regards, Venkata krishnan. On Tue, Sep 7, 2021 at 5:57 AM Steve Loughran wrote: > FileContext came in Hadoop 2.x with a cleaner split of client API

Re: Observer Namenode and Committer Algorithm V1

2021-09-07 Thread Steve Loughran

FileContext came in Hadoop 2.x with a cleaner split of client API and driver implementation, and a stricter definition of some things considered broken in FileSystem (rename() corner cases, notion of a current directory, ...). But as it came out after the platform was broadly adopted & never backport

Re: Observer Namenode and Committer Algorithm V1

2021-09-06 Thread Adam Binford

Sharing some things I learned looking into the Delta Lake issue:
- This was a read-after-write inconsistency _all on the driver_. Specifically, Delta Lake currently uses the FileSystem API for reading table logs for greater compatibility, but the FileContext API for writes for atomic renames. This led to t
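The read-after-write gap described in this message can be illustrated with a toy model. This is only a sketch, not HDFS code: it assumes the writer and the reader are two separate client instances, each tracking its own last-seen state id, and that an msync-style call refreshes that id from the Active. The class names, the `demo` function, and the log path are all hypothetical.

```python
class NameNodes:
    """Toy model: an Active NameNode plus one Observer that replays its edit log."""

    def __init__(self):
        self.edits = []            # ordered list of created paths (the edit log)
        self.observer_applied = 0  # how many edits the Observer has replayed

    def create(self, path):
        """Writes always go to the Active; returns the new transaction id."""
        self.edits.append(path)
        return len(self.edits)

    def observer_exists(self, path, client_state_id):
        """Reads go to the Observer, which first catches up to the caller's state id."""
        self.observer_applied = max(self.observer_applied, client_state_id)
        return path in self.edits[:self.observer_applied]


class Client:
    """Hypothetical stand-in for one client instance with its own last-seen state id."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.state_id = 0

    def create(self, path):
        self.state_id = self.cluster.create(path)

    def msync(self):
        # Ask the Active for its latest transaction id.
        self.state_id = len(self.cluster.edits)

    def exists(self, path):
        return self.cluster.observer_exists(path, self.state_id)


def demo():
    cluster = NameNodes()
    writer = Client(cluster)  # plays the FileContext-based writer
    reader = Client(cluster)  # plays the FileSystem-based reader
    path = "/table/_delta_log/00000000000000000001.json"  # hypothetical path
    writer.create(path)
    stale = reader.exists(path)  # the Observer has not replayed the edit yet
    reader.msync()
    fresh = reader.exists(path)  # the Observer must catch up before answering
    return stale, fresh
```

In this model the reader misses the new file until it syncs, because nothing tells the Observer how far the writer has advanced.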

Re: Observer Namenode and Committer Algorithm V1

2021-08-20 Thread Steve Loughran

Ooh, this is fun. v2 isn't safe to use unless every task attempt generates files with exactly the same names and it is okay to intermingle the output of two task attempts. This is because task commit can fail partway through (or worse, that process can pause for a full GC), and a second attempt commi
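The hazard described here can be sketched with a small simulation. It is a toy model under stated assumptions, not Hadoop's committer code: a v2-style task commit is modelled as moving files into the destination one at a time, and the function names and file names are hypothetical.

```python
def commit_task_v2(dest, attempt_files, die_after=None):
    """Move an attempt's files into dest one by one; optionally die partway."""
    for moved, name in enumerate(sorted(attempt_files)):
        if die_after is not None and moved >= die_after:
            raise RuntimeError("task attempt died partway through commit")
        dest[name] = attempt_files[name]


def run_two_attempts(names_1, names_2):
    """First attempt dies after committing one file; second attempt commits fully."""
    dest = {}
    attempt_1 = {n: "attempt-1" for n in names_1}
    attempt_2 = {n: "attempt-2" for n in names_2}
    try:
        commit_task_v2(dest, attempt_1, die_after=1)
    except RuntimeError:
        pass  # the scheduler retries the task
    commit_task_v2(dest, attempt_2)
    return dest
```

With attempt-specific names, `run_two_attempts(["part-0-a1", "part-1-a1"], ["part-0-a2", "part-1-a2"])` leaves a stray file from the failed attempt next to the second attempt's output. With identical names, the second attempt overwrites what the first one left behind.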

Re: Observer Namenode and Committer Algorithm V1

2021-08-20 Thread Adam Binford

So it turns out Delta Lake isn't compatible out of the box due to its mixed use of the FileContext API for writes and the FileSystem API for reads on the driver. Bringing that up with those devs now, but in the meantime the auto-msync-only-on-driver trick is already coming in handy, thanks! On Wed

Re: Observer Namenode and Committer Algorithm V1

2021-08-18 Thread Adam Binford

Ahhh, we don't do any RDD checkpointing, but that makes sense too. Thanks for the tip on setting that on the driver only; I didn't know that was possible, but it makes a lot of sense. I couldn't tell you the first thing about reflection, but good to know it's actually something possible to implement o

Re: Observer Namenode and Committer Algorithm V1

2021-08-17 Thread Erik Krogen

Hi Adam, Thanks for this great writeup of the issue. We (LinkedIn) also operate Observer NameNodes, and have observed the same issues, but have not yet gotten around to implementing a proper fix. To add a bit of context from our side, there is at least one other place besides the committer v1 alg

Observer Namenode and Committer Algorithm V1

2021-08-17 Thread Adam Binford

Hi, We ran into an interesting issue that I wanted to share, and I'd like to get thoughts on whether anything should be done about it. We run our own Hadoop cluster and recently deployed an Observer Namenode to take some burden off of our Active Namenode. We mostly use Delta Lake as our format, and everyth


8 matches
