Re: [E] Re: the v2 commit algorithm

2020-10-20 Thread Steve Loughran
I think the conclusion is "no change for now", but people do need to understand the risks better. One thing I'd like to understand is: which FileOutputFormat subclasses generate unique filenames which are different in different task attempts? I've heard a mention of Avro here, but not looked in th

Re: [E] Re: the v2 commit algorithm

2020-09-24 Thread epa...@apache.org
Thanks Steve and Jim for bringing this issue to our attention. IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very quick. With this kind of performance difference, is it wise to change the default behavior for released versions of Hadoop? Should this be limited to trunk? Than

Re: [E] Re: the v2 commit algorithm

2020-09-24 Thread Steve Loughran
On Wed, 23 Sep 2020 at 20:16, Jim Brennan wrote: > I replied in the Jira. The speed up provided by the v2 commit algorithm > is very important to us at Verizon Media (Yahoo). Please do not remove it. > I referred to this comment from Jason Lowe on the original Jira: > > https://issues.apache.o

Re: the v2 commit algorithm

2020-09-24 Thread Steve Loughran
On Wed, 23 Sep 2020 at 20:07, Igor Dvorzhak wrote: > What will be the solution for object stores to have fast and correct > commit algorithms? > https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_006 There's a plugin point for you to add an explicit committer for gcs:

Re: [E] Re: the v2 commit algorithm

2020-09-23 Thread Jim Brennan
I replied in the Jira. The speed up provided by the v2 commit algorithm is very important to us at Verizon Media (Yahoo). Please do not remove it. I referred to this comment from Jason Lowe on the original Jira: https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=

Re: the v2 commit algorithm

2020-09-23 Thread Igor Dvorzhak
What will be the solution for object stores to have fast and correct commit algorithms? On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran wrote: > I've got a PR up to completely remove the v2 commit algorithm > > https://github.com/apache/hadoop/pull/2320 > > That may seem overkill, but while *we*