I think the conclusion is "no change for now", but people do need to
understand the risks better. One thing I'd like to understand is: which
FileOutputFormat subclasses generate unique filenames which are different
in different task attempts? I've heard a mention of Avro here, but not
looked in th
Thanks Steve and Jim for bringing this issue to our attention.
IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very
quick. With this kind of performance difference, is it wise to change the
default behavior for released versions of Hadoop? Should this be limited to
trunk?
Than
On Wed, 23 Sep 2020 at 20:16, Jim Brennan wrote:
> I replied in the Jira. The speedup provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo). Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.o
On Wed, 23 Sep 2020 at 20:07, Igor Dvorzhak wrote:
> What will be the solution for object stores to have fast and correct
> commit algorithms?
>
https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_006
There's a plugin point for you to add an explicit committer for gcs:
I replied in the Jira. The speedup provided by the v2 commit algorithm
is very important to us at Verizon Media (Yahoo). Please do not remove it.
I referred to this comment from Jason Lowe on the original Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=
What will be the solution for object stores to have fast and correct commit
algorithms?
On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran wrote:
> I've got a PR up to completely remove the v2 commit algorithm
>
> https://github.com/apache/hadoop/pull/2320
>
> That may seem overkill, but while *we*