[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-714:
---------------------------
    Comment: was deleted

(was: Regarding the partial output: on second thought, I think we should 
treat it on a per-vertex basis rather than a per-output basis. That is, either 
all of a vertex's outputs commit successfully or all of them are aborted. I 
think the purpose of TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is to allow an 
external system to check an intermediate vertex's output at the vertex level. 
Say a vertex has 2 outputs and TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is 
false: if the first commit succeeds but the second commit fails, we should 
abort both of them and mark the vertex as FAILED. It would be odd for a vertex 
to go to FAILED with one commit aborted while the other commit is not aborted.)
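
The all-or-nothing rule described in the deleted comment can be sketched roughly as follows. This is only an illustration: `Committer`, `RecordingCommitter`, and `commitVertex` are hypothetical names, not the Tez OutputCommitter API, and the sketch assumes that an output whose commit already succeeded can still be aborted safely.

```java
import java.util.List;

public class VertexCommitSketch {

    /** Hypothetical stand-in for one output's committer (not the Tez API). */
    interface Committer {
        void commit() throws Exception;
        void abort();
    }

    /** Hypothetical committer that records its calls and can be told to fail on commit. */
    static class RecordingCommitter implements Committer {
        private final String name;
        private final boolean failOnCommit;
        private final List<String> log;

        RecordingCommitter(String name, boolean failOnCommit, List<String> log) {
            this.name = name;
            this.failOnCommit = failOnCommit;
            this.log = log;
        }

        @Override
        public void commit() throws Exception {
            if (failOnCommit) {
                throw new Exception("commit failed for " + name);
            }
            log.add("commit " + name);
        }

        @Override
        public void abort() {
            log.add("abort " + name);
        }
    }

    /**
     * Commits every output of one vertex. If any commit throws, aborts ALL
     * outputs (including those that already committed) and returns false,
     * so the caller can mark the vertex as FAILED.
     */
    static boolean commitVertex(List<? extends Committer> committers) {
        try {
            for (Committer c : committers) {
                c.commit();
            }
            return true;
        } catch (Exception e) {
            for (Committer c : committers) {
                c.abort();
            }
            return false;
        }
    }
}
```

With two outputs where the first commit succeeds and the second fails, `commitVertex` returns false and both outputs receive an `abort()` call, which matches the two-output example in the comment.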

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, 
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf
>
>
> Follow-up JIRA from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
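
Points 1) and 2) of the description could look roughly like the sketch below, which hands commit work to a separate pool so that the calling (dispatcher) thread returns immediately. `ParallelCommitSketch`, `commitAllAsync`, and the use of a plain `Runnable` for each commit are assumptions made for illustration; this is not the change in the attached patches.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

public class ParallelCommitSketch {

    /**
     * Starts every commit on the given pool and returns at once.
     * The returned future completes with true if all commits finished
     * normally, or false if at least one of them threw.
     */
    static CompletableFuture<Boolean> commitAllAsync(List<Runnable> commits, Executor pool) {
        CompletableFuture<?>[] futures = commits.stream()
            .map(commit -> CompletableFuture.runAsync(commit, pool))
            .toArray(CompletableFuture[]::new);

        // allOf completes once every commit is done; handle() maps the
        // outcome to a boolean instead of propagating the exception.
        return CompletableFuture.allOf(futures)
            .handle((ignored, error) -> error == null);
    }
}
```

A dispatcher-style caller would typically attach a callback (for example with `thenAccept`) that enqueues a "commit completed" or "commit failed" event, rather than calling `join()` on the dispatcher thread, since blocking there would bring back the queue backup described in point 2).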



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
