[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-714:
---------------------------
    Comment: was deleted

(was: Regarding the partial output, on second thought, I think we should consider it only on a per-vertex basis rather than a per-output basis. That means either all of a vertex's outputs commit successfully or all of them are aborted. I think the purpose of TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is to allow an external system to check an intermediate vertex's output at the vertex level. Say a vertex has 2 outputs and TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false: if the first commit succeeds but the second fails, we should abort both of them and mark the vertex as FAILED. It would be weird for a vertex to go to FAILED with one commit aborted while the other is not.)

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf
>
>
> Follow-up JIRA from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event handling w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
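The deleted comment proposes all-or-nothing commit semantics per vertex (when TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false), and point 1) of the description suggests running a vertex's committers in parallel, off the dispatcher thread. A minimal Java sketch of how the two ideas could combine follows. It assumes a hypothetical `Committer` interface with `commit()`/`abort()` (not the real Tez `OutputCommitter` API) and a hypothetical `commitVertexAsync` helper; it illustrates the idea under those assumptions and is not the patch attached to the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

public class VertexCommitSketch {

    /** Hypothetical committer abstraction; NOT the Tez OutputCommitter API. */
    public interface Committer {
        void commit() throws Exception;
        void abort();
    }

    /**
     * Hypothetical helper: starts every commit of one vertex on the given pool, in
     * parallel, and returns immediately so the calling (dispatcher) thread is not blocked.
     * The returned future completes with true if all commits succeeded, or with false
     * after aborting ALL committers (including ones that committed) if any commit failed.
     */
    public static CompletableFuture<Boolean> commitVertexAsync(
            List<Committer> committers, Executor pool) {
        List<CompletableFuture<Void>> commits = new ArrayList<>();
        for (Committer c : committers) {
            commits.add(CompletableFuture.runAsync(() -> {
                try {
                    c.commit();
                } catch (Exception e) {
                    // Wrap checked failures so they complete this future exceptionally.
                    throw new CompletionException(e);
                }
            }, pool));
        }
        // allOf completes only after every commit has finished, successfully or not,
        // so the aborts below never race with a commit that is still running.
        return CompletableFuture.allOf(commits.toArray(new CompletableFuture[0]))
                .handle((ignored, failure) -> {
                    if (failure == null) {
                        return true;   // every output committed: vertex can succeed
                    }
                    for (Committer c : committers) {
                        c.abort();     // all-or-nothing: roll back every output
                    }
                    return false;      // caller would mark the vertex FAILED
                });
    }
}
```

Because the helper returns a `CompletableFuture` instead of blocking, the dispatcher thread only has to react to the completed result (for example, by moving the vertex to SUCCEEDED or FAILED), while the commits themselves run on `pool`. Waiting for all commits before aborting trades a little latency on failure for never aborting an output whose commit is still in flight.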