Hi Peter,

Thanks a lot for the breakdown; it all makes sense.
Unfortunately I work with companies that are stuck on older versions of Hive, so I'm trying to find some workarounds. I was actually able to make it mostly work. Here's what I do:

- In configureJobConf():
  - Create a work directory with a name based on the Hive query ID, e.g. hdfs:///tmp/work-ABC123
  - Extract some details from the TableDesc object and save them into a JSON file: hdfs:///tmp/work-ABC123/info.json
    -- We need those details at commit time to know which external destination table to write the results to.
- Each RecordWriter instance (which runs in a task) writes its own output file, e.g. hdfs:///tmp/work-ABC123/task-XYZ789.output
- When using MR, OutputCommitter.commitJob() is called automatically. There we read the external destination table details from info.json, collect all the task output files, and copy them to the external table. Finally, we clean up the work directory.
- When using Tez, commitInsertTable() is called automatically. From there, we make an explicit call to OutputCommitter.commitJob() to finish the work.

Does that approach seem sound to you?

If so, the one thing I think is still missing is the ability to clean up the work directory at the end of the job **if the job has failed**. My understanding is that with MR the OutputCommitter.abortJob() method would be called. With Tez, however, I'm not sure where such a hook might be. I see there is rollbackInsertTable(), but that seems to serve another purpose, i.e. doing something if commitInsertTable() itself fails.

Is there a place I could hook into with Tez **at the end of a job** in the event that the job as a whole has failed? Alternatively, is there a hook for when a job is complete, regardless of whether it succeeded or failed?

Thanks,
Julien
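P.S. To make the commit/abort flow I'm describing concrete, here's a rough sketch. It uses java.nio.file on the local filesystem as a stand-in for HDFS (the real storage handler would of course go through org.apache.hadoop.fs.FileSystem), and the class name and the destination-directory parameter are just placeholders, not real Hive APIs:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

// Sketch of the work-directory lifecycle: commitJob() ships task outputs to
// the destination and cleans up; abortJob() is the missing cleanup hook I'm
// looking for on the Tez side.
public class WorkDirCommitter {

    // On success: copy every task-*.output file to the destination, then
    // remove the work directory. (In the real flow we would first parse
    // workDir/info.json to locate the external destination table; here a
    // plain destDir stands in for it.)
    public static void commitJob(Path workDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        try (Stream<Path> entries = Files.list(workDir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (p.getFileName().toString().endsWith(".output")) {
                    Files.copy(p, destDir.resolve(p.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        deleteRecursively(workDir); // clean up on success
    }

    // On failure: just remove the work directory, outputs and info.json alike.
    public static void abortJob(Path workDir) throws IOException {
        deleteRecursively(workDir);
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) return;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Delete children before parents.
            walk.sorted(java.util.Comparator.reverseOrder())
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new java.io.UncheckedIOException(e);
                    }
                });
        }
    }
}
```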