Hello
I'm working on a mapred wrapper for the MagicS3GuardCommitter. It is needed
for Hive, which cannot migrate to MRv2, and for anything that interacts with
Hive tables, such as Pig. The wrapper just forwards calls to the mapreduce
version, similar to how other mapred classes are implemented, and it works
correctly with the MR execution engine.
When I run a simple Pig job that writes to a Hive table with Tez as the
execution engine, the job completes but none of the data is committed.
Based on my investigation, the root cause appears to be that the
MagicS3GuardCommitter assumes the task and the job share the same jobId
when determining where to write the PendingSet. This assumption holds for
MR, but not for Tez. Reproducing the behavior
requires applying a patch to add the wrapper. Is this something with a
known workaround? Should I go ahead and open a JIRA for this?
If I should be e-mailing a different mailing list about this, let me know;
I wasn't sure which one to use for hadoop-aws issues.
Thanks
Venkatasubrahmanian Narayanan