This is regarding the fix that was incorporated in HIVE-6888 <https://issues.apache.org/jira/browse/HIVE-6888> (commit <https://github.com/apache/hive/commit/eb9fece245a4a529ce3d400a580c55f0c2180785>).
The fix was made because MapWork objects were being leaked when there were multiple AMs. However, there are cases where the fix clears gWorkMap prematurely, after which the map is populated (and cleared) again. One example is when HiveInputFormat.getSplits() is called from HiveSplitGenerator.initialize(): gWorkMap is cleared inside getSplits(), populated again when splitGrouper.generateGroupedSplits() is called, and finally cleared in the 'finally' block of HiveSplitGenerator.initialize(). In our codebase, we make some modifications to MapWork during the getSplits() call, and those changes are lost when clearMapWork() is called inside HiveInputFormat.getSplits(). Is the clearMapWork() call inside getSplits() really required?
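
To make the sequence concrete, here is a minimal, self-contained Java sketch of the lifecycle as I understand it. It is not Hive code: FakeMapWork, the "customized" flag, the trace list, and the plan key "plan-1" are hypothetical stand-ins, and the method names only mirror the Hive methods mentioned above. It also assumes that repopulating the map reloads a fresh copy of the plan, which is why the earlier modification does not survive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WorkMapLifecycleSketch {
    // Hypothetical stand-in for MapWork; "customized" models our in-house modification.
    static final class FakeMapWork {
        boolean customized = false;
    }

    // Stand-in for the static gWorkMap cache, keyed by a made-up plan identifier.
    static final Map<String, FakeMapWork> gWorkMap = new HashMap<>();

    // Records every load and clear so the ordering can be inspected.
    static final List<String> trace = new ArrayList<>();

    // Returns the cached work object, loading a fresh copy if the cache has no entry.
    static FakeMapWork getMapWork(String planKey) {
        return gWorkMap.computeIfAbsent(planKey, key -> {
            trace.add("load " + key);
            return new FakeMapWork();
        });
    }

    static void clearMapWork(String planKey) {
        gWorkMap.remove(planKey);
        trace.add("clear " + planKey);
    }

    // Models getSplits(): the work object is modified, then the cache entry is cleared.
    static void getSplits(String planKey) {
        FakeMapWork work = getMapWork(planKey);
        work.customized = true;   // our modification
        clearMapWork(planKey);    // the call in question
    }

    // Models generateGroupedSplits(): the cache is empty again, so the plan is reloaded.
    static boolean generateGroupedSplits(String planKey) {
        return getMapWork(planKey).customized;
    }

    // Models initialize(): the final cleanup happens in the finally block.
    static boolean initialize(String planKey) {
        try {
            getSplits(planKey);
            return generateGroupedSplits(planKey);
        } finally {
            clearMapWork(planKey);
        }
    }

    public static void main(String[] args) {
        boolean survived = initialize("plan-1");
        System.out.println("modification survived: " + survived);
        System.out.println(trace);
    }
}
```

In this sketch, removing the clearMapWork() call from getSplits() would leave the modified object in the cache for generateGroupedSplits(), while the 'finally' block in initialize() would still remove the entry at the end.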