Rajesh Balamohan created TEZ-4245:
-------------------------------------

             Summary: Optimise split grouping when locality information is set to null/empty
                 Key: TEZ-4245
                 URL: https://issues.apache.org/jira/browse/TEZ-4245
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Rajesh Balamohan

In object stores like S3, locality information always shows up as "localhost". Having this information in the input split slows down scheduling, as explained in https://issues.apache.org/jira/browse/HIVE-14060

Systems like Hive remove the "localhost" information from splits.

Splits without any usable locality information (localhost/null/empty) should all be treated the same way, so that split grouping can do meaningful grouping based on cluster size. This avoids creating small split groups, which can significantly increase runtime due to sequential processing (i.e., the same map task gets lots of inputs and the system ends up spending its time in open/seek/close calls on the object store).
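
To illustrate the "treat localhost/null/empty equally" idea, here is a minimal sketch in plain Java. It is not the Tez split grouper and does not use any Tez API; the class name `LocalityNormalizer`, the `NO_LOCALITY` key, and the choice to bucket by the first usable host are all assumptions made for illustration. The point is only that every split lacking real locality lands in one shared bucket, which a grouper could then divide by cluster size instead of by host.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tez: normalizes split locations so that
// null, empty, and "localhost" entries are all treated as "no locality".
class LocalityNormalizer {
    // Assumed placeholder key for splits with no usable locality.
    public static final String NO_LOCALITY = "<no-locality>";

    // True when a host entry carries no real locality hint.
    public static boolean isUninformative(String host) {
        if (host == null) {
            return true;
        }
        String trimmed = host.trim();
        return trimmed.isEmpty() || trimmed.equalsIgnoreCase("localhost");
    }

    // Keeps only informative hosts; returns an empty list for
    // null, empty, or localhost-only location arrays.
    public static List<String> normalize(String[] locations) {
        List<String> hosts = new ArrayList<>();
        if (locations == null) {
            return hosts;
        }
        for (String host : locations) {
            if (!isUninformative(host)) {
                hosts.add(host.trim());
            }
        }
        return hosts;
    }

    // Buckets split indices by their first informative host. All splits
    // without usable locality share the single NO_LOCALITY bucket.
    public static Map<String, List<Integer>> bucket(List<String[]> splitLocations) {
        Map<String, List<Integer>> buckets = new LinkedHashMap<>();
        for (int i = 0; i < splitLocations.size(); i++) {
            List<String> hosts = normalize(splitLocations.get(i));
            String key = hosts.isEmpty() ? NO_LOCALITY : hosts.get(0);
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return buckets;
    }
}
```

With this normalization, a split reporting `null`, one reporting an empty array, and one reporting only `"localhost"` all end up under the same key, so they can be grouped together rather than forming separate small groups.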