Rajesh Balamohan created TEZ-4245:
-------------------------------------

             Summary: Optimise split grouping when locality information is set to null/empty
                 Key: TEZ-4245
                 URL: https://issues.apache.org/jira/browse/TEZ-4245
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


In object stores like S3, locality information always shows up as "localhost".
Having this information in the input split slows down scheduling, as explained in
https://issues.apache.org/jira/browse/HIVE-14060. Systems like Hive remove
"localhost" information from splits.

Splits without any usable locality information (localhost/null/empty)
should all be treated the same way, so that split grouping can form meaningful
groups based on cluster size. This avoids creating small split groups, which can
significantly increase runtime due to sequential processing (i.e., the same map
task receives many inputs and the system ends up spending its time in
open/seek/close calls on the object store).
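
The idea can be illustrated with a minimal Java sketch. It is not the actual
Tez split grouper: the class LocalityAgnosticGrouping, the methods
hasNoLocality and groupEvenly, and the desiredGroups parameter are all
hypothetical names chosen for illustration, and the round-robin distribution
is only one possible way to size groups by cluster capacity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual Tez split grouping code.
final class LocalityAgnosticGrouping {

    // Returns true when the locations carry no usable locality:
    // null, empty, or containing only blank or "localhost" entries.
    static boolean hasNoLocality(String[] locations) {
        if (locations == null || locations.length == 0) {
            return true;
        }
        for (String host : locations) {
            if (host != null && !host.trim().isEmpty()
                    && !"localhost".equalsIgnoreCase(host.trim())) {
                return false;
            }
        }
        return true;
    }

    // Distributes splits round-robin over at most desiredGroups groups.
    // desiredGroups is assumed to be derived from cluster size by the caller.
    static <T> List<List<T>> groupEvenly(List<T> splits, int desiredGroups) {
        if (desiredGroups <= 0) {
            throw new IllegalArgumentException("desiredGroups must be positive");
        }
        int groupCount = Math.min(desiredGroups, splits.size());
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < groupCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            groups.get(i % groupCount).add(splits.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Ten splits whose locations mix the three "no locality" forms.
        List<String[]> splitLocations = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            splitLocations.add(i % 3 == 0 ? null
                    : i % 3 == 1 ? new String[0]
                    : new String[] {"localhost"});
        }
        boolean noLocality = splitLocations.stream()
                .allMatch(LocalityAgnosticGrouping::hasNoLocality);
        int desiredGroups = 4; // assumption: computed from cluster capacity
        if (noLocality) {
            for (List<String[]> group : groupEvenly(splitLocations, desiredGroups)) {
                System.out.println(group.size());
            }
        }
    }
}
```

With ten splits and four target groups, the sketch produces groups of 3, 3, 2
and 2 splits, regardless of whether a split reported "localhost", null, or an
empty location list.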

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
