[ https://issues.apache.org/jira/browse/TEZ-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-4245:
----------------------------------
    Attachment: TEZ-4245.1.patch

> Optimise split grouping when locality information is set to null/empty
> ----------------------------------------------------------------------
>
>                 Key: TEZ-4245
>                 URL: https://issues.apache.org/jira/browse/TEZ-4245
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4245.1.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In object stores like S3, locality information always shows up as "localhost".
> Having this information in the input split slows down scheduling, as explained
> in https://issues.apache.org/jira/browse/HIVE-14060. Systems like Hive remove
> "localhost" information from splits.
>  
> Splits without any locality information (localhost, null, or empty) should be
> treated equally, so that split grouping can form meaningful groups based on
> cluster size. This avoids creating small split groups, which can significantly
> increase runtime due to sequential processing (i.e., the same map task getting
> lots of inputs, with the system spending time in open/seek/close calls on
> object stores).
>  
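A minimal Java sketch of the idea in the description, assuming split locations arrive as a String[] per split (as with Hadoop's InputSplit.getLocations()). The class SplitLocalityNormalizer and its methods are hypothetical illustrations, not the code in TEZ-4245.1.patch or Tez's real grouper API. It treats "localhost", null, and empty hosts as the same "no locality" case, then, when no split has usable locality, deals splits round-robin into a group count that would be derived from cluster size (the desiredGroups parameter is an assumption).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not Tez's actual API: normalizes split locality hints. */
public class SplitLocalityNormalizer {

    // On object stores such as S3, "localhost" carries no real locality information.
    private static final String LOCALHOST = "localhost";

    /** True if the host string gives no usable locality (null, blank, or localhost). */
    static boolean isMeaningless(String host) {
        return host == null || host.trim().isEmpty() || LOCALHOST.equalsIgnoreCase(host.trim());
    }

    /** Drops meaningless hosts; an empty result means "no locality", same as null. */
    static String[] normalize(String[] locations) {
        if (locations == null) {
            return new String[0];
        }
        List<String> kept = new ArrayList<>();
        for (String host : locations) {
            if (!isMeaningless(host)) {
                kept.add(host.trim());
            }
        }
        return kept.toArray(new String[0]);
    }

    /** True if none of the splits carry usable locality after normalization. */
    static boolean allWithoutLocality(List<String[]> splitLocations) {
        for (String[] locations : splitLocations) {
            if (normalize(locations).length > 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical grouping step: with no usable locality, all splits share one pool
     * and are dealt round-robin into desiredGroups groups (assumed to come from
     * cluster size, e.g. available task slots). Returns split indices per group.
     */
    static List<List<Integer>> groupWithoutLocality(int numSplits, int desiredGroups) {
        List<List<Integer>> result = new ArrayList<>();
        if (numSplits <= 0) {
            return result;
        }
        int groups = Math.max(1, Math.min(desiredGroups, numSplits));
        for (int g = 0; g < groups; g++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < numSplits; i++) {
            result.get(i % groups).add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> splits = Arrays.asList(
                new String[] {"localhost"},
                null,
                new String[] {},
                new String[] {"", null});
        // All four splits are equivalent: none carry usable locality.
        System.out.println(allWithoutLocality(splits)); // true
        System.out.println(Arrays.toString(normalize(new String[] {"localhost", "node1"}))); // [node1]
        // 10 splits spread over 4 groups of sizes 3, 3, 2, 2.
        System.out.println(groupWithoutLocality(10, 4));
    }
}
```

Normalizing first means the grouper sees one "no locality" case instead of three, so it can size groups from the cluster rather than from spurious "localhost" hints.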



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
