Robert Metzger created FLINK-1287:
-------------------------------------

             Summary: Improve File Input Split assignment
                 Key: FLINK-1287
                 URL: https://issues.apache.org/jira/browse/FLINK-1287
             Project: Flink
          Issue Type: Improvement
          Components: Local Runtime
            Reporter: Robert Metzger


While running some DFS read-intensive benchmarks, I found that the assignment 
of input splits is not optimal. In particular in cases where the numWorker != 
numDataNodes and when the replication factor is low (in my case it was 1).

In the particular example, the input had 40960 splits, of which 4694 were read 
remotely.  Spark did only 2056 remote reads for the same dataset.
With the replication factor increased to 2, Flink did only 290 remote reads. So 
usually, users shouldn't be affected by this issue.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to