Maximilian Michels created FLINK-2239:
-----------------------------------------

             Summary: print() on DataSet: stream results and print incrementally
                 Key: FLINK-2239
                 URL: https://issues.apache.org/jira/browse/FLINK-2239
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Runtime
    Affects Versions: 0.9
            Reporter: Maximilian Michels
             Fix For: 0.10


Users find it counter-intuitive that {{print()}} on a DataSet internally calls 
{{collect()}} and fully materializes the set. For large results, this leads to 
out-of-memory errors on the client. It also leaves users with the impression 
that Flink cannot handle large amounts of data and that it fails frequently.

Improving this situation requires some major architectural changes in Flink. 
The easiest solution would probably be to transfer the data from the job 
manager to the client via the {{BlobManager}}. Alternatively, the client could 
connect directly to the task managers and fetch the results.
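
To illustrate the intended behavior, here is a minimal, self-contained sketch 
in plain Java. It is not Flink code: the class {{IncrementalPrintSketch}} and 
the methods {{resultStream}}, {{printMaterialized}} and {{printIncrementally}} 
are hypothetical names made up for this example, and the lazy iterator only 
stands in for results arriving from the cluster. A chunk here could correspond 
to one transfer via the {{BlobManager}} or one fetch from a task manager, but 
that mapping is an assumption. The sketch contrasts buffering everything before 
printing with printing in bounded chunks, and reports the peak number of 
buffered elements in each case.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical sketch, not Flink API: contrasts collect-then-print
// with chunked, incremental printing on the client side.
class IncrementalPrintSketch {

    /** Stand-in for a job result: lazily yields the integers 0..count-1. */
    static Iterator<Integer> resultStream(final int count) {
        return new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    /** Buffers the whole result before printing; returns the peak buffer size. */
    static int printMaterialized(Iterator<Integer> results, Consumer<String> out) {
        List<Integer> all = new ArrayList<>();
        while (results.hasNext()) {
            all.add(results.next());
        }
        for (Integer record : all) {
            out.accept(String.valueOf(record));
        }
        return all.size();
    }

    /** Prints in chunks of at most chunkSize records; returns the peak buffer size. */
    static int printIncrementally(Iterator<Integer> results, int chunkSize, Consumer<String> out) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Integer> chunk = new ArrayList<>(chunkSize);
        int peak = 0;
        while (results.hasNext()) {
            chunk.add(results.next());
            // Flush when the chunk is full or the input is exhausted.
            if (chunk.size() == chunkSize || !results.hasNext()) {
                peak = Math.max(peak, chunk.size());
                for (Integer record : chunk) {
                    out.accept(String.valueOf(record));
                }
                chunk.clear();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int n = 10;
        int peakAll = printMaterialized(resultStream(n), System.out::println);
        int peakChunked = printIncrementally(resultStream(n), 3, System.out::println);
        System.out.println("peak buffered (materialized): " + peakAll);
        System.out.println("peak buffered (incremental): " + peakChunked);
    }
}
```

Both methods emit the same records in the same order. The materializing 
variant holds as many records as the result contains, while the incremental 
variant never holds more than {{chunkSize}} records, which is the property 
that would keep the client from running out of memory.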



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
