Hi all, I am currently running some Beam jobs in streaming mode on a Flink
YARN session. My sink writes CSV files, similar to the one in the TfIdf
example. I noticed that Beam produces one output file per record, plus
temporary files for each of them, which causes my disk usage to exceed its
limit. I am not sure whether the problem is that I am using the API
incorrectly, but is there any way to write all the records into a single
file, keep appending to that file, or clean up the temporary files via
windowing or triggering?
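
To make the question concrete, here is a rough sketch of what I had in mind
(just my guess at the API usage, not something that works for me yet: the
source is a stand-in for my real stream, the output path and window size are
placeholders, and I am not sure withNumShards actually behaves this way on
the Flink runner):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.TypeDescriptors;
    import org.joda.time.Duration;

    public class WindowedCsvWrite {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(
            PipelineOptionsFactory.fromArgs(args).create());

        p
            // Stand-in unbounded source; my real job produces CSV lines.
            .apply(GenerateSequence.from(0)
                .withRate(10, Duration.standardSeconds(1)))
            .apply(MapElements.into(TypeDescriptors.strings())
                .via(n -> n + ",example"))  // fake CSV record
            // Fixed windows so the sink finalizes one bounded batch of
            // records at a time instead of one file per record.
            .apply(Window.<String>into(
                FixedWindows.of(Duration.standardMinutes(5))))
            .apply(TextIO.write()
                .to("/path/to/output/records")  // placeholder output prefix
                .withWindowedWrites()  // needed for unbounded input
                .withNumShards(1));    // hoping for one file per window

        p.run().waitUntilFinish();
      }
    }

Is something along these lines the intended way to do it, and would the
temporary files get cleaned up once each window's file is finalized?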
Claire