Hi all, I am currently running some Beam jobs in streaming mode on a Flink
YARN session. My sink writes CSV files, similar to the one in the TfIdf
example. I noticed that Beam produces one output file per record, plus
temporary files for each of them, which causes my disk usage to exceed its
limit. I am not sure whether the problem is that I am using the API
incorrectly, but is there any way to write all the records into a single
file, keep appending to that file, or clean up the temporary files via
windowing or triggering?
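
To make the question concrete, here is a rough sketch of what I had in mind
(just my guess at the API usage, not something that works for me yet: the
source is a stand-in for my real stream, the output path and window size are
placeholders, and I am not sure withNumShards actually behaves this way on
the Flink runner):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.TypeDescriptors;
    import org.joda.time.Duration;

    public class WindowedCsvWrite {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(
            PipelineOptionsFactory.fromArgs(args).create());

        p
            // Stand-in unbounded source; my real job produces CSV lines.
            .apply(GenerateSequence.from(0)
                .withRate(10, Duration.standardSeconds(1)))
            .apply(MapElements.into(TypeDescriptors.strings())
                .via(n -> n + ",example"))  // fake CSV record
            // Fixed windows so the sink finalizes one bounded batch of
            // records at a time instead of one file per record.
            .apply(Window.<String>into(
                FixedWindows.of(Duration.standardMinutes(5))))
            .apply(TextIO.write()
                .to("/path/to/output/records")  // placeholder output prefix
                .withWindowedWrites()  // needed for unbounded input
                .withNumShards(1));    // hoping for one file per window

        p.run().waitUntilFinish();
      }
    }

Is something along these lines the intended way to do it, and would the
temporary files get cleaned up once each window's file is finalized?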
Claire