> On 05 Feb 2016, at 08:56, Jeyhun Karimov <je.kari...@gmail.com> wrote:
> 
> For example, I will run aggregate operations against other windows (n-window 
> aggregations) that have already been output.
> I tried your suggestion and used the filesystem sink, writing the output to HDFS.
> I got k files in the HDFS directory, where k is the degree of parallelism (I 
> used a single machine).
> These files keep growing (new records are appended) as the stream continues. 
> Because the output files are not closed and their size changes regularly, would 
> this cause problems when processing the data with the DataSet API, Hadoop, or 
> another library?

I think you have used the plain file sink, while Robert was referring to the 
rolling HDFS file sink [1]. This sink buckets your data into different 
directories like this: /base/path/{date-time}/part-{parallel-task}-{count}
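To make the layout concrete, here is a minimal sketch of how such a bucketed path is composed (the function name and the exact date format are my own for illustration; they are not part of the Flink API):

```python
from datetime import datetime

def bucket_path(base_path, bucket_time, parallel_task, count,
                date_format="%Y-%m-%d--%H%M"):
    # Illustrates the rolling sink's layout:
    #   /base/path/{date-time}/part-{parallel-task}-{count}
    # Records arriving in the same time bucket from the same parallel
    # subtask go into the same part file; once a part file is rolled,
    # it is closed and safe to read with the DataSet API or Hadoop.
    bucket = bucket_time.strftime(date_format)
    return "{}/{}/part-{}-{}".format(base_path, bucket, parallel_task, count)

print(bucket_path("/base/path", datetime(2016, 2, 5, 8, 56), 3, 0))
# /base/path/2016-02-05--0856/part-3-0
```

Because each closed part file is immutable, downstream batch jobs only need to skip the bucket that is still being written to.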

– Ufuk

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/hdfs.html
