Read a given list of HDFS folder

2016-03-21 Thread Gwenhael Pasquiers
Hello, sorry if this has already been asked or is already in the docs; I did not find the answer. Is there a way to read a given set of folders in Flink batch? Let's say we have one folder per hour of data, written by Flume, and we'd like to read only the last N hours (or any other pattern o

Re: Read a given list of HDFS folder

2016-03-21 Thread Ufuk Celebi
Hey Gwenhaël, see here for recursive traversal of input paths: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of-the-input-path-directory Regarding the phases: the best way to exchange data between batch jobs is via files. You can then execute two
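
For the "last N hours" part of the question, the list of folders can be computed before any input is declared. The following is only a minimal sketch using the Java standard library: the class name `HourlyFolders`, the method `lastHours`, the base path, and the `yyyy/MM/dd/HH` folder layout are all assumptions made for illustration, not something stated in the thread.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

public class HourlyFolders {
    // Assumed (hypothetical) layout: <base>/yyyy/MM/dd/HH, one folder per hour.
    private static final DateTimeFormatter HOUR_DIR =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    // Returns the folders for the n most recent hours, oldest first.
    // The hour containing 'now' is included as the last entry.
    static List<String> lastHours(String base, LocalDateTime now, int n) {
        LocalDateTime currentHour = now.truncatedTo(ChronoUnit.HOURS);
        List<String> paths = new ArrayList<>();
        for (int i = n - 1; i >= 0; i--) {
            paths.add(base + "/" + HOUR_DIR.format(currentHour.minusHours(i)));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical base path, for illustration only.
        for (String path : lastHours("hdfs:///flume/events", LocalDateTime.now(), 3)) {
            System.out.println(path);
        }
    }
}
```

Each resulting path could then be handed to its own input (for example one `readTextFile` call per folder, combined with `union`), though whether that suits a given job is a judgment call. Including the current hour may pick up files that are still being written; starting the loop one hour earlier would avoid that.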

RE: Read a given list of HDFS folder

2016-03-21 Thread Gwenhael Pasquiers
Ufuk Celebi [mailto:u...@apache.org] Sent: Monday, 21 March 2016 13:39 To: user@flink.apache.org Subject: Re: Read a given list of HDFS folder Hey Gwenhaël, see here for recursive traversal of input paths: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of

Re: Read a given list of HDFS folder

2016-03-29 Thread Maximilian Michels
> > -Original Message- > From: Ufuk Celebi [mailto:u...@apache.org] > Sent: Monday, 21 March 2016 13:39 > To: user@flink.apache.org > Subject: Re: Read a given list of HDFS folder > > Hey Gwenhaël, > > see here for recursive traversal of input paths: > > h