Re: readFile, DataStream

2017-11-14 Thread Juan Miguel Cejuela
Hi Kostas, thank you very much for your answer. Yes, in https://github.com/apache/flink/pull/4997 I proposed changing the comparison to modificationTime < globalModificationTime (that is, no longer treating equal timestamps as already processed). Later, however, I realized, as you correctly point out, that this creates duplicates. The

Re: readFile, DataStream

2017-11-13 Thread Kostas Kloudas
Hi Juan, the problem is that once a file with a given modification timestamp is processed and the global modification timestamp is updated, all files with that timestamp are considered processed. The solution is not to remove the = from modificationTime <= globalModificationTime; in
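
To make the trade-off discussed in this thread concrete, here is a minimal, self-contained sketch in plain Java. It is a simplified model of a directory monitor that tracks a single global modification time, not Flink's actual implementation; the names ModTimeFilterSketch, FileInfo, Monitor, and skipEqual are hypothetical and chosen only for illustration. It compares the two filters mentioned above: skipping files with modificationTime <= globalModificationTime, and skipping only those with modificationTime < globalModificationTime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ModTimeFilterSketch {

    /** Hypothetical stand-in for a file seen in the watched directory. */
    static final class FileInfo {
        final String name;
        final long modificationTime;

        FileInfo(String name, long modificationTime) {
            this.name = name;
            this.modificationTime = modificationTime;
        }
    }

    /** Simplified monitor that remembers only one global modification time. */
    static final class Monitor {
        private final boolean skipEqual; // true: skip on <=, false: skip on <
        private long globalModificationTime = Long.MIN_VALUE;

        Monitor(boolean skipEqual) {
            this.skipEqual = skipEqual;
        }

        /** Scans the directory once and returns the names of the files forwarded for processing. */
        List<String> scan(List<FileInfo> directory) {
            List<String> forwarded = new ArrayList<>();
            long maxSeen = globalModificationTime;
            for (FileInfo f : directory) {
                boolean consideredProcessed = skipEqual
                        ? f.modificationTime <= globalModificationTime
                        : f.modificationTime < globalModificationTime;
                if (!consideredProcessed) {
                    forwarded.add(f.name);
                    maxSeen = Math.max(maxSeen, f.modificationTime);
                }
            }
            // Advance the global timestamp only after the whole scan.
            globalModificationTime = maxSeen;
            return forwarded;
        }
    }

    public static void main(String[] args) {
        FileInfo a = new FileInfo("a.txt", 1000L);
        // b.txt has the same timestamp as a.txt but appears only in the second scan.
        FileInfo b = new FileInfo("b.txt", 1000L);

        Monitor inclusive = new Monitor(true);
        System.out.println(inclusive.scan(Arrays.asList(a)));    // [a.txt]
        System.out.println(inclusive.scan(Arrays.asList(a, b))); // [] (b.txt is never processed)

        Monitor strict = new Monitor(false);
        System.out.println(strict.scan(Arrays.asList(a)));       // [a.txt]
        System.out.println(strict.scan(Arrays.asList(a, b)));    // [a.txt, b.txt] (a.txt is a duplicate)
    }
}
```

In this model, the inclusive comparison silently drops a file that shares a timestamp with one already processed, while the strict comparison picks it up at the cost of forwarding the earlier file again. A single timestamp by itself cannot distinguish the two files in either case.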

readFile, DataStream

2017-11-10 Thread Juan Miguel Cejuela
Hi there, I’m trying to watch a directory for new incoming files (with StreamExecutionEnvironment#readFile) with sub-second latency (a watch interval of ~100 ms, using FileProcessingMode.PROCESS_CONTINUOUSLY). If many files arrive within a single watch interval, Flink