Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-01-10 Thread Mingmin Xu
@Sushil, I have several jobs running on KafkaIO+FlinkRunner, and I hope my experience can help you a bit. In short, `ENABLE_AUTO_COMMIT_CONFIG` doesn't meet your requirement; you need to leverage exactly-once checkpoint/savepoint in Flink. The reason is, with `ENABLE_AUTO_COMMIT_CONFIG` KafkaIO

Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-01-10 Thread Raghu Angadi
How often does your pipeline checkpoint/snapshot? If the failure happens before the first checkpoint, the pipeline could restart without any state, in which case KafkaIO would read from the latest offset. There is probably some way to verify whether the pipeline is restarting from a checkpoint. On Sun, Jan 7,
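
The restart behaviour described in this message can be illustrated with a toy model. This is only a sketch and is not KafkaIO or Flink code: the function `resume_offset` and its parameters are hypothetical names, and it assumes a reader that falls back to the latest offset whenever no checkpointed state exists.

```python
from typing import Optional


def resume_offset(checkpointed_offset: Optional[int], latest_offset: int) -> int:
    """Toy model (hypothetical): choose where a restarted reader resumes.

    checkpointed_offset is None when the failure happened before the
    first checkpoint, so there is no state to restore.
    """
    if checkpointed_offset is not None:
        # State was restored from a checkpoint: continue from the saved offset.
        return checkpointed_offset
    # No state: the reader starts from the latest offset, and records
    # between the last processed one and the latest are skipped.
    return latest_offset
```

Under this model, a failure before the first checkpoint (`checkpointed_offset=None`) resumes at the latest offset, which matches the symptom reported in the thread.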

Re: FileBasedSource does not match files on HDFS

2018-01-10 Thread Jean-Baptiste Onofré
No problem, happy to help ;) Regards JB On 01/10/2018 11:15 AM, Shashank Prabhakara wrote: It was the missing hdfs filesystem extension dependency. Thanks Jean-Baptiste. Much appreciated. Regards, Shashank On Wed, Jan 10, 2018 at 2:09 PM, Jean-Baptiste Onofré

Re: FileBasedSource does not match files on HDFS

2018-01-10 Thread Shashank Prabhakara
It was the missing hdfs filesystem extension dependency. Thanks Jean-Baptiste. Much appreciated. Regards, Shashank On Wed, Jan 10, 2018 at 2:09 PM, Jean-Baptiste Onofré wrote: > Hi > > Do you have the beam hdfs filesystem extension in the dependencies ? Did > you define the

Re: jackson to parse options?

2018-01-10 Thread Romain Manni-Bucau
Updated my JUnit 5 PR to show that: https://github.com/apache/beam/pull/4360/files#diff-578d1770f8b47ebbc1e74a2c19de9a6aR28 It doesn't remove Jackson yet but exposes a nicer user interface for the config. I'm not fully clear on all the Jackson usage yet, there are some round trips (PO -> json ->

Jenkins build is still unstable: beam_Release_NightlySnapshot #648

2018-01-10 Thread Apache Jenkins Server
See

Re: FileBasedSource does not match files on HDFS

2018-01-10 Thread Jean-Baptiste Onofré
Hi, Do you have the beam hdfs filesystem extension in the dependencies? Did you define the HADOOP_CONF_DIR env variable containing the path to hdfs-site.xml? Regards JB On 01/10/2018 08:55 AM, Shashank Prabhakara wrote: Hello, I'm testing some pipelines on a dataproc cluster with hadoop
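
The second check suggested above (HADOOP_CONF_DIR pointing at a directory that contains hdfs-site.xml) can be verified before launching a pipeline. The following is a minimal sketch using only the Python standard library; the helper name `find_hdfs_site` is hypothetical, and it does not check the first point, whether the hdfs filesystem extension is among the dependencies.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hdfs_site(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the path to hdfs-site.xml under HADOOP_CONF_DIR, or None.

    `env` defaults to the process environment; passing a mapping makes
    the check easy to exercise without touching real variables.
    """
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        # Variable is unset or empty.
        return None
    candidate = Path(conf_dir) / "hdfs-site.xml"
    # Only report the file if it actually exists in that directory.
    return candidate if candidate.is_file() else None
```

A `None` result means either the variable is not set or the directory does not contain hdfs-site.xml.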