Question about Kafka Flink consumer parallelism

2019-05-30 Thread wang xuchen
Hi Flink users, I am trying to figure out how to leverage parallelism to improve the throughput of a Kafka consumer. From my research, I understand the scenarios where the number of Kafka partitions is equal to, less than, or greater than the number of consumers, and that rebalance() can be used to spread messages evenly across workers. Also use setParallelism(#) to achieve
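
A minimal sketch of the setup being asked about, using Flink's Kafka connector: the source parallelism is capped by the topic's partition count, while rebalance() redistributes records round-robin to a wider downstream operator. The topic name, broker address, group id, and parallelism values below are illustrative assumptions.

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaParallelismSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
            props.setProperty("group.id", "my-group");                // assumed group id

            env.addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props))
               .setParallelism(4)   // matches the (assumed) partition count; extra source subtasks would sit idle
               .rebalance()         // round-robin redistribution to downstream subtasks
               .map(String::toUpperCase)
               .setParallelism(16)  // downstream operator can run wider than the source
               .print();

            env.execute("Kafka consumer parallelism sketch");
        }
    }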

Re: Throttling/effective back-pressure on a Kafka sink

2019-05-30 Thread Derek VerLee
Was any progress ever made on this?  We have seen the same issue in the past.  What I do remember is that the job crashes after whatever interval I set max.block.ms to. I am going to attempt to reproduce the issue again and will report back. On 3/28/19
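
For context, max.block.ms is a standard Kafka producer setting bounding how long send() blocks when the producer's buffer is full before throwing a TimeoutException, which is what fails the job; that matches the crash timing described above. A hedged sketch of passing it through Flink's Kafka sink (topic and broker names are placeholders):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    // The producer blocks up to this long when buffer.memory is exhausted, then throws;
    // a longer block keeps the sink stalled (back-pressure) instead of failing the job.
    props.setProperty("max.block.ms", "300000");    // Kafka default: 60000
    props.setProperty("buffer.memory", "33554432"); // Kafka default: 32 MB

    FlinkKafkaProducer<String> sink =
        new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), props);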

Pipeline TimeOut with Beam SideInput

2019-05-30 Thread bjbq4d
Hi everyone, I've built a Beam pipeline that makes use of a SideInput, which in my case is a Map of key/value pairs. I'm running Flink (1.7.1) on YARN (Hadoop 2.6.0). I've found that if my map is small enough everything works fine, but if I make it large enough (2-3 MB) the pipeline fails with,
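
A hedged sketch of the side-input pattern described (all keys and values are placeholders): a Map-shaped PCollectionView is materialized and shipped to every worker, which is why its size matters to the runner.

    import java.util.Map;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollectionView;

    Pipeline p = Pipeline.create();

    // The side input: materialized as a Map and broadcast to every worker.
    PCollectionView<Map<String, String>> sideMap =
        p.apply("BuildMap", Create.of(KV.of("k1", "v1"), KV.of("k2", "v2")))
         .apply(View.asMap());

    p.apply("Main", Create.of("k1", "k2"))
     .apply("Lookup", ParDo.of(new DoFn<String, String>() {
         @ProcessElement
         public void processElement(ProcessContext c) {
             Map<String, String> map = c.sideInput(sideMap); // per-element lookup
             c.output(map.getOrDefault(c.element(), "missing"));
         }
     }).withSideInputs(sideMap));

    p.run().waitUntilFinish();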

Re: [External] Flink 1.7.1 on EMR metrics

2019-05-30 Thread Peter Groesbeck
Hi Padarn, for what it's worth, I am using DataDog metrics on EMR with Flink 1.7.1, and here is my flink-conf configuration:

    - Classification: flink-conf
      ConfigurationProperties:
        metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
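
The snippet above is truncated; a hedged sketch of how the full EMR classification might look, adding the Datadog reporter's documented apikey and tags options (the key value and tags below are placeholders):

    - Classification: flink-conf
      ConfigurationProperties:
        metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
        metrics.reporter.dghttp.apikey: <YOUR_DATADOG_API_KEY>
        metrics.reporter.dghttp.tags: env:staging,flink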

Re: [External] Flink 1.7.1 on EMR metrics

2019-05-30 Thread Yun Tang
Hi Padarn, if you want to verify why no metrics are being sent out, how about using the built-in Slf4j reporter [1], which records metrics in the logs. If you can see the metrics after enabling the slf4j reporter, you can then compare the configurations. Best, Yun Tang [1]
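
For reference, enabling the Slf4j reporter is a two-line flink-conf.yaml change (the 60-second interval is an illustrative choice):

    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 60 SECONDS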

[External] Flink 1.7.1 on EMR metrics

2019-05-30 Thread Padarn Wilson
Hello all, I am trying to run Flink 1.7.1 on EMR and am having some trouble with metric reporting. I was using the DataDogHttpReporter, but have also tried the StatsDReporter; with both I was seeing no metrics being collected. To debug this I implemented my own reporter (based on StatsDReporter)
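
A minimal sketch of this kind of debugging reporter against Flink's MetricReporter interface, logging every registration so you can confirm the reporter is loaded at all (the class name is hypothetical; it must also be on the JobManager/TaskManager classpath and referenced from flink-conf.yaml):

    import org.apache.flink.metrics.Metric;
    import org.apache.flink.metrics.MetricConfig;
    import org.apache.flink.metrics.MetricGroup;
    import org.apache.flink.metrics.reporter.MetricReporter;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LoggingDebugReporter implements MetricReporter {
        private static final Logger LOG = LoggerFactory.getLogger(LoggingDebugReporter.class);

        @Override
        public void open(MetricConfig config) {
            LOG.info("Reporter opened with config: {}", config); // proves the reporter was loaded
        }

        @Override
        public void close() {}

        @Override
        public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
            LOG.info("Metric registered: {}", group.getMetricIdentifier(metricName));
        }

        @Override
        public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
            LOG.info("Metric removed: {}", metricName);
        }
    }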

How to build dependencies and connections between stream jobs?

2019-05-30 Thread 徐涛
Hi Experts, in batch computing there are products like Azkaban or Airflow to manage batch job dependencies. By using such a dependency management tool, we can build a large-scale system consisting of small jobs. In stream processing, it is not practical to put all dependencies in one