Re: SideOutput Issue

2018-04-05 Thread Chesnay Schepler
We were able to reproduce the issue. It was caused by calling getSideOutput() and split() on a single DataStream, which isn't properly handled by Flink. As a work-around one can add a no-op map function before the split() call. I've filed FLINK-9141

Re: Task Manager fault tolerance does not work

2018-04-05 Thread Fabian Hueske
Hi, Thanks for the feedback! As Till explained, the problem is that the JM first tries to schedule the job to the failed TM (which hasn't been detected as failed yet). The configured three restart attempts are "consumed" by these attempts and the job fails afterwards. Best, Fabian 2018-04-05 8:1

Re: KeyedSream question

2018-04-05 Thread Fabian Hueske
Amit is correct. keyBy() ensures that all records with the same key are processed by the same paralllel instance of a function. This is different from "a parallel instance only sees records of one key". I had a look at the docs [1]. I agree that "Logically partitions a stream into disjoint partiti

Re: Kafka Consumers Partition Discovery doesn't work

2018-04-05 Thread Juho Autio
Still not working after I had a fresh build from https://github.com/apache/flink/tree/release-1.5. When the job starts this is logged: 2018-04-05 09:29:38,157 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: flink.partition-discovery.interval-mil

Re: KeyedSream question

2018-04-05 Thread Michael Latta
Thanks for the clarification. I was just trying to understand the intended behavior. It would have been nice if Flink tracked state for downstream operators by key, but I can do that with a map in the downstream functions. Michael Sent from my iPad > On Apr 5, 2018, at 2:30 AM, Fabian Hueske

Restart strategy defined in flink-conf.yaml is ignored

2018-04-05 Thread Alexander Smirnov
Hello, I've defined restart strategy in flink-conf.yaml as none. WebUI / Job Manager section confirms that. But looks like this setting is disregarded. When I go into job's configuration in the WebUI, in the Execution Configuration section I can see: Max. number of execution retries

Re: Checkpoints very slow with high backpressure

2018-04-05 Thread Edward
I read through this thread and didn't see any resolution to the slow checkpoint issue (just that someone resolved their backpressure issue). We are experiencing the same problem: - When there is no backpressure, checkpoints take less than 100ms - When there is high backpressure, checkpoints take

Re: Kafka exceptions in Flink log file

2018-04-05 Thread Alexander Smirnov
Hi Timo, it is the latest released version - 1.4.2 This only happens when a job falls into a restart loop and stays in it for 20 minutes or so. Looks like for each restart, Flink loads classes anew, but previously loaded classes are not garbage collected for some reason (still referenced?) Very

Re: Checkpoints very slow with high backpressure

2018-04-05 Thread Piotr Nowojski
Hi, If I’m not mistaken this is a known issue, that we were working to resolve for Flink 1.5 release. The problem is that with back pressure, data are being buffered between nodes and on checkpoint, all of those data must be processed before checkpoint can be completed. This is especially probl

Re: Restart strategy defined in flink-conf.yaml is ignored

2018-04-05 Thread Piotr Nowojski
Hi, Can you provide more details, like post your configuration/log files/screen shots from web UI and Flink version being used? Piotrek > On 5 Apr 2018, at 06:07, Alexander Smirnov > wrote: > > Hello, > > I've defined restart strategy in flink-conf.yaml as none. WebUI / Job Manager > secti

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-05 Thread Till Rohrmann
Hi Juho, you are right that due to a limitation in the Yarn proxy [1] we cannot directly contact the cluster through the Yarn proxy. The way it works at the moment is that the Flink client retrieves the AM's hostname through the ApplicationReport and then directly talks to the AM. This of course

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-05 Thread Juho Autio
Thanks for the answer. Wrapping with GET sounds good to me. You said next version; do you mean that Flink 1.5 would already include this improvement when it's released? On Thu, Apr 5, 2018 at 2:40 PM, Till Rohrmann wrote: > Hi Juho, > > you are right that due to a limitation in the Yarn proxy [1

Re: sharebuffer prune code

2018-04-05 Thread Kostas Kloudas
Hi Aitozi, I think you are correct. Could you open a JIRA and share it here? Thanks, Kostas > On Apr 2, 2018, at 7:19 AM, aitozi wrote: > > Hi, > > i am running into a cep bug : it always running into failed to find previous > sharebufferEntry, i think it may be caused by prune the sharebuffe

Re: Checkpoints very slow with high backpressure

2018-04-05 Thread Edward
Thanks for the update Piotr. The reason it prevents us from using checkpoints is this: We are relying on the checkpoints to trigger commit of Kafka offsets for our source (kafka consumers). When there is no backpressure this works fine. When there is backpressure, checkpoints fail because they tak

Re: Checkpoints very slow with high backpressure

2018-04-05 Thread Piotr Nowojski
Thanks for the explanation. I hope that either 1.5 will solve your issue (please let us know if it doesn’t!) or if you can’t wait, that decreasing memory buffers can mitigate the problem. Piotrek > On 5 Apr 2018, at 08:13, Edward wrote: > > Thanks for the update Piotr. > > The reason it pre

Re: REST API "broken" on YARN because POST is not allowed via YARN proxy

2018-04-05 Thread Till Rohrmann
This improvement is unfortunately out of scope for the 1.5 release since the feature freeze is already quite some time ago. But I hope that this improvement will make it into the 1.6 release. Cheers, Till On Thu, Apr 5, 2018 at 4:45 PM, Juho Autio wrote: > Thanks for the answer. Wrapping with G

Re: Restart strategy defined in flink-conf.yaml is ignored

2018-04-05 Thread Alexander Smirnov
jobmanager.log: *2018-04-05 22:37:28,348 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: restart-strategy, none* 2018-04-05 22:37:28,353 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. Th

Re: Restart strategy defined in flink-conf.yaml is ignored

2018-04-05 Thread Piotr Nowojski
Hi, Thanks for the details! I can confirm this behaviour. flink-conf.yaml restart-strategy value is being completely ignored (regardless of it’s value) when user enables checkpointing: env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE); I suspect this is a bug, but I have to confirm

Re: Unable to load AWS credentials: Flink 1.2.1 + S3 + Kubernetes

2018-04-05 Thread Bajaj, Abhinav
Hi, Thanks for your replies and help. We found the root cause and fixed the problem. I will share the details in detail below for everyone but here’s the meat – We had to add the IAM role to the taskmanager pods. The exception I noticed in Jobmanager logs originated from taskmanager. As soon as

Flink Client job submission through SSL

2018-04-05 Thread Sampath Bhat
Hello I would like to know if the job submission through flink command line say ./bin/flink run can be authenticated. Like if SSL is enabled then will the job submission require SSL certificates. But I don't see any behavior as such. Simple flink run is able to submit the job even if SSL is enabl