Re: Can not cancel with savepoint with Flink 1.3.2

2018-03-15 Thread Hao Sun
Is this related? 2018-03-16 03:43:42,557 INFO akka.actor.EmptyLocalActorRef - Message [org.apache.flink.runtime.metrics.dump.MetricDumpSerialization$MetricSerializationResult] from

Can not cancel with savepoint with Flink 1.3.2

2018-03-15 Thread Hao Sun
Hi, I am running flink on K8S and store states in s3 with rocksdb backend. I used to be able to cancel and savepointing through the rest api. But sometimes the process never finish. No matter how many time I try. Is there a way to figure out what is going wrong? Why "isStoppable"=>false? Thanks

Slow flink checkpoint

2018-03-15 Thread 林德强
Hi, I'm run a job on Flink streaming. I found with the increase in the number of 'InternalTimer' object the checkpoint more and more slowly. Is there any way to solve this problem ? such as make the "timeServiceManager" snapshot async. Thanks

documentation SEO

2018-03-15 Thread karim amer
any google or duckduckgo search results in flink 1.3 version of the doc at The top of the results instead of 1.4 or latest.

Re: Flink SSL Setup on a standalone cluster

2018-03-15 Thread Vinay Patil
Hi, Even tried with ip-address for JobManager.host.name property, but did not work. When I tried netstat -anp | grep 6123 , I see 3 TM connection state as established, however when I submit the job , I see two more entries with state as TIME_WAIT and after some time these entries are gone and I

Re: Dependency Injection and Flink

2018-03-15 Thread Stephan Ewen
Would it help to be able to register "initializers", meaning some classes/methods that will be called at every process entry point, to set up something like this? On Tue, Mar 13, 2018 at 7:56 PM, Steven Wu wrote: > Xiaochuan, > > We are doing exactly as you described. We

[ANNOUNCE] Apache Flink 1.3.3 released

2018-03-15 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink 1.3.3, which is the third bugfix release for the Apache Flink 1.3 series.  Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Calling close() on Failure

2018-03-15 Thread Gregory Fee
Hello! I had a program lose a task manager the other day. The fail over back to a checkpoint and recovery worked like a charm. However, on one of my ProcessFunctions I defined a close() method and I noticed that it did not get called. To be clear, the task manager that failed was running that

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-15 Thread Fabian Hueske
Great! Thanks for reporting back :-) Buffer timeout of 0ms is quite aggressive. You might sending buffers of (by default) 32KB that just contain a single record. Anyway, now you know the nobs to tune the latency. Cheers, Fabian 2018-03-15 21:00 GMT+01:00 Yan Zhou [FDS Science]

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-15 Thread Yan Zhou [FDS Science]
Hi Fabian, Thank you for the information. After setting the watermark interval to 10ms and buffer timeout to 0 ms, the end-to-end latency is reduced to 5ms. I am very happy with the result and will go from there. Best Yan From: Fabian Hueske

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-15 Thread Fabian Hueske
I see... Another issue might be the frequency with which you emit watermarks (in case you use a periodic watermark assigner). You can set the interval with StreamExecutionEnvironment.getConfig.setAutoWatermarkInterval() [1]. However, keep in mind that each watermark is an additional record which

Migration to Flip6 Kubernetes

2018-03-15 Thread Edward Rojas
Hello, Currently I have a Flink 1.4 cluster running on kubernetes based on the configuration describe on https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/kubernetes.html with additional config for HA with Zookeeper. With this I have several Taskmanagers, a single

Re: flink sql: "slow" performance on Over Window aggregation with time attribute set to event time

2018-03-15 Thread Yan Zhou [FDS Science]
Hi Fabian, Yes, it is typically not a good idea to generate watermark based on system time. I was setting the watermark based on system time with very little delay to see how fast my application could process the data. All the servers are sync with ntp and only 1ms difference with each other.

Re: TaskManager crashes with PageRank algorithm in Gelly

2018-03-15 Thread Greg Hogan
Termination of the TaskManager by the Linux OOM killer indicates an overallocation of memory and you have set "taskmanager.heap.mb: 139264” on machines with 136 GB. Even though you were able to (temporarily?) resolve the issue by enabling preallocation, you may see degraded performance if

Re: activemq connector not working..

2018-03-15 Thread Puneet Kinra
I tried getting this in logs.. 2018-03-15 20:59:38,154 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - No state backend has been configured, using default state backend (Memory / JobManager) 2018-03-15 20:59:38,296 INFO

Re: activemq connector not working..

2018-03-15 Thread Puneet Kinra
I tried in cluster as well . On Wed, Mar 14, 2018 at 10:01 PM, Timo Walther wrote: > Hi Puneet, > > are you running this job on the cluster or locally in your IDE? > > Regards, > Timo > > > Am 14.03.18 um 13:49 schrieb Puneet Kinra: > > Hi > > I used apache bahir connector

Re: Flink SSL Setup on a standalone cluster

2018-03-15 Thread Vinay Patil
Just an update, I am submitting the job from the master node, not using the normal flink run command to submit the job , but using Remote Execution Environment in code to do this. And in that I am passing the hostname which is same as provided in flink-conf.yaml Regards, Vinay Patil On Thu,

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-15 Thread Stephan Ewen
Just to double check: We are talking about a Flink PartitionNotFoundException , I assume? The split brain situation is a good hint - the minority

Re: Deserializing the InputFormat (org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat@50bb10fd) failed: unread block data after upgrade to 1.4.2

2018-03-15 Thread eSKa
we were jumping from version 1.3.1 (where everything worked fine) -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Implement a sort inside the WindowFunction

2018-03-15 Thread Fabian Hueske
Hi Felipe, Just like the ReduceFunction, the WindowFunction is applied in the context of a single key. So, it will be called for each key and always just see a single record (the reduced record of the key). You'd have to add a non-keyed window (allWindow) for your sorting WindowFunction. Note

Re: Too many open files on Bucketing sink

2018-03-15 Thread Piotr Nowojski
Hi, There is an open similar issue: https://issues.apache.org/jira/browse/FLINK-8707 It’s still under investigation and it would be helpful if you could follow up the discussion there, run same diagnostics commands as Alexander Gardner did

Re: Restart hook and checkpoint

2018-03-15 Thread Fabian Hueske
If I understand fine-grained recovery correctly, one would still need to take checkpoints. Ashish would like to avoid checkpointing and accept to lose the state of the failed task. However, he would like to avoid losing more state than necessary due to restarting of tasks that did not fail.