Re: checkpoint always fails

2018-07-25 Thread vino yang
Hi Marvin, Thanks for reporting this issue. Can you share more details about the failed checkpoint, such as log, exception stack trace, which statebackend used, HA configuration? These information can help to trace the issue. Thanks, vino. 2018-07-26 10:12 GMT+08:00 Marvin777 : > Hi, all: >

Re: 答复: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-25 Thread vino yang
Hi Youjun, Thanks, you can try this but I am not sure if it works correctly. Because for the REST Client, there are quite a few changes from 1.4 to 1.5. Maybe you can customize the source code in 1.4 refer to specific implementation of 1.5? Another option, upgrade your Flink version. To Chesnay

checkpoint always fails

2018-07-25 Thread Marvin777
Hi, all: flink job can run normally, but checkpoint always fails, like this: [image: image.png] [image: image.png] checkpoint configuration: [image: image.png] thanks.

答复: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-25 Thread Yuan,Youjun
Thanks for the information. Forgot to mention, I am using Flink 1.4, the RestClusterClient seems don’t have the ability to retrieve the leader address. I did notice there is webMonitorRetrievalService member in Flink 1.5. I wonder if I can use

Re: Implement Joins with Lookup Data

2018-07-25 Thread ashish pok
Hi Michael, We are currently using 15 TMs with 4 cores and 4 slots each, 10GB of memory on each TM. We have 15 partitions on Kafka for stream and 6 for context/smaller stream. Heap is around 50%, GC is about 150ms and CPU loads are low. We may be able to reduce resources on this if need be. 

Running a Python streaming job with Java dependencies

2018-07-25 Thread Joe Malt
Hi, I'm trying to run a job with Flink's new Python streaming API but I'm running into issues with Java imports. I have a Jython project in IntelliJ with a lot of Java dependencies configured through Maven. I can't figure out how to make Flink "see" these dependencies. An example script that

Re: Questions on Unbounded number of keys

2018-07-25 Thread Chang Liu
Hi Till, Thanks for your reply. But I think maybe I did not make my question clear. My question is not about whether the States within each keyed operator instances will run out of memory. My question is about, whether the unlimited keyed operator instances themselves will run out of memory.

Regarding the use of JMX reporters for reporting custom (Gauge) metrics using notifications

2018-07-25 Thread Konstantinos Barmpis
I have been using the JMX reporter to gather data from a running flink instance, more specifically to access a Gauge in one of my functions. I am able to access this metric through: JMXClientListener listener = new JMXClientListener(); JMXServiceURL url = new

Re: Flink 1.5 batch job fails to start

2018-07-25 Thread Alex Vinnik
Hi Vino, Data is ok i double checked. Input is plain json and it can be processed by same code compiled and run on 1.3.1 flink. Thanks for the hint about avro and parquet versions. Got my fat jar synced up with flink 1.5.1 avro/parguet versions. Hope was high that it will help to resolve the

Re: Flink 1.5 batch job fails to start

2018-07-25 Thread Alex Vinnik
Hi Till, Server start up entrypoint log 2018-07-25T12:19:12.268+ [org.apache.flink.runtime.entrypoint.ClusterEntrypoint] INFO 2018-07-25T12:19:12.271+ [org.apache.flink.runtime.entrypoint.ClusterEntrypoint]

Re: Re: downgrade Flink

2018-07-25 Thread Vino yang
Hi, You are welcome. Hope for enhancing Flink ML in the future. Vino yang Thanks. On 2018-07-25 20:02 , Cederic Bosmans Wrote: Dear I was able to fix the problem. Thank you very much for your support! Kind regards  Cederic On Wed, Jul 25, 2018 at 1:57 PM Cederic Bosmans wrote: Dear I

Re: override jvm params

2018-07-25 Thread Hequn Cheng
Hi Cussac, If I understand correctly, you want to pass rules.consumer.topic=test and rules.consumer.topic=test to flink jvm. I think you can try: flink run -m $HOSTPORT -yD rules.consumer.topic=test -yD rules.consumer.topic=test Hope this helps. Hequn On Wed, Jul 25, 2018 at 3:26 PM, Cussac,

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Vinay Patil
Hi Chesnay, No error in the logs. That is why I am not able to understand why checkpoints are getting triggered. Regards, Vinay Patil On Wed, Jul 25, 2018 at 4:36 PM Chesnay Schepler wrote: > Please check the job- and taskmanager logs for anything suspicious. > > On 25.07.2018 12:33, Vinay

RE: override jvm params

2018-07-25 Thread Cussac, Franck
Hi Hequn, Thanks for your answer. I just tested and it doesn’t work. I’m using PureConfig to parse my conf files. With java I can override any argument using –D= syntax. How can I do same with flink in yarn mode ? Franck. De : Hequn Cheng [mailto:chenghe...@gmail.com] Envoyé : mercredi 25

"Futures timed out" when trying to cancel a job with savepoint

2018-07-25 Thread Julio Biason
Hey guys, We just built a brand new Flink 1.4.0 cluster with HA and everything seems to be working fine, but we are getting some errors with savepoints. For example, I have a running job -- Running/Restarting Jobs --- 25.07.2018 11:55:18 :

Re: job status monitor

2018-07-25 Thread Renjie Liu
You can use rest api here https://ci.apache.org/projects/flink/flink-docs-release-1.5/monitoring/rest_api.html On Wed, Jul 25, 2018 at 5:18 PM 从六品州同 <26304...@qq.com> wrote: > dear all: > Is there a notification mechanism for Flink? When job's status changes, > such as restart, failure, notify

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Chesnay Schepler
Can you provide us with the job code? I assume that checkpointing runs properly if you submit the same job to a normal cluster? On 25.07.2018 13:15, Vinay Patil wrote: No error in the logs. That is why I am not able to understand why checkpoints are not getting triggered. Regards, Vinay

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Vinay Patil
No error in the logs. That is why I am not able to understand why checkpoints are not getting triggered. Regards, Vinay Patil On Wed, Jul 25, 2018 at 4:44 PM Vinay Patil wrote: > Hi Chesnay, > > No error in the logs. That is why I am not able to understand why > checkpoints are getting

Re: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-25 Thread vino yang
Hi Martin, For a standalone cluster which exists multiple JM instances, If you do not use Rest API, but use Flink provided Cluster client. The client can perceive which one this the JM leader from multiple JM instances. For example, you can use CLI to submit flink job in a non-Leader node. But

Re: downgrade Flink

2018-07-25 Thread vino yang
Hi Cederic, The README said the "*latest*" version is 1.3.2, it means as of that time. I think for the latest Flink version, there is no guarantee. I suggest replace the version's value from LATEST to specific version number. Thanks, vino. 2018-07-25 15:49 GMT+08:00 Cederic Bosmans : > Dear

Re: Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Chesnay Schepler
Please check the job- and taskmanager logs for anything suspicious. On 25.07.2018 12:33, Vinay Patil wrote: Hi, I am starting the cluster using bootstrap application where in I am calling Job Manager and Task Manager main class to form the cluster. The HA cluster is formed correctly and I am

Checkpointing not happening in Standalone HA mode

2018-07-25 Thread Vinay Patil
Hi, I am starting the cluster using bootstrap application where in I am calling Job Manager and Task Manager main class to form the cluster. The HA cluster is formed correctly and I am able to submit jobs to this cluster using RemoteExecutionEnvironment but when I enable checkpointing in code I

Re: Query regarding rest.port property

2018-07-25 Thread Vinay Patil
Thanks Chesnay for your inputs. I am actually starting the cluster using bootstrap application where in I am calling Job Manager and Task Manager main class to form the cluster. So I have removed flink-runtime-web dependency and used only flink_runtime dependency for forming the cluster , but

Re: S3 file source - continuous monitoring - many files missed

2018-07-25 Thread Fabian Hueske
Hi, First of all, the ticket reports a bug (or improvement or feature suggestion) such that others are aware of the problem and understand its cause. At some point it might be picked up and implemented. In general, there is no guarantee whether or when this happens, but the Flink community is of

override jvm params

2018-07-25 Thread Cussac, Franck
Hi, Following the documentation I want to use -yD option to override some params in my conf like this : flink run -m $HOSTPORT -yD "env.java.opts.taskmanager=-Drules.consumer.topic=test" -yD "env.java.opts.jobmanager=-Drules.consumer.topic=test" myjar mymain but it is just ignored. Nothing

Re: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-25 Thread Martin Eden
Hi, This is actually very relevant to us as well. We want to deploy Flink 1.3.2 on a 3 node DCOS cluster. In the case of Mesos/DCOS, Flink HA runs only one JobManager which gets restarted on another node by Marathon in case of failure and re-load it's state from Zookeeper. Yuan I am guessing

Re: job status monitor

2018-07-25 Thread Chesnay Schepler
There's no built-in mechanism to send notifications. To monitor the job status you can poll the REST API. (*/jobs/:jobid) *Alternatively you could implement a MetricReporter that explicitly checks for the availability metrics

RocksDB state backend Checkpointing Failed

2018-07-25 Thread Marvin777
Hi,all: Checkpoint always fails, like this: https://jira.apache.org/jira/browse/FLINK-9945 [image: image.png] thanks.

job status monitor

2018-07-25 Thread ??????????
dear all??Is there a notification mechanism for Flink? When job's status changes, such as restart, failure, notify other systems?? Or I have a system to monitor the job state of Flink. What should I do? thanks

Re: S3 file source - continuous monitoring - many files missed

2018-07-25 Thread Averell
Thank you Fabian for the guide to implement the fix. I'm not quite clear about the best practice of creating JIRA ticket. I modified its priority to Major as you said that it is important. What would happen next with that issue then? Someone (anyone) will pick it and create a fix, then include

Re: S3 file source - continuous monitoring - many files missed

2018-07-25 Thread Fabian Hueske
Hi, Thanks for creating the Jira issue. I'm not sure if I would consider this a blocker but it is certainly an important problem to fix. Anyway, in the original version Flink checkpoints the modification timestamp up to which all files have been read (or at least up to which point it *thinks* to

Re: Implement Joins with Lookup Data

2018-07-25 Thread Michael Gendelman
Hi Ashish, We are planning for a similar use case and I was wondering if you can share the amount of resources you have allocated for this flow? Thanks, Michael On Tue, Jul 24, 2018, 18:57 ashish pok wrote: > BTW, > > We got around bootstrap problem for similar use case using a “nohup” topic

Re: Flink 1.5 batch job fails to start

2018-07-25 Thread Till Rohrmann
Hi Alex, could you share with us the full logs of the client and the cluster entrypoint? That would be tremendously helpful. Cheers, Till On Wed, Jul 25, 2018 at 4:08 AM vino yang wrote: > Hi Alex, > > Is it possible that the data has been corrupted? > > Or have you confirmed that the avro

***UNCHECKED*** Re: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-25 Thread vino yang
Hi Yuan Youjun, Actually, RestClusterClient has a method named getWebMonitorBaseUrl which will retrieve the webmonitor's leader address when you submit job automatically.[1] Ideally, you do not need to retrieve JM by yourself. Currently, the webmonitor is binding with JobManager, maybe if JM