Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-23 Thread Chesnay Schepler
controller is used to manage JobManager. Am I right? *From:* Chesnay Schepler *Sent:* Saturday, August 22, 2020 12:58 AM *To:* Alexey Trenikhun ; Piotr Nowojski *Cc:* Flink User Mail List *Subject:* Re: Flink Job cluster

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-22 Thread Alexey Trenikhun
: Flink Job cluster in HA mode - recovery vs upgrade If, and only if, the cluster-id and JobId are identical then the JobGraph will be recovered from ZooKeeper. On 22/08/2020 06:12, Alexey Trenikhun wrote: Not sure that I understand your statement about "the HaServices are only being
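
Chesnay's condition above (recovery happens only if both the cluster-id and the JobId match) is driven by the ZooKeeper HA settings. A minimal flink-conf.yaml sketch of the relevant keys, with placeholder values: keeping high-availability.cluster-id stable across pod restarts allows recovery, while changing it for an upgrade makes the cluster start fresh.

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181  # placeholder hosts
high-availability.cluster-id: /my-job-cluster            # keep stable for recovery
high-availability.storageDir: s3://bucket/flink/ha       # placeholder path
```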

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-22 Thread Chesnay Schepler
failed job will be overwritten by a new one which will have the same job-id? *From:* Chesnay Schepler *Sent:* Friday, August 21, 2020 12:16 PM *To:* Alexey Trenikhun ; Piotr Nowojski *Cc:* Flink User Mail List *Subject:* Re: Flink J

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-21 Thread Alexey Trenikhun
0 12:16 PM To: Alexey Trenikhun ; Piotr Nowojski Cc: Flink User Mail List Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade The HaServices are only being given the JobGraph, so this is not possible. Actually I have to correct myself. For a job cluster the state in HA should be irrelevant w

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-21 Thread Chesnay Schepler
Schepler *Cc:* Alexey Trenikhun ; Flink User Mail List *Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade Thank you for the clarification Chesnay and sorry for the incorrect previous answer. Piotrek On Thu, Aug 20, 2020 at 15:59 Chesnay Schepler

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-21 Thread Alexey Trenikhun
: Piotr Nowojski Sent: Thursday, August 20, 2020 7:04 AM To: Chesnay Schepler Cc: Alexey Trenikhun ; Flink User Mail List Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade Thank you for the clarification Chesnay and sorry for the incorrect previous answer. Piotrek On Thu, Aug 20

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-20 Thread Piotr Nowojski
as long as the operator UIDs are the same before > and after the upgrade (for operator state to match before and after the > upgrade). > > Best, Piotrek > > On Thu, Aug 20, 2020 at 06:34 Alexey Trenikhun wrote: > >> Hello, >> >> Let's say I run Flink Job clu

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-20 Thread Chesnay Schepler
wrote: Hello, Let's say I run Flink Job cluster with persistent storage and Zookeeper HA on k8s with single JobManager and use externalized checkpoints. When JM crashes, k8s will restart JM pod, and JM will read JobId and JobGraph from ZK and restore from latest c

Re: Flink Job cluster in HA mode - recovery vs upgrade

2020-08-20 Thread Piotr Nowojski
before and after the upgrade (for operator state to match before and after the upgrade). Best, Piotrek On Thu, Aug 20, 2020 at 06:34 Alexey Trenikhun wrote: > Hello, > > Let's say I run Flink Job cluster with persistent storage and Zookeeper HA > on k8s with single JobMan

Flink Job cluster in HA mode - recovery vs upgrade

2020-08-19 Thread Alexey Trenikhun
Hello, Let's say I run Flink Job cluster with persistent storage and Zookeeper HA on k8s with single JobManager and use externalized checkpoints. When JM crashes, k8s will restart JM pod, and JM will read JobId and JobGraph from ZK and restore from latest checkpoint. Now let's say I want

Re: Tools for Flink Job performance testing

2020-08-13 Thread narasimha
Thanks, Arvid. The guide was helpful for getting started with Flink. I'm currently exploring SQL/Table API. Will surely come back with queries on it. On Thu, Aug 13, 2020 at 1:25 PM Arvid Heise wrote: > Hi, > > performance testing is quite vague. Usually you start by writing a small >

Re: Flink job percentage

2020-08-13 Thread Arvid Heise
Hi Flavio, This is a daunting task to implement properly. There is an easy fix used in related workflow systems though. Assuming that it's a recurring task, you simply store the run times of the last runs, use some kind of low-pass filter (= decaying average) and compare the current runtime with
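
Arvid's "low-pass filter" suggestion can be sketched in a few lines. This is an illustrative stand-alone snippet, not a Flink API: it keeps a decaying average of past run times of a recurring job and reports elapsed time over that average as a rough completion percentage.

```python
class ProgressEstimator:
    """Progress heuristic for a recurring batch job, per the decaying-average idea."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha    # smoothing factor of the low-pass filter
        self.expected = None  # decaying average of past run times, in seconds

    def record_run(self, runtime_s):
        if self.expected is None:
            self.expected = runtime_s
        else:
            # new average = alpha * latest runtime + (1 - alpha) * old average
            self.expected = self.alpha * runtime_s + (1 - self.alpha) * self.expected

    def percent_complete(self, elapsed_s):
        if self.expected is None:
            return None  # no history yet, so no estimate
        return min(100.0, 100.0 * elapsed_s / self.expected)


est = ProgressEstimator()
for runtime in (100.0, 120.0, 110.0):     # run times of previous executions
    est.record_run(runtime)
print(round(est.percent_complete(55.0)))  # prints 51: about half the expected runtime
```

The estimate is capped at 100%, so a run that outlives its history simply sits at 100% until it finishes, which matches the caveat that this is only a heuristic.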

Re: Tools for Flink Job performance testing

2020-08-13 Thread Arvid Heise
Hi, performance testing is quite vague. Usually you start by writing a small first version of your pipeline and check how well the computation scales on your data. Flink's web UI [1] already helps quite well for the first time. Usually you'd also add some metric system and look for advanced

Re: Flink job percentage

2020-08-11 Thread Robert Metzger
Hi Flavio, I'm not aware of such a heuristic being implemented anywhere. You need to come up with something yourself. On Fri, Aug 7, 2020 at 12:55 PM Flavio Pompermaier wrote: > Hi to all, > one of our customers asked us to see a percentage of completion of a Flink > Batch job. Is there any

Tools for Flink Job performance testing

2020-08-10 Thread narasimha
Hi, I'm new to the streaming world, checking on Performance testing tools. Are there any recommended Performance testing tools for Flink? -- A.Narasimha Swamy

Flink job percentage

2020-08-07 Thread Flavio Pompermaier
Hi to all, one of our customers asked us to see a percentage of completion of a Flink Batch job. Is there any already implemented heuristic I can use to compute it? Will this be possible also when DataSet api will migrate to DataStream..? Thanks in advance, Flavio

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-06 Thread Eleanore Jin
the monitoring? In production, I have hundreds of small >> flink jobs running (2-8 TM pods) doing stateless processing, it is really >> hard for us to expose ingress for each JM rest endpoint to periodically >> query the job status for each flink job. >> >> Thanks a

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-06 Thread Yang Wang
p, but if this will no longer be the case, how > can we deal with the monitoring? In production, I have hundreds of small > flink jobs running (2-8 TM pods) doing stateless processing, it is really > hard for us to expose ingress for each JM rest endpoint to periodically > quer

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Eleanore Jin
be the case, how can we deal with the monitoring? In production, I have hundreds of small flink jobs running (2-8 TM pods) doing stateless processing, it is really hard for us to expose ingress for each JM rest endpoint to periodically query the job status for each flink job. Thanks a lot

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Till Rohrmann
54 wrote: >>> >>>> Hi Yang & Till, >>>> >>>> Thanks for your prompt reply! >>>> >>>> Yang, regarding your question, I am actually not using k8s job, as I >>>> put my app.jar and its dependencies under flink's lib dir

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Yang Wang
Till, >>> >>> Thanks for your prompt reply! >>> >>> Yang, regarding your question, I am actually not using k8s job, as I put >>> my app.jar and its dependencies under flink's lib directory. I have 1 k8s >>> deployment for job manager, and 1 k8s d

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Till Rohrmann
! >> >> Yang, regarding your question, I am actually not using k8s job, as I put >> my app.jar and its dependencies under flink's lib directory. I have 1 k8s >> deployment for job manager, and 1 k8s deployment for task manager, and 1 >> k8s service for job manager.

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-05 Thread Yang Wang
tion, I am actually not using k8s job, as I put > my app.jar and its dependencies under flink's lib directory. I have 1 k8s > deployment for job manager, and 1 k8s deployment for task manager, and 1 > k8s service for job manager. > > As you mentioned above, if flink job is marked

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Eleanore Jin
ager. As you mentioned above, if flink job is marked as failed, it will cause the job manager pod to be restarted. Which is not the ideal behavior. Do you suggest that I should change the deployment strategy from using k8s deployment to k8s job? In case the flink program exit with non-zero code
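
The suggestion discussed here is to run the JobManager as a Kubernetes Job instead of a Deployment, so that a Flink job that terminates with a non-zero exit code is retried a bounded number of times rather than restarted forever. A rough manifest sketch; the name, image tag, and entrypoint args are placeholders and depend on your Flink image version:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-flink-jobmanager   # placeholder name
spec:
  backoffLimit: 3             # give up after repeated failures
  template:
    spec:
      restartPolicy: OnFailure   # restart only on non-zero exit codes
      containers:
        - name: jobmanager
          image: flink:1.10      # placeholder image tag
          args: ["standalone-job", "--job-classname", "com.example.MyJob"]  # assumed entrypoint args
```

With restartPolicy: OnFailure, a container that exits cleanly (exit code 0) leaves the pod terminated instead of triggering the endless restarts described in this thread.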

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Yang Wang
>>> FAILED, which causes k8s to restart the pod, this seems not help at all, >>> what are the suggestions for such scenario? >>> >>> Thanks a lot! >>> Eleanore >>> >>> [1] >>> https://ci.apache.org/projects/flink/flink-docs-relea

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Till Rohrmann
lib. >> >> So my question is more like, in this case, if the job is marked as >> FAILED, which causes k8s to restart the pod, this seems not help at all, >> what are the suggestions for such scenario? >> >> Thanks a lot! >> Eleanore >> >> [

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-04 Thread Yang Wang
seems not help at all, what are > the suggestions for such scenario? > > Thanks a lot! > Eleanore > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes > > On Mon, Aug 3, 2020 at 2:13 AM Till

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-03 Thread Eleanore Jin
as FAILED, which causes k8s to restart the pod, this seems not help at all, what are the suggestions for such scenario? Thanks a lot! Eleanore [1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes On Mon, Aug 3, 2020 at 2:13 AM

Re: How long it took a Flink Job to start up ?

2020-08-03 Thread Till Rohrmann
/rest_api.html#jobs-jobid Cheers, Till On Fri, Jul 31, 2020 at 7:56 PM Vijay Balakrishnan wrote: > Hi, > I am trying to figure out how long it took a Flink Job to start up? > I used /jobs/overview and it gave me just the start-time as a long value. > The Flink DashBoard UI shows the Start-T
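
The start-time the REST API returns is epoch milliseconds, which is why it shows up as a bare long. A small sketch of turning it into something readable; the payload below only mimics the shape of a /jobs/overview response and is not real output:

```python
import json
from datetime import datetime, timezone

# Made-up payload mimicking the shape of Flink's /jobs/overview response.
sample = json.loads("""
{"jobs": [{"jid": "abc123", "name": "my-job",
           "start-time": 1596450000000, "end-time": -1, "duration": 125000}]}
""")

def start_time_readable(job):
    # "start-time" is epoch milliseconds; render it as ISO-8601 in UTC.
    return datetime.fromtimestamp(job["start-time"] / 1000, tz=timezone.utc).isoformat()

job = sample["jobs"][0]
print(start_time_readable(job))     # prints 2020-08-03T10:20:00+00:00
print(job["duration"] / 1000, "s")  # "duration" is also in milliseconds
```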

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

2020-08-03 Thread Till Rohrmann
nce the pods > are not running. However, kubernetes will then restart the job again as the > available replicas do not match the desired one. > > I wonder what are the suggestions for such a scenario? How should I > configure the flink job running on k8s? > > Thanks a lot! > Eleanore >

Behavior for flink job running on K8S failed after restart strategy exhausted

2020-07-31 Thread Eleanore Jin
. However, kubernetes will then restart the job again as the available replicas do not match the desired one. I wonder what are the suggestions for such a scenario? How should I configure the flink job running on k8s? Thanks a lot! Eleanore

Re: Flink job stops advancing watermarks after running for a while

2020-07-20 Thread shizk233
Hi, Flink metrics include a task-level metric called currentWatermark, which exposes subtask_index, task_name, and the watermark value; it should help you trace how the watermark is advancing. Best, shizk233 snack white wrote on Mon, Jul 20, 2020 at 3:51 PM: > HI: > Our flink job stops advancing watermarks after running for a while; the job itself isn't down, the source is kafka, and every kafka > partition has data. The flink job state backend is

Re: Flink job stops advancing watermarks after running for a while

2020-07-20 Thread Cayden chen

Flink job stops advancing watermarks after running for a while

2020-07-20 Thread snack white
HI: Our flink job stops advancing watermarks after running for a while; the job itself isn't down, the source is kafka, and every kafka partition has data. The flink job state backend is memory. Any recommended way to debug this? I've checked CPU, GC and other metrics and can't see anything abnormal. Best regards! white

Re: ERROR submitting a flink job

2020-07-15 Thread Yun Tang
y, July 15, 2020 7:29 To: user@flink.apache.org Subject: ERROR submitting a flink job Hello Guys, I am trying to launch a FLINK app on a remote server, but I have this error message. org.apache.flink.client.program.ProgramInvocationException: Th

ERROR submitting a flink job

2020-07-14 Thread Aissa Elaffani
Hello Guys, I am trying to launch a FLINK app on a remote server, but I have this error message. org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID:

Re: Flink job restarts intermittently, version 1.9

2020-07-03 Thread zhisheng
3, 2020 at 10:48 AM noon cjihg wrote: > Hi all, > > Our Flink job keeps restarting intermittently; the exception logs mostly look like the following. Can anyone explain what causes this? > > 2020-07-01 20:20:43.875 [flink-akka.actor.default-dispatcher-27] INFO > akka.remote.RemoteActorRefProvider$RemotingTerminator > flink-akka.

Re: Flink job restarts intermittently, version 1.9

2020-07-02 Thread Xintong Song
From the error message this is an Akka RPC call timeout; since it is a LocalFencedMessage, network problems can basically be ruled out. I suggest checking the JM process's GC pressure and thread count, to see whether it is so overloaded that RPCs cannot be answered in time. Thank you~ Xintong Song On Fri, Jul 3, 2020 at 10:48 AM noon cjihg wrote: > Hi all, > > Our Flink job keeps restarting intermittently; the exception logs mostly look like the following. Can anyone explain what causes this? > > 2020-07-01 20:2

Flink job restarts intermittently, version 1.9

2020-07-02 Thread noon cjihg
Hi all, Our Flink job keeps restarting intermittently; the exception logs mostly look like the following. Can anyone explain what causes this? 2020-07-01 20:20:43.875 [flink-akka.actor.default-dispatcher-27] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator flink-akka.remote.default-remote-dispatcher-22 - Remoting shut down. 2020-07-01 20:20:43.875 [flink-akka.actor.default

Re: Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-22 Thread Chesnay Schepler
creating a local environment via (Stream)ExecutionEnvironment#createLocalEnvironment? On 17/06/2020 17:05, Sourabh Mehta wrote: Hi Team, I'm  exploring flink for one of my use case, I'm facing some issues while running a flink job i

Re: Flink job's automatic checkpoints succeed but manually triggered ones fail

2020-06-20 Thread Congxian Qiu
Hi, by "manual checkpoint" here you presumably mean a savepoint. From the stack trace it timed out, possibly because the savepoint was slow. You can check the JM log to see whether the savepoint took a long time to complete. Also, could you describe the main scenarios in which you use savepoints? 1. Why do you need savepoints? 2. In your scenario, could checkpoints replace savepoints? Best, Congxian Zhou Zach wrote on Fri, Jun 19, 2020 at 3:25 PM: > > > > > 2020-06-19 15:11:18,361 INFO

Flink job's automatic checkpoints succeed but manually triggered ones fail

2020-06-19 Thread Zhou Zach
2020-06-19 15:11:18,361 INFO org.apache.flink.client.cli.CliFrontend - Triggering savepoint for job e229c76e6a1b43142cb4272523102ed1. 2020-06-19 15:11:18,378 INFO org.apache.flink.client.cli.CliFrontend - Waiting for response... 2020-06-19

Re: Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-18 Thread Sourabh Mehta
ting a local environment via >>> (Stream)ExecutionEnvironment#createLocalEnvironment? >>> >>> On 17/06/2020 17:05, Sourabh Mehta wrote: >>> >>> Hi Team, >>> >>> I'm exploring flink for one of my use case, I'm facing some issues >>

Re: Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-17 Thread Till Rohrmann
> wrote: > >> Are you by any chance creating a local environment via >> (Stream)ExecutionEnvironment#createLocalEnvironment? >> >> On 17/06/2020 17:05, Sourabh Mehta wrote: >> >> Hi Team, >> >> I'm exploring flink for one of my use case, I'm facing

Re: Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-17 Thread Sourabh Mehta
one of my use case, I'm facing some issues while > running a flink job in cluster mode. Below are the steps I followed to > setup and run job in cluster mode : > 1. Setup flink on google cloud dataproc using > https://github.com/GoogleCloudDataproc/initialization-actions/tree/

Re: Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-17 Thread Chesnay Schepler
Are you by any chance creating a local environment via (Stream)ExecutionEnvironment#createLocalEnvironment? On 17/06/2020 17:05, Sourabh Mehta wrote: Hi Team, I'm  exploring flink for one of my use case, I'm facing some issues while running a flink job in cluster mode. Below are the steps I

Unable to run flink job in dataproc cluster with jobmanager provided

2020-06-17 Thread Sourabh Mehta
Hi Team, I'm exploring flink for one of my use case, I'm facing some issues while running a flink job in cluster mode. Below are the steps I followed to setup and run job in cluster mode : 1. Setup flink on google cloud dataproc using https://github.com/GoogleCloudDataproc/initialization-actions

Re: The Flink job recovered with wrong checkpoint state.

2020-06-15 Thread Thomas Huang
@Yun Tang, thanks. From: Yun Tang Sent: Monday, June 15, 2020 11:30 To: Thomas Huang ; Flink Subject: Re: The Flink job recovered with wrong checkpoint state. Hi Thomas The answer is yes. Without high availability, once the job m

Re: The Flink job recovered with wrong checkpoint state.

2020-06-14 Thread Yun Tang
To: Flink Subject: The Flink job recovered with wrong checkpoint state. Hi Flink Community, Currently, I'm using yarn-cluster mode to submit flink job on yarn, and I haven't set high availability configuration (zookeeper), but set restart strategy: env.getConfig.setRestartStrategy

The Flink job recovered with wrong checkpoint state.

2020-06-14 Thread Thomas Huang
Hi Flink Community, Currently, I'm using yarn-cluster mode to submit flink job on yarn, and I haven't set high availability configuration (zookeeper), but set restart strategy: env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 3000)) the attempt count is 10 and the wait
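
The restart strategy quoted above can equivalently be configured in flink-conf.yaml, which keeps it out of the job code; the values below mirror the snippet (10 attempts, 3000 ms delay), but check the exact keys against your Flink version's documentation:

```yaml
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 3000 ms
```

Note, per the reply in this thread, that a restart strategy alone does not protect against JobManager loss: without ZooKeeper HA the job cannot recover once the JM process is gone.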

Re: Is it possible to change 'connector.startup-mode' option in the flink job

2020-05-18 Thread Thomas Huang
Hi Jingsong, Cool, Thanks for your reply. Best wishes. From: Jingsong Li Sent: Tuesday, May 19, 2020 10:46 To: Thomas Huang Cc: Flink Subject: Re: Is it possible to change 'connector.startup-mode' option in the flink job Hi Thomas, Good to hear from you

Re: Is it possible to change 'connector.startup-mode' option in the flink job

2020-05-18 Thread Jingsong Li
Hi Thomas, Good to hear from you. This is a very common problem. In 1.11, we have two FLIP to solve your problem. [1][2] You can take a look. I think dynamic table options (table hints) is enough for your requirement. [1]
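
The dynamic table options (table hints) from FLIP-113 that Jingsong mentions let you override a connector option per query instead of re-declaring the table with a different 'connector.startup-mode'. A sketch against the orders table from this thread; the exact option key depends on the connector version, and hints must first be enabled via table.dynamic-table-options.enabled:

```sql
-- Override the startup mode for this query only, via a dynamic table option.
SELECT user_id, product, order_time
FROM orders /*+ OPTIONS('scan.startup.mode' = 'latest-offset') */;
```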

Is it possible to change 'connector.startup-mode' option in the flink job

2020-05-18 Thread Thomas Huang
Hi guys, I'm using hive to store kafka topic metadata as follows: CREATE TABLE orders ( user_id BIGINT, product STRING, order_time TIMESTAMP(3), WATERMARK FOR order_time AS order_time - '5' SECONDS ) WITH ( 'connector.type' = 'kafka',

Re: No Slots available exception in Apache Flink Job Manager while Scheduling

2020-05-08 Thread Xintong Song
Linking to the jira ticket, for the record. https://issues.apache.org/jira/browse/FLINK-17560 Thank you~ Xintong Song On Sat, May 9, 2020 at 2:14 AM Josson Paul wrote: > Set up > -- > Flink verson 1.8.3 > > Zookeeper HA cluster > > 1 ResourceManager/Dispatcher (Same Node) > 1

No Slots available exception in Apache Flink Job Manager while Scheduling

2020-05-08 Thread Josson Paul
Set up -- Flink version 1.8.3 Zookeeper HA cluster 1 ResourceManager/Dispatcher (Same Node) 1 TaskManager 4 pipelines running with various parallelisms Issue -- Occasionally when the Job Manager gets restarted we noticed that all the pipelines are not getting scheduled. The error that

Re: Casting from org.apache.flink.api.java.tuple.Tuple2 to scala.Product; using java and scala in flink job

2020-05-06 Thread Aljoscha Krettek
function) that converts between the two types. The Tuple2 type and the Scala tuple type, i.e. (foo, bar) have nothing in common when it comes to the type system. Best, Aljoscha On 06.05.20 01:42, Nick Bendtner wrote: Hi guys, In our flink job we use java source for deserializing a message from kafka

Casting from org.apache.flink.api.java.tuple.Tuple2 to scala.Product; using java and scala in flink job

2020-05-05 Thread Nick Bendtner
Hi guys, In our flink job we use java source for deserializing a message from kafka using a kafka deserializer. Signature is as below. public class CustomAvroDeserializationSchema implements KafkaDeserializationSchema> The other parts of the streaming job are in scala. When data

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-21 Thread Yun Tang
istState.java#L115 Best Yun Tang From: Oleg Vysotsky Sent: Tuesday, April 21, 2020 13:53 To: Yun Tang ; Jacob Sevart ; Timo Walther ; user@flink.apache.org Cc: Long Nguyen ; Gurpreet Singh Subject: Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flin

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-20 Thread Yun Tang
ze (instead of 2 MB) and flink job fails during restoring from such checkpoint Hi Jacob & Timo, Thank you for checking! I don’t use union list state in my app. FlinkKafkaConsumerBase (from kafka connector) uses it to store offsets per partition, but partitions are small (input topic has 32 p

Re: FLINK JOB solved

2020-04-20 Thread Som Lima
I found the problem. In flink1.0.0/conf there are two files, masters and slaves. The masters file contains localhost:8081, the slaves file just localhost. I changed them both to the server IP address. Now the FLINK JOB link has the full :8081 link and displays the Apache Flink Dashboard in the browser

Re: FLINK JOB

2020-04-20 Thread Som Lima
Yes exactly that is the change I am having to make. Changing FLINK JOB default localhost to ip of server computer in the browser. I followed the instructions as per your link. https://medium.com/@zjffdu/flink-on-zeppelin-part-1-get-started-2591aaa6aa47 i.e. 0.0.0.0 of zeppelin.server.addr

Re: FLINK JOB

2020-04-20 Thread Jeff Zhang
at 4:44 PM wrote: > I am only running the zeppelin word count example by clicking the > zeppelin run arrow. > > > On Mon, 20 Apr 2020, 09:42 Jeff Zhang, wrote: > >> How do you run the flink job? It should not always be localhost:8081 >> >> Som Lima wrote on Mon, Apr 20, 2020 at 4:33 PM

Re: FLINK JOB

2020-04-20 Thread Som Lima
I am only running the zeppelin word count example by clicking the zeppelin run arrow. On Mon, 20 Apr 2020, 09:42 Jeff Zhang, wrote: > How do you run the flink job? It should not always be localhost:8081 > > Som Lima wrote on Mon, Apr 20, 2020 at 4:33 PM: > >> Hi, >> >> FLINK J

Re: FLINK JOB

2020-04-20 Thread Jeff Zhang
How do you run the flink job? It should not always be localhost:8081 Som Lima wrote on Mon, Apr 20, 2020 at 4:33 PM: > Hi, > > FLINK JOB url defaults to localhost > > i.e. localhost:8081. > > I have to manually change it to server :8081 to get Apache flink Web > Dashboard to disp

FLINK JOB

2020-04-20 Thread Som Lima
Hi, FLINK JOB url defaults to localhost i.e. localhost:8081. I have to manually change it to server :8081 to get Apache flink Web Dashboard to display.

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-18 Thread Yun Tang
" source? Best Yun Tang From: Jacob Sevart Sent: Saturday, April 18, 2020 9:22 To: Oleg Vysotsky Cc: Timo Walther ; user@flink.apache.org ; Long Nguyen Subject: Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fa

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-17 Thread Jacob Sevart
ppened > only on our largest flink job (which processes 6k-10k events per second). > Similar smaller jobs (same code) don't have this problem. E.g. the similar > job which processes about 3 times fewer events doesn't have this problem. As > a result, remote debugging is quite challenging

Re: Flink job didn't restart when a task failed

2020-04-17 Thread Till Rohrmann
; >> -Bruce >> >> >> >> -- >> >> >> >> >> >> *From: *Zhu Zhu >> *Date: *Monday, April 13, 2020 at 9:29 PM >> *To: *Till Rohrmann >> *Cc: *Aljoscha Krettek , user , >> Gary Yao >> *Subject: *Re: Flink jo

Re: Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-15 Thread Timo Walther
, Sometimes our flink job starts creating large checkpoints which include 55 GB (instead of 2 MB) related to the kafka source. After the flink job creates the first “abnormal” checkpoint, all subsequent checkpoints are “abnormal” as well. The Flink job can’t be restored from such a checkpoint. Restoring from the checkpoint

Re: Flink job didn't restart when a task failed

2020-04-15 Thread Zhu Zhu
; -Bruce > > > > -- > > > > > > *From: *Zhu Zhu > *Date: *Monday, April 13, 2020 at 9:29 PM > *To: *Till Rohrmann > *Cc: *Aljoscha Krettek , user , > Gary Yao > *Subject: *Re: Flink job didn't restart when a task failed > > > > Sorry

Checkpoints for kafka source sometimes get 55 GB size (instead of 2 MB) and flink job fails during restoring from such checkpoint

2020-04-14 Thread Oleg Vysotsky
Hello, Sometimes our flink job starts creating large checkpoints which include 55 GB (instead of 2 MB) related to the kafka source. After the flink job creates the first “abnormal” checkpoint, all subsequent checkpoints are “abnormal” as well. The Flink job can’t be restored from such a checkpoint. Restoring from

Re: Flink job didn't restart when a task failed

2020-04-14 Thread Hanson, Bruce
ser , Gary Yao Subject: Re: Flink job didn't restart when a task failed Sorry for not following this ML earlier. I think the cause might be that the final state ('FAILED') update message to JM is lost. TaskExecutor will simply fail the task (which does not take effect in this case sinc

Re: Flink job didn't restart when a task failed

2020-04-13 Thread Zhu Zhu
>> On Thu, Apr 9, 2020 at 1:57 PM Aljoscha Krettek >> wrote: >> >>> Hi, >>> >>> this indeed seems very strange! >>> >>> @Gary Could you maybe have a look at this since you work/worked quite a >>> bit on the scheduler? >>

Re: Flink job consuming all available memory on host

2020-04-12 Thread Xintong Song
another process, calling a JNI library or so? Thank you~ Xintong Song On Sat, Apr 11, 2020 at 3:56 AM Mitch Lloyd wrote: > We are having an issue with a Flink Job that gradually consumes all > available memory on a Docker host machine, crashing the machine. > > * We are running

Flink job consuming all available memory on host

2020-04-10 Thread Mitch Lloyd
We are having an issue with a Flink Job that gradually consumes all available memory on a Docker host machine, crashing the machine. * We are running Flink 1.10.0 * We are running Flink in a Docker container on AWS ECS with EC2 instances * The Flink task manager UI does not show high memory usage
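Since the symptom is native memory growth that the task manager UI does not see, one avenue is Flink 1.10's unified memory model, which can cap the total process footprint so the container hits its own limit predictably instead of starving the host. A hedged `flink-conf.yaml` sketch; the values are illustrative, not recommendations:

```yaml
# flink-conf.yaml -- illustrative values, tune to the container's limit
taskmanager.memory.process.size: 4096m      # total budget for the TM process
taskmanager.memory.managed.fraction: 0.3    # managed (off-heap) share of Flink memory
taskmanager.memory.jvm-overhead.max: 512m   # headroom for native/JVM overhead
```

If memory still grows past `process.size`, the leak is likely outside the accounted pools (e.g. native allocations from a JNI library, as Xintong suggests).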

Re: Flink job didn't restart when a task failed

2020-04-09 Thread Till Rohrmann
9, 2020 at 1:57 PM Aljoscha Krettek wrote: Hi, this indeed seems very strange! @Gary Could you maybe have a look at this since you work/worked quite a bit on the scheduler? Best, Aljoscha On 09.04.20 05:

Re: Flink job didn't restart when a task failed

2020-04-09 Thread Till Rohrmann
o Flink folks: We had a problem with a Flink job the other day that I haven’t seen before. One task encountered a failure and switched to FAILED (see the full exception below). After the failure, the task said it was notifying the Job Manager: 202

Re: Flink job didn't restart when a task failed

2020-04-09 Thread Aljoscha Krettek
Hi, this indeed seems very strange! @Gary Could you maybe have a look at this since you work/worked quite a bit on the scheduler? Best, Aljoscha On 09.04.20 05:46, Hanson, Bruce wrote: Hello Flink folks: We had a problem with a Flink job the other day that I haven’t seen before. One task

Flink job didn't restart when a task failed

2020-04-08 Thread Hanson, Bruce
Hello Flink folks: We had a problem with a Flink job the other day that I haven’t seen before. One task encountered a failure and switched to FAILED (see the full exception below). After the failure, the task said it was notifying the Job Manager: 2020-04-06 08:21:04.329 [flink

Re: Flink job getting killed

2020-04-06 Thread Fabian Hueske
Hi Giriraj, This looks like the deserialization of a String failed. Can you isolate the problem to a pair of sending and receiving tasks? Best, Fabian On Sun, Apr 5, 2020 at 20:18, Giriraj Chauhan <graj.chau...@gmail.com> wrote: Hi, We are submitting a flink(1.9.1) job for data

Flink job getting killed

2020-04-05 Thread Giriraj Chauhan
Hi, We are submitting a flink(1.9.1) job for data processing. It runs fine and processes data for sometime i.e. ~30 mins and later it throws following exception and job gets killed. 2020-04-02 14:15:43,371 INFO org.apache.flink.runtime.taskmanager.Task - Sink: Unnamed (2/4)

Re: Start flink job from the latest checkpoint programmatically

2020-03-13 Thread Vijay Bhaskar
Two things you can do: stopping the Flink job is going to generate a savepoint. You need to save the savepoint directory path in some persistent store (because you are restarting the cluster; otherwise the checkpoint monitoring API would give you the savepoint file details). After spinning up the cluster, read
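The recipe above can be sketched as a small wrapper around the Flink CLI. This is an illustrative sketch, not Flink API: the store location and helper names are assumptions, and since the exact wording of `flink stop` output varies across versions, the savepoint path is pulled out by matching any file:/s3://hdfs: URI rather than a fixed message:

```python
import re
import subprocess

SAVEPOINT_STORE = "/tmp/last-savepoint-path.txt"  # any persistent location

def extract_savepoint_path(cli_output):
    """Pull the savepoint URI out of `flink stop` output; the exact
    wording differs across Flink versions, so match any
    file:/ s3:// hdfs:// URI instead of a fixed phrase."""
    m = re.search(r"(?:file:|s3://|hdfs://)\S+", cli_output)
    return m.group(0) if m else None

def stop_with_savepoint(job_id, target_dir):
    """Stop the job with a savepoint and persist the resulting path."""
    out = subprocess.run(
        ["flink", "stop", "-p", target_dir, job_id],
        capture_output=True, text=True, check=True,
    ).stdout
    path = extract_savepoint_path(out)
    if path:
        with open(SAVEPOINT_STORE, "w") as f:
            f.write(path)
    return path

def resume(jar_path):
    """Read the persisted savepoint path and resume the job from it."""
    with open(SAVEPOINT_STORE) as f:
        savepoint = f.read().strip()
    subprocess.run(["flink", "run", "-s", savepoint, jar_path], check=True)
```

In a real deployment the persistent store would be something that survives cluster restarts (S3, a config map, a database) rather than a local file.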

Re: Start flink job from the latest checkpoint programmatically

2020-03-12 Thread Flavio Pompermaier
is allow user to start and stop. The Flink job is running in a job cluster (the application jar is available to Flink upon startup). Stopping a running application means exiting the program. Restarting a stopped job means spinning up a new job cluster with

Start flink job from the latest checkpoint programmatically

2020-03-12 Thread Eleanore Jin
Hi All, My Flink application is set up to let users start and stop it. The Flink job runs in a job cluster (the application jar is available to Flink upon startup). Stopping a running application means exiting the program. Restarting a stopped job means spinning up a new job cluster
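Since a job cluster starts fresh each time, one common workaround is to locate the newest completed checkpoint yourself and pass it to the new cluster via `flink run -s <path>` (or the equivalent container arguments). A minimal sketch, assuming the standard checkpoint layout `<state.checkpoints.dir>/<job-id>/chk-<n>/_metadata` on a filesystem the script can see; the function name is illustrative:

```python
import re
from pathlib import Path

def latest_checkpoint(checkpoint_root):
    """Find the newest completed checkpoint under a job's checkpoint
    directory: a chk-<n> subdirectory that contains a _metadata file.
    Returns its path as a string, or None if nothing is found."""
    best = None
    for p in Path(checkpoint_root).glob("chk-*"):
        m = re.fullmatch(r"chk-(\d+)", p.name)
        # Only completed checkpoints have a _metadata file to restore from.
        if m and (p / "_metadata").exists():
            n = int(m.group(1))
            if best is None or n > best[0]:
                best = (n, p)
    return str(best[1]) if best else None
```

The returned path would then be wired into the job cluster's startup command, so the new cluster resumes where the old one left off.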
