controller is used to manage JobManager. Am I
right?
*From:* Chesnay Schepler
*Sent:* Saturday, August 22, 2020 12:58 AM
*To:* Alexey Trenikhun ; Piotr Nowojski
*Cc:* Flink User Mail List
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade
If, and only if, the cluster-id and JobId are identical then the JobGraph will
be recovered from ZooKeeper.
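For reference, the cluster-id referred to here comes from the HA configuration in flink-conf.yaml. A minimal ZooKeeper HA setup looks roughly like this (the quorum hosts, storage path, and cluster-id value are placeholders, not from this thread):

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
# Durable storage for JobGraphs and checkpoint metadata; ZooKeeper only
# stores pointers into this directory.
high-availability.storageDir: s3://my-bucket/flink/ha
# Keep this stable across JM restarts so the JobGraph is recovered,
# and change it when you want a fresh (non-recovering) deployment.
high-availability.cluster-id: /my-job-cluster
```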
On 22/08/2020 06:12, Alexey Trenikhun wrote:
Not sure that I understand your statement about "the HaServices are only being
given the JobGraph" — will the failed job be overwritten by a new one which
will have the same job-id?
*From:* Chesnay Schepler
*Sent:* Friday, August 21, 2020 12:16 PM
*To:* Alexey Trenikhun ; Piotr Nowojski
*Cc:* Flink User Mail List
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade
The HaServices are only being given the JobGraph, so this is not possible.
Actually I have to correct myself. For a job cluster the state in HA should be
irrelevant w
*From:* Piotr Nowojski
*Sent:* Thursday, August 20, 2020 7:04 AM
*To:* Chesnay Schepler
*Cc:* Alexey Trenikhun ; Flink User Mail List
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade
Thank you for the clarification Chesnay and sorry for the incorrect previous
answer.
Piotrek
Thu, Aug 20, 2020 at 15:59 Chesnay Schepler <mailto:ches...@apache.org>> wrote:
... as long as the operator UIDs are the same before and after the upgrade
(for operator state to match before and after the upgrade).

Best, Piotrek

Thu, Aug 20, 2020 at 06:34 Alexey Trenikhun wrote:

> Hello,
>
> Let's say I run a Flink Job cluster with persistent storage and ZooKeeper HA
> on k8s with a single JobManager and use externalized checkpoints. When the
> JM crashes, k8s will restart the JM pod, and the JM will read the JobId and
> JobGraph from ZK and restore from the latest checkpoint. Now let's say I want
Thanks, Arvid.
The guide was helpful in how to start working with Flink. I'm currently
exploring SQL/Table API.
Will surely come back for queries on it.
On Thu, Aug 13, 2020 at 1:25 PM Arvid Heise wrote:
Hi Flavio,
This is a daunting task to implement properly. There is an easy fix in
related workflow systems though. Assuming that it's a rerunning task, then
you simply store the run times of the last run, use some kind of low-pass
filter (=decaying average) and compare the current runtime with
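The decaying-average idea above can be sketched in a few lines of self-contained Java (the class and method names are made up for illustration; this is not a Flink API):

```java
// Rough progress estimate for a recurring batch job: keep an exponential
// moving average (a "decaying average") of past run times and compare the
// current elapsed time against it.
public class ProgressEstimator {
    private double emaRuntimeMs; // decaying average of past run times
    private final double alpha;  // smoothing factor in (0, 1]

    public ProgressEstimator(double initialRuntimeMs, double alpha) {
        this.emaRuntimeMs = initialRuntimeMs;
        this.alpha = alpha;
    }

    /** Fold a completed run's duration into the average. */
    public void recordRun(double runtimeMs) {
        emaRuntimeMs = alpha * runtimeMs + (1 - alpha) * emaRuntimeMs;
    }

    /** Estimated completion fraction of the current run, capped below 1. */
    public double estimateProgress(double elapsedMs) {
        return Math.min(elapsedMs / emaRuntimeMs, 0.99);
    }

    public static void main(String[] args) {
        ProgressEstimator est = new ProgressEstimator(1000, 0.5);
        est.recordRun(2000); // average becomes 1500 ms
        System.out.println(est.estimateProgress(750)); // prints 0.5
    }
}
```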
Hi,
performance testing is quite vague. Usually you start by writing a small
first version of your pipeline and check how well the computation scales on
your data. Flink's web UI [1] already helps quite well for the first time.
Usually you'd also add some metric system and look for advanced
Hi Flavio,
I'm not aware of such a heuristic being implemented anywhere. You need to
come up with something yourself.
On Fri, Aug 7, 2020 at 12:55 PM Flavio Pompermaier
wrote:
> Hi to all,
> one of our customers asked us to see a percentage of completion of a Flink
> Batch job. Is there any
Hi,
I'm new to the streaming world and am looking into performance testing tools.
Are there any recommended performance testing tools for Flink?
--
A.Narasimha Swamy
Hi to all,
one of our customers asked us to see a percentage of completion of a Flink
Batch job. Is there any already-implemented heuristic I can use to compute
it? Will this still be possible when the DataSet API migrates to
DataStream?
Thanks in advance,
Flavio
... but if this will no longer be the case, how can we deal with the
monitoring? In production, I have hundreds of small Flink jobs running (2-8 TM
pods) doing stateless processing, and it is really hard for us to expose an
ingress for each JM REST endpoint to periodically query the job status for
each Flink job.

Thanks a lot
Hi Yang & Till,

Thanks for your prompt reply!

Yang, regarding your question, I am actually not using a k8s job, as I put
my app.jar and its dependencies under Flink's lib directory. I have 1 k8s
deployment for the job manager, 1 k8s deployment for the task manager, and 1
k8s service for the job manager.

As you mentioned above, if the Flink job is marked as failed, it will cause
the job manager pod to be restarted, which is not the ideal behavior.
Do you suggest that I should change the deployment strategy from using a k8s
deployment to a k8s job? In case the Flink program exits with a non-zero code
So my question is more like: in this case, if the job is marked as
FAILED, which causes k8s to restart the pod, this seems not to help at all.
What are the suggestions for such a scenario?

Thanks a lot!
Eleanore

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes

On Mon, Aug 3, 2020 at 2:13 AM Till
/rest_api.html#jobs-jobid
Cheers,
Till
On Fri, Jul 31, 2020 at 7:56 PM Vijay Balakrishnan
wrote:
> Hi,
> I am trying to figure out how long it took a Flink job to start up.
> I used /jobs/overview and it gave me just the start-time as a long value.
> The Flink Dashboard UI shows the Start-Time
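As Till's link suggests, /jobs/&lt;jobid&gt; (unlike /jobs/overview) returns a "timestamps" object mapping each job state to an epoch-millis value, so startup time can be computed as RUNNING minus CREATED. A self-contained sketch (the sample payload below is made up; real responses contain more fields):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobStartupTime {
    // Trimmed-down shape of a /jobs/<jobid> response; only the part we need.
    static final String SAMPLE =
        "{\"timestamps\":{\"CREATED\":1596220000000,\"RUNNING\":1596220004500}}";

    // Pull one state's epoch-millis timestamp out of the JSON with a regex
    // (a real client would use a JSON library instead).
    static long timestamp(String json, String state) {
        Matcher m = Pattern.compile("\"" + state + "\":(\\d+)").matcher(json);
        if (!m.find()) throw new IllegalArgumentException("no timestamp for " + state);
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        long startupMs = timestamp(SAMPLE, "RUNNING") - timestamp(SAMPLE, "CREATED");
        System.out.println("startup took " + startupMs + " ms"); // prints: startup took 4500 ms
    }
}
```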
... since the pods are not running. However, Kubernetes will then restart the
job again as the available replicas do not match the desired one.

I wonder what the suggestions are for such a scenario? How should I
configure the Flink job running on k8s?

Thanks a lot!
Eleanore
Hi,
Flink's metrics include a task-level metric, currentWatermark, which gives you
the subtask_index, task_name, and watermark values; that should help you trace
how the watermark is advancing.
Best,
shizk233
snack white wrote on Mon, Jul 20, 2020 at 3:51 PM:
> Hi:
> After the Flink job runs for a while the watermark stops advancing, but the
> job does not fail. The source is Kafka and every Kafka partition has data.
> The Flink job's state backend is memory. Is there a recommended way to debug
> this? I have looked at CPU, GC, and other metrics and can't see anything
> abnormal.
Best regards!
white
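One reason the per-subtask currentWatermark metric suggested above is so useful: an operator's watermark is the minimum of the watermarks of all its input subtasks, so a single subtask that stops advancing holds the whole pipeline back. A self-contained illustration (the values are made up):

```java
import java.util.Arrays;

public class WatermarkMin {
    // An operator's current watermark is the minimum of the watermarks of
    // all of its input subtasks, which is why one lagging subtask stalls
    // the whole pipeline.
    static long operatorWatermark(long[] subtaskWatermarks) {
        return Arrays.stream(subtaskWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Three subtasks advancing, one stuck (illustrative values).
        long[] wms = {1_595_000_000_000L, 1_595_000_060_000L,
                      Long.MIN_VALUE, 1_595_000_120_000L};
        System.out.println(operatorWatermark(wms) == Long.MIN_VALUE); // prints true
    }
}
```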
Sent: Wednesday, July 15, 2020 7:29
To: user@flink.apache.org
Subject: ERROR submitting a flink job
Hello Guys,
I am trying to launch a Flink app on a remote server, but I get this error
message.
org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error:
org.apache.flink.client.program.ProgramInvocationException: Job failed
(JobID:
From the error message this is an Akka RPC call timing out; since it is a
LocalFencedMessage, a network problem can basically be ruled out.
I suggest checking the JM process's GC pressure and thread count, to see
whether the load is so high that RPCs cannot be answered in time.
Thank you~
Xintong Song
On Fri, Jul 3, 2020 at 10:48 AM noon cjihg wrote:
> Hi all,
>
> Our Flink job restarts irregularly from time to time; the exception logs are
> basically all of the following form. Could you help explain the cause?
>
> 2020-07-01 20:20:43.875 [flink-akka.actor.default-dispatcher-27] INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator
> flink-akka.remote.default-remote-dispatcher-22 - Remoting shut down.
> 2020-07-01 20:20:43.875 [flink-akka.actor.default
Hi
By "manually triggering a checkpoint" here you presumably mean a savepoint.
From the stack trace it timed out, possibly because the savepoint was slow.
You can check the JM log to see whether the savepoint took a long time to
complete.
Also, could you describe your main scenarios for using savepoints?
1. Why do you need to use savepoints?
2. In your scenario, could checkpoints be used instead of savepoints?
Best,
Congxian
Zhou Zach wrote on Fri, Jun 19, 2020 at 3:25 PM:
2020-06-19 15:11:18,361 INFO org.apache.flink.client.cli.CliFrontend
- Triggering savepoint for job e229c76e6a1b43142cb4272523102ed1.
2020-06-19 15:11:18,378 INFO org.apache.flink.client.cli.CliFrontend
- Waiting for response...
2020-06-19
Are you by any chance creating a local environment via
(Stream)ExecutionEnvironment#createLocalEnvironment?
On 17/06/2020 17:05, Sourabh Mehta wrote:
Hi Team,
I'm exploring Flink for one of my use cases and am facing some issues
while running a Flink job in cluster mode. Below are the steps I
Hi Team,
I'm exploring Flink for one of my use cases and am facing some issues while
running a Flink job in cluster mode. Below are the steps I followed to
set up and run the job in cluster mode:
1. Setup flink on google cloud dataproc using
https://github.com/GoogleCloudDataproc/initialization-actions
@Yun Tang<mailto:myas...@live.com>,Thanks.
From: Yun Tang
Sent: Monday, June 15, 2020 11:30
To: Thomas Huang ; Flink
Subject: Re: The Flink job recovered with wrong checkpoint state.
Hi Thomas
The answer is yes. Without high availability, once the job m
To: Flink
Subject: The Flink job recovered with wrong checkpoint state.
Hi Flink Community,
Currently, I'm using yarn-cluster mode to submit Flink jobs on YARN. I
haven't set the high-availability (ZooKeeper) configuration, but I have set a
restart strategy:
env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 3000))
The attempt count is 10 and the wait
Hi Jingsong,
Cool, Thanks for your reply.
Best wishes.
From: Jingsong Li
Sent: Tuesday, May 19, 2020 10:46
To: Thomas Huang
Cc: Flink
Subject: Re: Is it possible to change 'connector.startup-mode' option in the
flink job
Hi Thomas,
Good to hear from you. This is a very common problem.
In 1.11, we have two FLIP to solve your problem. [1][2] You can take a look.
I think dynamic table options (table hints) is enough for your requirement.
[1]
Hi guys,
I'm using hive to store kafka topic metadata as follows::
CREATE TABLE orders (
  user_id    BIGINT,
  product    STRING,
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector.type' = 'kafka',
Linking to the jira ticket, for the record.
https://issues.apache.org/jira/browse/FLINK-17560
Thank you~
Xintong Song
On Sat, May 9, 2020 at 2:14 AM Josson Paul wrote:
Set up
--
Flink version 1.8.3
Zookeeper HA cluster
1 ResourceManager/Dispatcher (same node)
1 TaskManager
4 pipelines running with various parallelisms
Issue
--
Occasionally, when the Job Manager gets restarted, we notice that the
pipelines do not all get scheduled. The error that
function) that converts between the two types. The
Tuple2 type and the Scala tuple type, i.e. (foo, bar) have nothing in
common when it comes to the type system.
Best,
Aljoscha
On 06.05.20 01:42, Nick Bendtner wrote:
Hi guys,
In our flink job we use java source for deserializing a message from
kafka
Hi guys,
In our Flink job we use a Java source for deserializing a message from Kafka
using a Kafka deserializer. The signature is as below.
public class CustomAvroDeserializationSchema implements
KafkaDeserializationSchema>
The other parts of the streaming job are in Scala. When data
istState.java#L115
Best
Yun Tang
From: Oleg Vysotsky
Sent: Tuesday, April 21, 2020 13:53
To: Yun Tang ; Jacob Sevart ; Timo Walther
; user@flink.apache.org
Cc: Long Nguyen ; Gurpreet Singh
Subject: Re: Checkpoints for kafka source sometimes get 55 GB size (instead of
2 MB) and flink job fails during restoring from such checkpoint
Hi Jacob & Timo,
Thank you for checking!
I don’t use union list state in my app. FlinkKafkaConsumerBase (from kafka
connector) uses it to store offsets per partition, but partitions are small
(input topic has 32 p
I found the problem.
In flink1.0.0/conf there are two files: masters and slaves.
The masters file contains localhost:8081; the slaves file contains just
localhost. I changed them both to the server's IP address.
Now the FLINK JOB link has the full :8081 link and displays the Apache Flink
Dashboard in the browser.
Yes, exactly, that is the change I am having to make: changing the FLINK JOB
default localhost to the IP of the server computer in the browser.
I followed the instructions as per your link:
https://medium.com/@zjffdu/flink-on-zeppelin-part-1-get-started-2591aaa6aa47
i.e. 0.0.0.0 for zeppelin.server.addr
I am only running the Zeppelin word count example by clicking the Zeppelin
run arrow.
On Mon, 20 Apr 2020, 09:42 Jeff Zhang wrote:
> How do you run the Flink job? It should not always be localhost:8081
>
> Som Lima wrote on Mon, Apr 20, 2020 at 4:33 PM:
>
>> Hi,
>>
>> FLINK J
How do you run the Flink job? It should not always be localhost:8081
Som Lima wrote on Mon, Apr 20, 2020 at 4:33 PM:
> Hi,
>
> The FLINK JOB url defaults to localhost,
>
> i.e. localhost:8081.
>
> I have to manually change it to server:8081 to get the Apache Flink Web
> Dashboard to display.
Hi,
The FLINK JOB url defaults to localhost,
i.e. localhost:8081.
I have to manually change it to server:8081 to get the Apache Flink Web
Dashboard to display.
" source?
Best
Yun Tang
From: Jacob Sevart
Sent: Saturday, April 18, 2020 9:22
To: Oleg Vysotsky
Cc: Timo Walther ; user@flink.apache.org
; Long Nguyen
Subject: Re: Checkpoints for kafka source sometimes get 55 GB size (instead of
2 MB) and flink job fails during restoring from such checkpoint
happened
> only on our largest Flink job (which processes 6k-10k events per second).
> Similar smaller jobs (same code) don't have this problem; e.g. a similar job
> which processes about 3 times fewer events doesn't have this problem. As
> a result, remote debugging is quite challenging
-Bruce

*From:* Zhu Zhu
*Date:* Monday, April 13, 2020 at 9:29 PM
*To:* Till Rohrmann
*Cc:* Aljoscha Krettek , user , Gary Yao
*Subject:* Re: Flink job didn't restart when a task failed

Sorry
Hello,
Sometimes our Flink job starts creating large checkpoints of 55 GB (instead
of 2 MB) related to the Kafka source. After the Flink job creates the first
“abnormal” checkpoint, all subsequent checkpoints are “abnormal” as well. The
Flink job can’t be restored from such a checkpoint. Restoring from
ser , Gary
Yao
Subject: Re: Flink job didn't restart when a task failed
Sorry for not following this ML earlier.
I think the cause might be that the final state ('FAILED') update message to JM
is lost. TaskExecutor will simply fail the task (which does not take effect in
this case sinc
another process, calling a JNI library or so?
Thank you~
Xintong Song
On Sat, Apr 11, 2020 at 3:56 AM Mitch Lloyd wrote:
> We are having an issue with a Flink Job that gradually consumes all
> available memory on a Docker host machine, crashing the machine.
>
> * We are running
We are having an issue with a Flink Job that gradually consumes all
available memory on a Docker host machine, crashing the machine.
* We are running Flink 1.10.0
* We are running Flink in a Docker container on AWS ECS with EC2 instances
* The Flink task manager UI does not show high memory usage
9, 2020 at 1:57 PM Aljoscha Krettek
> wrote:
>
>> Hi,
>>
>> this indeed seems very strange!
>>
>> @Gary Could you maybe have a look at this since you work/worked quite a
>> bit on the scheduler?
>>
>> Best,
>> Aljoscha
>>
>> On 09.04.20 05:
Hi,
this indeed seems very strange!
@Gary Could you maybe have a look at this since you work/worked quite a
bit on the scheduler?
Best,
Aljoscha
On 09.04.20 05:46, Hanson, Bruce wrote:
Hello Flink folks:
We had a problem with a Flink job the other day that I haven’t seen before. One
task
Hello Flink folks:
We had a problem with a Flink job the other day that I haven’t seen before. One
task encountered a failure and switched to FAILED (see the full exception
below). After the failure, the task said it was notifying the Job Manager:
2020-04-06 08:21:04.329 [flink
Hi Giriraj,
This looks like the deserialization of a String failed.
Can you isolate the problem to a pair of sending and receiving tasks?
Best, Fabian
Am So., 5. Apr. 2020 um 20:18 Uhr schrieb Giriraj Chauhan <
graj.chau...@gmail.com>:
> Hi,
>
> We are submitting a flink(1.9.1) job for data
Hi,
We are submitting a Flink (1.9.1) job for data processing. It runs fine and
processes data for some time, i.e. ~30 mins, and later it throws the
following exception and the job gets killed.
2020-04-02 14:15:43,371 INFO org.apache.flink.runtime.taskmanager.Task
- Sink: Unnamed (2/4)
Two things you can do:
Stopping the Flink job is going to generate a savepoint.
You need to save the savepoint directory path in some persistent store
(because you are restarting the cluster; otherwise the checkpoint monitoring
API should give you the savepoint file details).
After spinning up the cluster, read
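A minimal sketch of the "save the savepoint path in some persistent store" step (the file-based store and all names here are illustrative; in practice this could be a ConfigMap, a database row, etc.):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SavepointPathStore {
    private final Path stateFile; // where the last savepoint path is remembered

    public SavepointPathStore(Path stateFile) {
        this.stateFile = stateFile;
    }

    /** Record the savepoint path reported when the job was stopped. */
    public void save(String savepointPath) throws IOException {
        Files.write(stateFile, savepointPath.getBytes());
    }

    /** Read the path back when spinning up the new cluster; null if absent. */
    public String load() throws IOException {
        return Files.exists(stateFile) ? new String(Files.readAllBytes(stateFile)) : null;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("savepoint", ".path");
        SavepointPathStore store = new SavepointPathStore(f);
        store.save("s3://bucket/savepoints/savepoint-abc123"); // placeholder path
        System.out.println(store.load());
    }
}
```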
Hi All,
My Flink application is set up to allow the user to start and stop it.
The Flink job is running in a job cluster (the application jar is available
to Flink upon startup). Stopping a running application means exiting the
program.
Restarting a stopped job means spinning up a new job cluster