Re: Unable to list jobs in flink cluster with multiple jobManagers

2023-01-12 Thread yidan zhao
I think it is a bug: https://issues.apache.org/jira/browse/FLINK-25732 Yael Adsl 于2022年12月12日周一 23:56写道: > > Hi, > > We are running a flink cluster (Flink version 1.14.3) on kubernetes with > high-availablity.type: kubernetes. We have 3 jobmanagers. When we send jobs > to the flink cluster, we

Re: Load Distribution in specific Slot of Taskmanager in flink(version 1.15.2)

2022-11-20 Thread yidan zhao
If no data skew exists, you can set the job's parallelism any times of the count of taskmanagers, and set `cluster.evenly-spread-out-slots` to true in flink-conf.yaml of your flink cluster. harshit.varsh...@iktara.ai 于2022年11月7日周一 20:41写道: > > Dear Team, > > > > I need some advice on setting up l

Re: How to get checkpoint stats after job has terminated

2022-11-09 Thread yidan zhao
First of all, you should trigger a savepoint before stopping the job, and then you can restart the job with the savepoint. For checkpoints, you need to set ‘execution.checkpointing.externalized-checkpoint-retention’ to 'RETAIN_ON_CANCELLATION'. You can get the checkpoints info via history server.

questions about FLINK-27341

2022-09-03 Thread yidan zhao
Hi, I want to know is there some way to avoid this problem now? I can not guarantee jobmanager and taskmanager do not run in the same machine.

Re: Issue with file system implementation

2022-09-01 Thread yidan zhao
Do you place the s3 jar to plugins/s3 dir? Darius Žalandauskas 于2022年8月31日周三 03:11写道: > > Hello apache-flink team, > For the last week I am really struggling with setting up EMR to store stream > output to AWS S3. > According to the documentation, if running flink with emr, no manual > adjustme

Re: standalone mode support in the kubernetes operator (FLIP-25)

2022-07-14 Thread yidan zhao
Hi all, Does 'standalone mode support in the kubernetes operator' means: Using flink-k8s-operator to manage jobs deployed in a standalone cluster? What is the advantag doing so. Yang Wang 于2022年7月14日周四 10:55写道: > > I think the standalone mode support is expected to be done in the version > 1.2.0

Re: DataStream.keyBy() with keys determined at run time

2022-07-10 Thread yidan zhao
You can use string, and serialize all keys to a string. Hemanga Borah 于2022年7月11日周一 09:49写道: > > Here is the documentation of the Tuple class: > https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/java/tuple/Tuple.html > > If you need a concrete class, you can go f

Re: sharing data between 2 pipelines

2022-05-10 Thread yidan zhao
why not put the two pipelines together. David Anderson 于2022年5月11日周三 00:13写道: > > This sounds like it might be a use case for something like a > KeyedCoProcessFunction (or possibly a KeyedBroadcastProcessFunction, > depending on the details). These operators can receive inputs from two > diffe

Re: Flink Checkpoint Timeout

2022-03-08 Thread yidan zhao
If the checkpoint timeout leads to the job's fail, then the job will be recovered and data will be reprocessed from the last completed checkpoint. If the job doesn't fail, then not. Mahantesh Patil 于2022年3月8日周二 14:47写道: > Hello Team, > > What happens after checkpoint timeout? > > Does Flink repr

Re: How to sort Iterable in ProcessWindowFunction?

2022-03-06 Thread yidan zhao
Collect the elements to a list, then sort, then collect out. HG 于2022年3月3日周四 22:13写道: > Hi, > I have need to sort the input of the ProcesWindowFunction by one of the > fields of the Tuple4 that is in the Iterator. > > Any advice as to what the best way is? > > static class MyProcessWindowFunc

Re: Flink job recovery after task manager failure

2022-03-03 Thread yidan zhao
] - Recovering checkpoints from > ZooKeeperStateHandleStore{namespace='flink/default/checkpoints/523f9e48274186bb97c13e3c2213be0e'}. > > 2022-02-24 12:20:16,712 INFO > org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore > [] - Found 1 checkpoints in > Zo

Re: Flink job recovery after task manager failure

2022-03-01 Thread yidan zhao
State backend can be set as hashMap or rocksDB. Checkpoint storage must be a shared file system(nfs or hdfs or something else). Afek, Ifat (Nokia - IL/Kfar Sava) 于2022年3月2日周三 05:55写道: > Hi, > > > > I’m trying to understand the guidelines for task manager recovery. > > From what I see in the docu

Re: Unaligned Tumbling Windows

2022-01-20 Thread yidan zhao
self-define the window assigners. Caizhi Weng 于2022年1月17日周一 13:11写道: > Hi! > > So you'd like to flatten the traffic by materializing the results of > different parallelisms at different times? > > As far as I know this is not possible. Could you please elaborate more on > the reason you'd like t

Re: "sink.partition-commit.success-file.name" option in the FileSystem connector does not work

2021-11-07 Thread yidan zhao
Actually, the success file is another file which is written done to the dir when the partition is done. It's content have nothing to do with your bussiness. Long Nguyễn 于2021年11月5日周五 下午5:07写道: > Thank you, Paul. > > The answer is so clear and helpful. But I'm still wondering what is the > purpos

Re: map concurrent modification exception analysis when checkpoint

2021-08-23 Thread yidan zhao
cious in your code. The stacktrace is also for a > MapSerializer. Do you have another operator where you put Map into a custom > state? > > On Fri, Aug 20, 2021 at 6:43 PM yidan zhao wrote: > >> But, I do not know why this leads to the job's failure and recovery >&g

Re: map concurrent modification exception analysis when checkpoint

2021-08-20 Thread yidan zhao
But, I do not know why this leads to the job's failure and recovery since I have set the tolerable failed checkpoint to Integer.MAX_VALUE. Due to the failure, my task manager failed because of the task cancel timeout, and about 80% of task managers went down due to cancel timeout. yidan zh

Re: map concurrent modification exception analysis when checkpoint

2021-08-20 Thread yidan zhao
hat the state was modified while a > snapshot was being taken. > > We usually see this when users hold on to some state value beyond a > single call to a user-defined function, particularly from different threads. > > We may be able to pinpoint the issue if you were to provide us wi

map concurrent modification exception analysis when checkpoint

2021-08-19 Thread yidan zhao
Flink web ui shows the exception as follows. In the task (ual_transform_UserLogBlackUidJudger -> ual_transform_IpLabel ), the first one is a broadcast process function, and the second one is an async function. I do not know whether the issues have some relation to it. And the issues not occurred b

Re: Is FlinkKafkaConsumer setStartFromLatest() method needed when we use auto.offset.reset=latest kafka properties

2021-08-06 Thread yidan zhao
it is not the same. Kafka's 'auto.offset.reset' is used when the configured consumer group id does not have offset info stored in kafka. not exist. If you want to consume from latest no matter whether there is group offset info in kafka, you should use flink's setStartFromLatest. suman shil 于2021

Re: after upgrade flink1.12 to flink1.13.1, flink web-ui's taskmanager detail page error

2021-06-17 Thread yidan zhao
looks like a bug. I will create a ticket. > > On Wed, Jun 16, 2021 at 5:21 AM yidan zhao wrote: >> >> does anyone has idea? Here I give another exception stack. >> >> >> Unhandled exception. >> org.apache.flink.runtime.rpc.akka.exception

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Ok, I will try. Yingjie Cao 于2021年6月16日周三 下午8:00写道: > > Maybe you can try to increase taskmanager.network.retries, > taskmanager.network.netty.server.backlog and > taskmanager.network.netty.sendReceiveBufferSize. These options are useful for > our jobs. > > yidan zhao 于

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
'LocalTransportException' or 'RemoteTransportException'. yidan zhao 于2021年6月16日周三 下午7:10写道: > > Hi, yingjie. > If the network is not stable, which config parameter I should adjust. > > yidan zhao 于2021年6月16日周三 下午6:56写道: > > > > 2: I use G1, and no full gc occurred, youn

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, yingjie. If the network is not stable, which config parameter I should adjust. yidan zhao 于2021年6月16日周三 下午6:56写道: > > 2: I use G1, and no full gc occurred, young gc count: 422, time: > 142892, so it is not bad. > 3: stream job. > 4: I will try to config taskmanager.network.r

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
1.13/docs/deployment/config/ > [3] https://issues.apache.org/jira/browse/FLINK-22643 > > Best, > Yingjie > > yidan zhao 于2021年6月16日周三 下午3:36写道: >> >> Attachment is the exception stack from flink's web-ui. Does anyone >> have also met this problem? >> >> Flink1.12 - Flink1.13.1. Standalone Cluster, include 30 containers, >> each 28G mem.

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
写道: > > Hi Yidan, > it seems that the attachment did not make it through the mailing list. Can > you copy-paste the text of the exception here or upload the log somewhere? > > > > On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote: > > > Attachment is the exception stac

Re: after upgrade flink1.12 to flink1.13.1, flink web-ui's taskmanager detail page error

2021-06-15 Thread yidan zhao
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:387) ~[flink-dist_2.11-1.13.1.jar:1.13.1] ... 27 more yidan zhao 于2021年6月11日周五 下午4:10写道: > > I upgrade flink from 1.12 to 1.13.1, and the rest api > (http://xxx:8600/#/task-manager/xxx:34575-c53c6c/metrics) failed. > My standalone cluster include 30 Jobman

Re: Flink job restart when one ZK node is down

2021-06-15 Thread yidan zhao
Yes it is expected, I have also met such problems. Lu Niu 于2021年6月15日周二 上午4:53写道: > > HI, Flink Users > > We use a Zk cluster of 5 node for JM HA. When we terminate one node for > maintenance, we notice lots of flink job fully restarts. The error looks like: > ``` > org.apache.flink.util.FlinkEx

after upgrade flink1.12 to flink1.13.1, flink web-ui's taskmanager detail page error

2021-06-11 Thread yidan zhao
I upgrade flink from 1.12 to 1.13.1, and the rest api (http://xxx:8600/#/task-manager/xxx:34575-c53c6c/metrics) failed. My standalone cluster include 30 Jobmanagers and 30 Taskmanagers, and I found the api only works in the one jobmanager when it is the rest api leader. for example, jobmanager1(ht

Re: Flink stream processing issue

2021-06-03 Thread yidan zhao
Yes, if you use KeyedCoProcess, flink will ensure that. Qihua Yang 于2021年6月4日周五 上午12:32写道: > > Sorry for the confusion Yes, I mean multiple parallelism. Really thanks > for your help. > > Thanks, > Qihua > > On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG wrote: >> >> Hi Qihua, >> >> I’m sorry I

Re: Evenly distribute task slots across task-manager

2021-03-30 Thread yidan zhao
I think currently flink doesn't support your case, and another idea is that you can set the parallelism of all operators to 64, then it will be evenly distributed to the two taskmanagers. Vignesh Ramesh 于2021年3月25日周四 上午1:05写道: > Hi Matthias, > > Thanks for your reply. In my case, yes the upstrea

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
Best, > > Yuan > > On Thu, Mar 4, 2021 at 4:26 PM yidan zhao wrote: > >> One more question, If I only need watermark's logic, not keyedStream, why >> not provide methods such as writeDataStream and readDataStream. It uses the >> similar methods for kafka produc

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
And do you know when kafka consumer/producer will be re implemented according to the new source/sink api? I am thinking whether I should adjust the code for now, since I need to re adjust the code when it is reconstructed to the new source/sink api. yidan zhao 于2021年3月4日周四 下午4:44写道: >

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
I uploaded a picture to describe that. https://ftp.bmp.ovh/imgs/2021/03/2068f2e22045e696.png >

Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread yidan zhao
gt; > [1] https://issues.apache.org/jira/browse/FLINK-21317 > > czw., 4 mar 2021 o 03:48 yidan zhao napisał(a): > >> Yes, you are right and thank you. I take a brief look at what >> FlinkKafkaShuffle is doing, it seems what I need and I will have a try. >> >>>

Re: how to propagate watermarks across multiple jobs

2021-03-03 Thread yidan zhao
Yes, you are right and thank you. I take a brief look at what FlinkKafkaShuffle is doing, it seems what I need and I will have a try. >

how to propagate watermarks across multiple jobs

2021-03-01 Thread yidan zhao
I have a job which includes about 50+ tasks. I want to split it to multiple jobs, and the data is transferred through Kafka, but how about watermark? Is anyone have do something similar and solved this problem? Here I give an example: The original job: kafkaStream1(src-topic) => xxxProcess => xxx

Re: Flink jobs organization and maintainability

2021-02-24 Thread yidan zhao
I used a yarm config file to describe my jobs, and using 'start xxxJobName' to start the job which is implemented by shell scripts. Arvid Heise 于2021年2月24日周三 下午10:09写道: > If you have many similar jobs, they should be in the same repo (especially > if they have the same development cycle). > > Fi

Re: Configure operator based on key

2021-02-21 Thread yidan zhao
You can self-define it using keyedStream.window(GlobalWindows.create() ).trigger(self-defined-trigger). Abhinav Sharma 于2021年2月21日周日 下午3:57写道: > Hi, > > Is there some way that I can configure an operator based on the key in a > stream? > Eg: If the key is 'abcd', then create a window of size X c

Re: Sharding of Operators

2021-02-17 Thread yidan zhao
Actually, we only need to ensure all records belonging to the same key will be forwarded to the same operator instance(i), and we do not need to guarantee that 'i' is the same with the 'i' in previous savepoints. When the job is restarted, the rule 'same key's record will be in together' is guarant

Re: question on ValueState

2021-02-08 Thread yidan zhao
I have a related question. Since fileStateBackend uses heap as the state storage and the checkpoint is finally stored in the filesystem, so whether the JobManager/TaskManager memory will limit the state size? The state size is limited by TM's memory * number of TMs? or limited by JM's memory. Kha

Re: question on ValueState

2021-02-08 Thread yidan zhao
nk I should use rocksDB, is it right? yidan zhao 于2021年2月9日周二 下午3:50写道: > I have a related question. > Since fileStateBackend uses heap as the state storage and the checkpoint > is finally stored in the filesystem, so whether the JobManager/TaskManager > memory will limit the state size? T