I think it is a bug: https://issues.apache.org/jira/browse/FLINK-25732
Yael Adsl wrote on Mon, Dec 12, 2022 at 23:56:
>
> Hi,
>
> We are running a flink cluster (Flink version 1.14.3) on kubernetes with
> high-availability.type: kubernetes. We have 3 jobmanagers. When we send jobs
> to the flink cluster, we
If no data skew exists, you can set the job's parallelism to any multiple
of the number of taskmanagers, and set `cluster.evenly-spread-out-slots`
to true in the flink-conf.yaml of your flink cluster.
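For example, a minimal flink-conf.yaml sketch (the parallelism value is just
an illustration, assuming 4 taskmanagers with 8 slots each):

    cluster.evenly-spread-out-slots: true
    parallelism.default: 32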
harshit.varsh...@iktara.ai wrote on Mon, Nov 7, 2022 at 20:41:
>
> Dear Team,
>
>
>
> I need some advice on setting up l
First of all, you should trigger a savepoint before stopping the job,
and then you can restart the job from that savepoint.
For checkpoints, you need to set
'execution.checkpointing.externalized-checkpoint-retention' to
'RETAIN_ON_CANCELLATION'. You can get the checkpoint info via the history
server.
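For example, a sketch with the Flink CLI (the job id, jar, and paths are
placeholders):

    # stop the job and trigger a savepoint
    bin/flink stop --savepointPath hdfs:///flink/savepoints <jobId>
    # later, resume from that savepoint
    bin/flink run -s hdfs:///flink/savepoints/savepoint-<xxx> my-job.jar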
Hi, I want to know whether there is some way to avoid this problem now.
I cannot guarantee that the jobmanager and taskmanager do not run on the same machine.
Did you place the s3 jar in the plugins/s3 dir?
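For example (a sketch; the jar name depends on your Flink version and on
whether you use the hadoop or presto s3 filesystem):

    mkdir -p plugins/s3
    cp opt/flink-s3-fs-hadoop-<version>.jar plugins/s3/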
Darius Žalandauskas wrote on Wed, Aug 31, 2022 at 03:11:
>
> Hello apache-flink team,
> For the last week I am really struggling with setting up EMR to store stream
> output to AWS S3.
> According to the documentation, if running flink with emr, no manual
> adjustme
Hi all, does 'standalone mode support in the kubernetes operator'
mean using flink-k8s-operator to manage jobs deployed in a
standalone cluster?
What is the advantage of doing so?
Yang Wang wrote on Thu, Jul 14, 2022 at 10:55:
>
> I think the standalone mode support is expected to be done in the version
> 1.2.0
You can use a string, and serialize all key fields into one string.
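A minimal sketch of what I mean (the tuple fields are hypothetical):

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StringKeyExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Tuple3<String, Integer, Long>> events = env.fromElements(
                    Tuple3.of("user-a", 1, 10L),
                    Tuple3.of("user-b", 2, 20L));
            // serialize both key fields into one string key
            events.keyBy(t -> t.f0 + "|" + t.f1)
                  .sum(2)
                  .print();
            env.execute("string-key-sketch");
        }
    }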
Hemanga Borah wrote on Mon, Jul 11, 2022 at 09:49:
>
> Here is the documentation of the Tuple class:
> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/java/tuple/Tuple.html
>
> If you need a concrete class, you can go f
Why not put the two pipelines together?
David Anderson wrote on Wed, May 11, 2022 at 00:13:
>
> This sounds like it might be a use case for something like a
> KeyedCoProcessFunction (or possibly a KeyedBroadcastProcessFunction,
> depending on the details). These operators can receive inputs from two
> diffe
If the checkpoint timeout leads to the job failing, then the job will be
recovered and data will be reprocessed from the last completed checkpoint.
If the job doesn't fail, then it will not.
Mahantesh Patil wrote on Tue, Mar 8, 2022 at 14:47:
> Hello Team,
>
> What happens after checkpoint timeout?
>
> Does Flink repr
Collect the elements into a list, sort the list, then emit the elements.
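A minimal sketch, assuming a Tuple4 sorted by its second field (adjust the
types and the field index to your case):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.flink.api.java.tuple.Tuple4;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    class SortingProcessWindowFunc extends ProcessWindowFunction<
            Tuple4<String, Long, String, String>,
            Tuple4<String, Long, String, String>, String, TimeWindow> {
        @Override
        public void process(String key, Context ctx,
                            Iterable<Tuple4<String, Long, String, String>> input,
                            Collector<Tuple4<String, Long, String, String>> out) {
            // buffer all window elements, sort by the second field (f1), emit in order
            List<Tuple4<String, Long, String, String>> buffer = new ArrayList<>();
            input.forEach(buffer::add);
            buffer.sort(Comparator.comparing(t -> t.f1));
            buffer.forEach(out::collect);
        }
    }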
HG wrote on Thu, Mar 3, 2022 at 22:13:
> Hi,
> I need to sort the input of the ProcessWindowFunction by one of the
> fields of the Tuple4 that is in the Iterator.
>
> Any advice as to what the best way is?
>
> static class MyProcessWindowFunc
] - Recovering checkpoints from
> ZooKeeperStateHandleStore{namespace='flink/default/checkpoints/523f9e48274186bb97c13e3c2213be0e'}.
>
> 2022-02-24 12:20:16,712 INFO
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore
> [] - Found 1 checkpoints in
> Zo
The state backend can be set to hashmap or rocksdb.
Checkpoint storage must be a shared file system (NFS, HDFS, or something
else).
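For example, a flink-conf.yaml sketch (the checkpoint dir is a placeholder):

    state.backend: rocksdb
    state.checkpoints.dir: hdfs:///flink/checkpoints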
Afek, Ifat (Nokia - IL/Kfar Sava) wrote on Wed, Mar 2, 2022 at 05:55:
> Hi,
>
>
>
> I’m trying to understand the guidelines for task manager recovery.
>
> From what I see in the docu
You can self-define the window assigner.
Caizhi Weng wrote on Mon, Jan 17, 2022 at 13:11:
> Hi!
>
> So you'd like to flatten the traffic by materializing the results of
> different parallelisms at different times?
>
> As far as I know this is not possible. Could you please elaborate more on
> the reason you'd like t
Actually, the success file is just another file that is written to the dir
when the partition is done. Its content has nothing to do with your
business data.
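For example, with the filesystem connector's partition commit feature, the
relevant options look like this (a sketch of table DDL options):

    'sink.partition-commit.policy.kind' = 'success-file',
    'sink.partition-commit.success-file.name' = '_SUCCESS'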
Long Nguyễn wrote on Fri, Nov 5, 2021 at 5:07 PM:
> Thank you, Paul.
>
> The answer is so clear and helpful. But I'm still wondering what is the
> purpos
cious in your code. The stacktrace is also for a
> MapSerializer. Do you have another operator where you put Map into a custom
> state?
>
> On Fri, Aug 20, 2021 at 6:43 PM yidan zhao wrote:
>
>> But, I do not know why this leads to the job's failure and recovery
>>
But I do not know why this leads to the job's failure and recovery,
since I have set the tolerable failed checkpoint count to Integer.MAX_VALUE.
Due to the failure, my task managers failed because of the task cancel
timeout, and about 80% of task managers went down due to the cancel
timeout.
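For context, a minimal sketch of how that setting is applied:

    env.getCheckpointConfig().setTolerableCheckpointFailureNumber(Integer.MAX_VALUE);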
yidan zh
hat the state was modified while a
> snapshot was being taken.
>
> We usually see this when users hold on to some state value beyond a
> single call to a user-defined function, particularly from different threads.
>
> We may be able to pinpoint the issue if you were to provide us wi
Flink web UI shows the exception as follows.
In the task (ual_transform_UserLogBlackUidJudger ->
ual_transform_IpLabel), the first one is a broadcast process
function, and the second one is an async function. I do not know
whether the issue has some relation to them.
And the issue did not occur b
It is not the same.
Kafka's 'auto.offset.reset' is used only when the configured consumer group
id does not have offset info stored in kafka. If you want
to consume from latest no matter whether there is group offset info in
kafka, you should use flink's setStartFromLatest.
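A minimal sketch (topic, servers, and group id are placeholders):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "broker:9092");
    props.setProperty("group.id", "my-group");
    FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
    // always start from the latest offsets, ignoring committed group offsets
    consumer.setStartFromLatest();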
suman shil wrote on 2021
looks like a bug. I will create a ticket.
>
> On Wed, Jun 16, 2021 at 5:21 AM yidan zhao wrote:
>>
>> Does anyone have an idea? Here I give another exception stack.
>>
>>
>> Unhandled exception.
>> org.apache.flink.runtime.rpc.akka.exception
Ok, I will try.
Yingjie Cao wrote on Wed, Jun 16, 2021 at 8:00 PM:
>
> Maybe you can try to increase taskmanager.network.retries,
> taskmanager.network.netty.server.backlog and
> taskmanager.network.netty.sendReceiveBufferSize. These options are useful for
> our jobs.
>
yidan zhao wrote on
'LocalTransportException' or
'RemoteTransportException'.
yidan zhao wrote on Wed, Jun 16, 2021 at 7:10 PM:
>
> Hi, yingjie.
> If the network is not stable, which config parameter should I adjust?
>
> yidan zhao wrote on Wed, Jun 16, 2021 at 6:56 PM:
> >
> > 2: I use G1, and no full gc occurred, youn
Hi, yingjie.
If the network is not stable, which config parameter should I adjust?
yidan zhao wrote on Wed, Jun 16, 2021 at 6:56 PM:
>
> 2: I use G1, and no full gc occurred, young gc count: 422, time:
> 142892, so it is not bad.
> 3: stream job.
> 4: I will try to config taskmanager.network.r
1.13/docs/deployment/config/
> [3] https://issues.apache.org/jira/browse/FLINK-22643
>
> Best,
> Yingjie
>
> yidan zhao wrote on Wed, Jun 16, 2021 at 3:36 PM:
>>
>> Attachment is the exception stack from flink's web-ui. Has anyone
>> else met this problem?
>>
>> Flink 1.12 -> Flink 1.13.1. Standalone cluster, including 30 containers,
>> each with 28G mem.
wrote:
>
> Hi Yidan,
> it seems that the attachment did not make it through the mailing list. Can
> you copy-paste the text of the exception here or upload the log somewhere?
>
>
>
> On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote:
>
> > Attachment is the exception stac
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.serializeRemoteResultAndVerifySize(AkkaRpcActor.java:387)
~[flink-dist_2.11-1.13.1.jar:1.13.1] ... 27 more
yidan zhao wrote on Fri, Jun 11, 2021 at 4:10 PM:
>
> I upgrade flink from 1.12 to 1.13.1, and the rest api
> (http://xxx:8600/#/task-manager/xxx:34575-c53c6c/metrics) failed.
> My standalone cluster include 30 Jobman
Yes, it is expected; I have also met such problems.
Lu Niu wrote on Tue, Jun 15, 2021 at 4:53 AM:
>
> HI, Flink Users
>
> We use a Zk cluster of 5 nodes for JM HA. When we terminate one node for
> maintenance, we notice lots of flink jobs fully restart. The error looks like:
> ```
> org.apache.flink.util.FlinkEx
I upgraded flink from 1.12 to 1.13.1, and the rest api
(http://xxx:8600/#/task-manager/xxx:34575-c53c6c/metrics) failed.
My standalone cluster includes 30 Jobmanagers and 30 Taskmanagers, and
I found the api only works on the one jobmanager that is the rest
api leader.
for example, jobmanager1(ht
Yes, if you use KeyedCoProcessFunction, flink will ensure that.
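A minimal sketch of what I mean (the types and key field are hypothetical);
records with equal keys from both streams reach the same function instance:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;

    public class CoProcessSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            KeyedStream<Tuple2<String, Long>, String> left =
                    env.fromElements(Tuple2.of("k1", 1L)).keyBy(t -> t.f0);
            KeyedStream<Tuple2<String, Long>, String> right =
                    env.fromElements(Tuple2.of("k1", 2L)).keyBy(t -> t.f0);
            left.connect(right)
                .process(new KeyedCoProcessFunction<String, Tuple2<String, Long>,
                        Tuple2<String, Long>, String>() {
                    @Override
                    public void processElement1(Tuple2<String, Long> v, Context ctx,
                                                Collector<String> out) {
                        out.collect("left: " + v);
                    }
                    @Override
                    public void processElement2(Tuple2<String, Long> v, Context ctx,
                                                Collector<String> out) {
                        out.collect("right: " + v);
                    }
                })
                .print();
            env.execute("keyed-co-process-sketch");
        }
    }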
Qihua Yang wrote on Fri, Jun 4, 2021 at 12:32 AM:
>
> Sorry for the confusion. Yes, I mean multiple parallelism. Really, thanks
> for your help.
>
> Thanks,
> Qihua
>
> On Thu, Jun 3, 2021 at 12:03 AM JING ZHANG wrote:
>>
>> Hi Qihua,
>>
>> I’m sorry I
I think flink currently doesn't support your case; another idea is that
you can set the parallelism of all operators to 64, then they will be evenly
distributed to the two taskmanagers.
Vignesh Ramesh wrote on Thu, Mar 25, 2021 at 1:05 AM:
> Hi Matthias,
>
> Thanks for your reply. In my case, yes the upstrea
Best,
>
> Yuan
>
> On Thu, Mar 4, 2021 at 4:26 PM yidan zhao wrote:
>
>> One more question, If I only need watermark's logic, not keyedStream, why
>> not provide methods such as writeDataStream and readDataStream. It uses the
>> similar methods for kafka produc
And do you know when the kafka consumer/producer will be reimplemented
according to the new source/sink api? I am wondering whether I should adjust
the code for now, since I would need to readjust the code when it is
reconstructed to the new source/sink api.
yidan zhao wrote on Thu, Mar 4, 2021 at 4:44 PM:
>
I uploaded a picture to describe that.
https://ftp.bmp.ovh/imgs/2021/03/2068f2e22045e696.png
>
> [1] https://issues.apache.org/jira/browse/FLINK-21317
>
On Thu, Mar 4, 2021 at 03:48, yidan zhao wrote:
>
>> Yes, you are right and thank you. I take a brief look at what
>> FlinkKafkaShuffle is doing, it seems what I need and I will have a try.
>>
>>>
Yes, you are right, and thank you. I took a brief look at what
FlinkKafkaShuffle is doing; it seems to be what I need and I will give it a try.
>
I have a job which includes about 50+ tasks. I want to split it into multiple
jobs, with the data transferred through Kafka, but what about the watermark?
Has anyone done something similar and solved this problem?
Here I give an example:
The original job: kafkaStream1(src-topic) => xxxProcess => xxx
I used a yaml config file to describe my jobs, and use 'start xxxJobName'
to start a job; this is implemented by shell scripts.
Arvid Heise wrote on Wed, Feb 24, 2021 at 10:09 PM:
> If you have many similar jobs, they should be in the same repo (especially
> if they have the same development cycle).
>
> Fi
You can self-define it using
keyedStream.window(GlobalWindows.create()).trigger(yourSelfDefinedTrigger).
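A minimal sketch of such a trigger (the per-key thresholds and the element
type are hypothetical):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

    public class KeyBasedCountTrigger extends Trigger<Tuple2<String, Long>, GlobalWindow> {
        private final ValueStateDescriptor<Integer> countDesc =
                new ValueStateDescriptor<>("count", Integer.class);

        @Override
        public TriggerResult onElement(Tuple2<String, Long> element, long timestamp,
                                       GlobalWindow window, TriggerContext ctx) throws Exception {
            ValueState<Integer> countState = ctx.getPartitionedState(countDesc);
            int count = countState.value() == null ? 0 : countState.value();
            count++;
            // hypothetical rule: key "abcd" fires every 10 elements, other keys every 5
            int threshold = "abcd".equals(element.f0) ? 10 : 5;
            if (count >= threshold) {
                countState.clear();
                return TriggerResult.FIRE_AND_PURGE;
            }
            countState.update(count);
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
            ctx.getPartitionedState(countDesc).clear();
        }
    }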
Abhinav Sharma wrote on Sun, Feb 21, 2021 at 3:57 PM:
> Hi,
>
> Is there some way that I can configure an operator based on the key in a
> stream?
> Eg: If the key is 'abcd', then create a window of size X c
Actually, we only need to ensure all records belonging to the same key will
be forwarded to the same operator instance i, and we do not need to
guarantee that 'i' is the same as the 'i' in previous savepoints. When
the job is restarted, the rule 'the same key's records will be together' is
guarant
I have a related question.
Since fileStateBackend uses the heap as the state storage and the checkpoint
is finally stored in the filesystem, will the JobManager/TaskManager
memory limit the state size? Is the state size limited by TM's memory
* number of TMs, or limited by JM's memory?
Kha
nk I should use rocksDB,
is it right?
yidan zhao wrote on Tue, Feb 9, 2021 at 3:50 PM:
> I have a related question.
> Since fileStateBackend uses heap as the state storage and the checkpoint
> is finally stored in the filesystem, so whether the JobManager/TaskManager
> memory will limit the state size? T