Usually, you should use the HDFS nameservice instead of the NameNode
hostname:port to avoid NN failover.
And you could find the supported nameservice in the hdfs-site.xml in the
key *dfs.nameservices*.
Best,
Yang
On Fri, Mar 22, 2024 at 8:33 PM Sachin Mittal wrote:
> So, when we create an EMR
这种一般是因为APIServer那边有问题导致单次的ConfigMap renew lease annotation的操作失败,Flink默认会重试的
如果你发现因为这个SocketTimeoutException原因导致了任务Failover,可以把下面两个参数调大
high-availability.kubernetes.leader-election.lease-duration: 60s
high-availability.kubernetes.leader-election.renew-deadline: 60s
Best,
Yang
On Tue, Mar 12,
If you could find the "Deregistering Flink Kubernetes cluster, clusterId"
in the JobManager log, then it is not the expected behavior.
Having the full logs of JobManager Pod before restarted will help a lot.
Best,
Yang
On Fri, Feb 2, 2024 at 1:26 PM Liting Liu (litiliu) via user <
I could share some metrics about Alibaba Cloud EMR clusters.
The ratio of Hadoop2 VS Hadoop3 is 1:3.
Best,
Yang
On Thu, Dec 28, 2023 at 8:16 PM Martijn Visser
wrote:
> Hi all,
>
> I want to get some insights on how many users are still using Hadoop 2
> vs how many users are using Hadoop 3.
Could you please configure the same HA configurations for TaskManager as
well?
It seems that the TaskManager container does not use a correct URL when
contacting with ResourceManager.
Best,
Yang
On Fri, Dec 29, 2023 at 11:13 PM Alessio Bernesco Làvore <
alessio.berne...@gmail.com> wrote:
>
Could you please directly use the JobManager Pod IP address instead of K8s
service name(basic-example.default) and have a try with curl/wget?
It seems that the JobManager K8s service could not be accessed.
Best,
Yang
On Sat, Jan 13, 2024 at 1:24 AM LINZ, Arnaud
wrote:
> Hi,
>
> Some more
The fabric8 K8s client is using PATCH to replace get-and-update in v6.6.2.
That's why you also need to give PATCH permission for the K8s service
account.
This would help to decrease the pressure of K8s APIServer. You could find
more information here[1].
[1].
I assume you are using "*bin/flink run-application*" to submit a Flink
application to K8s cluster. Then you could simply
update your local log4j-console.properties, it will be shipped and mounted
to JobManager/TaskManager pods via ConfigMap.
Best,
Yang
Vladislav Keda 于2023年6月20日周二
22:15写道:
>
可以通过JobResultStore[1]来获取任务最终的状态,flink-kubernetes-operator也是这样来获取的
[1].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
Best,
Yang
Weihua Hu 于2023年3月22日周三 10:27写道:
> Hi
>
> 我们内部最初版本是通过 cluster-id 来唯一标识一个 application,同时认为流式任务是长时间运行的,不应该主动退出。如果该
>
可以通过给Prometheus server来配置metric_relabel_configs[1]来控制采集哪些metrics
[1].
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
Best,
Yang
casel.chen 于2023年3月22日周三 13:47写道:
>
I assume you are using the standalone mode. Right?
For the native K8s mode, the leader address should be
*akka.tcp://flink@JM_POD_IP:6123/user/rpc/dispatcher_1
*when HA enabled.
Best,
Yang
Anton Ippolitov via user 于2023年1月31日周二 00:21写道:
> This is actually what I'm already doing, I'm only
The "JAR file does not exist" exception comes from the JobManager side, not
on the client.
Please be aware that the local:// scheme in the jarURI means the path in
the JobManager pod.
You could use an init-container to download your user jar and mount it to
the JobManager main-container.
Refer to
Do you build your own flink-kubernetes-operator image with the flink-s3-fs
plugin bundled[1]?
[1].
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.3/docs/custom-resource/overview/#flinksessionjob-spec-overview
Best,
Yang
Weihua Hu 于2023年1月17日周二 10:47写道:
> Hi, Rahul
First, JobManager does not store any persistent data to local when the
Kubernetes HA + S3 used.
It means that you do not need to mount a PV for JobMananger deployment.
Secondly, node failures or terminations should not cause
the CrashLoopBackOff status.
One possible reason I could imagine is a
The reason why the running jobs try to failover with zookeeper outage is
that the JobManager lost leadership.
Having a standby JobManager or not makes no difference.
Best,
Yang
Matthias Pohl via user 于2023年1月2日周一 20:51写道:
> And I screwed up the reply again. -.- Here's my previous response for
I think you might need a sidecar container or daemonset to collect the
Flink logs and store into a persistent storage.
You could find more information here[1].
[1].
https://www.alibabacloud.com/blog/best-practices-of-kubernetes-log-collection_596356
Best,
Yang
hjw 于2022年12月22日周四 23:28写道:
> On
IIUC, the fabric8 Kubernetes-client 5.5.0 should already support to reload
the latest kube config if received 401 error.
Refer to the following PR[1] for more information.
Please share your feedback here if it still could not work.
[1]. https://github.com/fabric8io/kubernetes-client/pull/2731
其实可以参考Flink Kubernetes
Operator里面的做法,设置execution.shutdown-on-application-finish参数为false
然后通过轮询Flink RestAPI拿到job的状态,job结束了再主动停掉Application cluster
Best,
Yang
JasonLee <17610775...@163.com> 于2022年11月24日周四 09:59写道:
> Hi
>
>
> 可以通过 Flink 的 Metric 和 Yarn 的 Api 去获取任务的状态(任务提交到 yarn 的话)
>
>
> Best
>
Just Kindly remind, you attached images could not show normally.
Given that *ApplicationDeployer* is not only used for Yarn application
mode, but also native Kubernetes, I am not sure which way you are referring
to return the applicationId.
We already print the applicationId in the client logs.
This is a known limit of the current Flink options parser. Refer to
FLINK-15358[1] for more information.
[1]. https://issues.apache.org/jira/browse/FLINK-15358
Best,
Yang
Gyula Fóra 于2022年11月8日周二 14:41写道:
> It is also possible that this is a problem of the Flink native Kubernetes
>
Thanks Jacky Lau for starting this discussion.
I understand that you are trying to find a convenient way to specify
dependency jars along with user jar. However,
let's try to narrow down by differentiating deployment modes.
# Standalone mode
No matter you are using the standalone mode on virtual
Maybe we could change the values of *taskmanager.numberOfTaskSlots*
and *parallelism.default
*in flink-conf.yaml of Kubernetes operator to 1, which are aligned with the
default values in Flink codebase.
Best,
Yang
Gyula Fóra 于2022年10月26日周三 15:17写道:
> Hi!
>
> I agree that this might be
从1.15开始,任务结束不会主动把JobManager删除掉了。所以Kubernetes Operator就可以正常查到Job状态并且更新
Best,
Yang
¥¥¥ 于2022年10月25日周二 15:58写道:
> 退订
>
>
>
>
> --原始邮件--
> 发件人:
> "user-zh"
>
> 发送时间:2022年10月25日(星期二) 下午3:33
> 收件人:"user-zh"
> 主题:batch
I have created a ticket[1] to fill the missing part in the native K8s
documentation.
[1]. https://issues.apache.org/jira/browse/FLINK-29705
Best,
Yang
Gyula Fóra 于2022年10月20日周四 13:37写道:
> Hi!
>
> As a reference you can look at how the Flink Kubernetes Operator manages
> RBAC settings:
>
>
>
Add some more information to Gyula's comment.
For application mode without checkpoint, you do not need to activate the HA
since it will not take any effect and the Flink job will be submitted again
after the JobManager restarted. Because the job submission happens on the
JobManager side.
For
Currently, exporting the env "HADOOP_CONF_DIR" could only work for native
K8s integration. The flink client will try to create the
hadoop-config-volume automatically if hadoop env found.
If you want to set the HADOOP_CONF_DIR in the docker image, please also
make sure the specified hadoop conf
你在Flink client端提交任务前设置一下HADOOP_CONF_DIR环境变量
然后再运行flink run-application命令
Best,
Yang
yanfei lei 于2022年9月22日周四 11:04写道:
> Hi Tino,
> 从org.apache.flink.core.fs.FileSystem.java
> <
>
ay to change that to use standalone K8s? I haven't
> seen anything about that in the docs, besides a mention that standalone
> support is coming in version 1.2 of the operator.
>
> Thanks,
>
> Javier
>
>
> On Thu, Sep 8, 2022, 22:50 Yang Wang wrote:
>
>> Since t
Since the flink-kubernetes-operator is using native K8s integration[1] by
default, you need to give the permissions of pod and deployment as well as
ConfigMap.
You could find more information about the RBAC here[2].
[1].
uld be a warning then
>
> What about the 1st error we encountered regarding the kube/config file
> exception?
>
>
> Thank you so much,
> Best,
> Tamir
>
> --
> *From:* Yang Wang
> *Sent:* Thursday, September 8, 2022 7:08 AM
> *To:*
Given that the "local://" schema means the jar is available in the
image/container of JobManager, so it could only be supported in the K8s
application mode.
If you configure the jarURI to "file://" schema for session cluster, it
means that this jar file should be available in the
For native K8s integration, the Flink ResourceManager will delete the
JobManager K8s deployment as well as the HA data once the job reached a
globally terminal state.
However, it is indeed a problem for standalone mode since the JobManager
will be restarted again even the job has finished. I
-events-insertion-cluster-config-map" already
> exists.
>
> Log file is enclosed.
>
> Thanks,
> Tamir.
>
> --
> *From:* Yang Wang
> *Sent:* Monday, September 5, 2022 3:03 PM
> *To:* Tamir Sagi
> *Cc:* user@flink.apache.org ; Lihi Peretz <
> lihi.per...@
Could you please check whether the "kubernetes.config.file" is configured
to /opt/flink/.kube/config in the Flink configmap?
It should be removed before creating the Flink configmap.
Best,
Yang
Tamir Sagi 于2022年9月4日周日 18:08写道:
> Hey All,
>
> We recently updated to Flink 1.15.1. We deploy
I do not think we could add an additional port to the rest service since it
is created by Flink internally.
Actually, I do not suggest scrapping the metrics from rest service.
Instead, the port in the pod needs to be used.
Because the metrics might not work correctly if multiple JobManagers are
"ip" might work. Let me try that. I
>>> believe there should be a reason to always override the
>>> "REST_SERVICE_EXPOSED_TYPE" to "ClusterIP".
>>>
>>> [1] https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html
>>&
I am afraid the current flink-kubernetes-operator always overrides the
"REST_SERVICE_EXPOSED_TYPE" to "ClusterIP".
Could you please share why the ingress[1] could not meet your requirements?
Compared with NodePort, I think it is a more graceful implementation.
[1].
我猜测你是因为没有给TM设置service account,导致TM没有权限从K8s ConfigMap拿到leader,从而注册到RM、JM
-Dkubernetes.taskmanager.service-account=wuzhiheng \
Best,
Yang
Xuyang 于2022年8月30日周二 23:22写道:
> Hi, 能贴一下TM的日志吗,看Warn的日志貌似是TM一直起不来
> 在 2022-08-30 03:45:43,"Wu,Zhiheng" 写道:
> >【问题描述】
> >启用HA配置之后,taskmanager
It is caused by the following assert. Maybe we could *File.pathSeparator*
instead of "/".
*assertThat(optional.get()).isEqualTo(hadoopHome + "/conf");*
Would you like to create a ticket and attach a PR for this issue?
Best,
Yang
hjw <1010445...@qq.com> 于2022年8月21日周日 19:44写道:
> When I run mvn
We have the *kubernetes.jobmanager.cpu.limit-factor* and
*kubernetes.jobmanager.memory.limit-factor* to control the limit value.
The resources limit memory will be set to memory/cpu * limit-factor.
Best,
Yang
PACE, JAMES 于2022年7月28日周四 01:26写道:
> That does not seem to work.
>
>
>
> For
Webhook主要的作用是做CR的校验,避免提交到K8s上之后才发现
例如:parallelism被错误的设置为负值,jarURI没有设置等
Best,
Yang
Kyle Zhang 于2022年7月27日周三 18:59写道:
> Hi,all
> 最近在看flink-k8s-operator[1],架构里有一个flink-webhook,请问这个container的作用是什么,如果配置
> webhook.create=false对整体功能有什么影响?
>
> Best regards
>
> [1]
>
>
Removing the nodePort for every different Flink application is necessary so
that it could pick up a random port.
Moreover, I believe you also need to change some other yamls. For example,
having a different name for JobManager/TaskManager yamls, update
the jobmanager-service.yaml and
Congrats! Thanks Gyula for driving this release, and thanks to all
contributors!
Best,
Yang
Gyula Fóra 于2022年7月25日周一 10:44写道:
> The Apache Flink community is very happy to announce the release of Apache
> Flink Kubernetes Operator 1.1.0.
>
> The Flink Kubernetes Operator allows users to
Congrats! Thanks Gyula for driving this release, and thanks to all
contributors!
Best,
Yang
Gyula Fóra 于2022年7月25日周一 10:44写道:
> The Apache Flink community is very happy to announce the release of Apache
> Flink Kubernetes Operator 1.1.0.
>
> The Flink Kubernetes Operator allows users to
e 方式提交运行的任务,则用 FlinkDeployment,并配置好 job 部分。 会自动创建
> flink 集群,并根据 job 配置运行job。
> 这种方式不需要考虑集群创建、任务提交的步骤,本身就是一体。
> (2)对于 session 集群的创建,也是用 FlinkDeployment ,只是不需要指定 job 配置即可。
> (3)配合通过(2)方式创建的 session 集群,则可以配合 FlinkSessionJob 提交任务。
>
> Yang Wang 于2022年7月12日周二 17:10写道:
> &g
r?
> What is the advantag doing so.
>
> Yang Wang 于2022年7月14日周四 10:55写道:
> >
> > I think the standalone mode support is expected to be done in the
> version 1.2.0[1], which will be released on Oct 1 (ETA).
> >
> > [1].
> https://cwiki.apache.org/confluence/display/FLIN
确认一下你是否正确设置了HADOOP_CONF_DIR环境变量
Best,
Yang
lishiyuan0506 于2022年7月14日周四 09:41写道:
> 打扰大家一下,请问一下各位在yarn提交flink的时候,有没有遇到过Retrying connect to server:
> 0.0.0.0/0.0.0.0:8030这个异常
>
>
> hadoop的classpath没问题,Spark和MR在Yarn上跑也没问题,就flink有这样的问题
>
>
> | |
> lishiyuan0506
> |
> |
> lishiyuan0...@163.com
> |
>
I think the standalone mode support is expected to be done in the version
1.2.0[1], which will be released on Oct 1 (ETA).
[1].
https://cwiki.apache.org/confluence/display/FLINK/Release+Schedule+and+Planning
Best,
Yang
Javier Vegas 于2022年7月14日周四 06:25写道:
> Hello! The operator docs
>
方式是什么原理。
> >
> > yidan zhao 于2022年7月12日周二 12:48写道:
> > >
> > > 我理解的 k8s 集群内是组成 k8s 的机器,是必须在 pod 内?我在k8s的node上也不可以是吧。
> > >
> > > Yang Wang 于2022年7月12日周二 12:07写道:
> > > >
> > > > 日志里面已经说明的比较清楚了,如果用的是ClusterIP的方式,那你的Flink
>
日志里面已经说明的比较清楚了,如果用的是ClusterIP的方式,那你的Flink
client必须在k8s集群内才能正常提交。例如:起一个Pod,然后再pod里面执行flink run
否则你就需要NodePort或者LoadBalancer的方式了
2022-07-12 10:23:23,021 WARN
org.apache.flink.kubernetes.KubernetesClusterDescriptor [] -
Please note that Flink client operations(e.g. cancel, list, stop,
Thanks Gyula for working on the first patch release for the Flink
Kubernetes Operator project.
Best,
Yang
Gyula Fóra 于2022年6月28日周二 00:22写道:
> The Apache Flink community is very happy to announce the release of Apache
> Flink Kubernetes Operator 1.0.1.
>
> The Flink Kubernetes Operator
Thanks Gyula for working on the first patch release for the Flink
Kubernetes Operator project.
Best,
Yang
Gyula Fóra 于2022年6月28日周二 00:22写道:
> The Apache Flink community is very happy to announce the release of Apache
> Flink Kubernetes Operator 1.0.1.
>
> The Flink Kubernetes Operator
Could you please share the JobManager logs of failed deployment? It will
also help a lot if you could show the pending pod status via "kubectl
describe ".
Given that the current Flink Kubernetes Operator is built on top of native
K8s integration[1], the Flink ResourceManager should allocate
Do you mean the HttpArtifactFetcher could not support HTTPS?
cc @Aitozi
Best,
Yang
calvin beloy 于2022年6月22日周三 22:10写道:
> Sorry typo "jarring" should be "jar url".
>
> Sent from Yahoo Mail on Android
>
Matyas's answer is on the point.
You need to mount a shared volume for all the JobManager pods so that the
uploaded jars are visible for them all.
Best,
Yang
Őrhidi Mátyás 于2022年6月23日周四 04:34写道:
> I guess the problem here is that your JM pods do not have access to a
> common upload folder.
Do you have installed the operator along with CRD[1]?
[1].
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.0/docs/try-flink-kubernetes-operator/quick-start/#deploying-the-operator
Best,
Yang
yu'an huang 于2022年6月23日周四 13:04写道:
> Hi,
>
> It seems that you can't find
just to get a few files into a pod container... I feel it's just a bit
> much.
> But again, this is my frustration with k8s, not with Flink ;-)
>
> Cheers,
> Matt
>
> On Wed, Jun 22, 2022 at 5:32 AM Yang Wang wrote:
>
>> Matyas and Gyula have shared many great informations a
Matyas and Gyula have shared many great informations about how to make the
Flink Kubernetes Operator work on the EKS.
One more input about how to prepare the user jars. If you are more familiar
with K8s, you could use persistent volume to provide the user jars and them
mount the volume to
Could you please have a try with high availability enabled[1]?
If HA enabled, the internal jobmanager rpc service will not be created.
Instead, the TaskManager retrieves the JobManager address via HA services
and connects to it via pod ip.
[1].
Zhanghao的回答已经非常全面了,我再补充小点,删除Deployment保留HA ConfigMap是预期内的行为,文档里面有说明[1]
之所以这样设计有两点原因:
(1.) 任务可能会被重启,但使用相同的cluster-id,并且希望从之前的checkpoint恢复
(2.) 单纯的删除ConfigMap会导致存储在DFS(e.g. HDFS、S3、OSS)上面的HA数据泄露
[1].
The Apache Flink community is very happy to announce the release of Apache
Flink Kubernetes Operator 1.0.0.
The Flink Kubernetes Operator allows users to manage their Apache Flink
applications and their lifecycle through native k8s tooling like kubectl.
This is the first production ready release
The Apache Flink community is very happy to announce the release of Apache
Flink Kubernetes Operator 1.0.0.
The Flink Kubernetes Operator allows users to manage their Apache Flink
applications and their lifecycle through native k8s tooling like kubectl.
This is the first production ready release
If everything goes well, I will close the VOTE for RC4 on Friday night,
which should run for more than 48 hours. And then finalize the release.
Best,
Yang
Gyula Fóra 于2022年6月1日周三 23:30写道:
> Hi Jeesmon!
>
> We are currently working through the release process. We are now in the
> middle of
The current application mode has the limitation that only one job could be
submitted when HA enabled[1].
So a feasible solution is to use the session mode[2], it will be supported
in the coming release-1.0.0.
However, I am afraid it still could not satisfy your requirement "2 task
managers (one
Maybe you could have a try on the flink-kubernetes-operator[1]. It is
designed for using Kubernetes CRD to manage the Flink applications.
[1].
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-0.1/
Best,
Yang
Devin Bost 于2022年5月18日周三 08:29写道:
> Hi,
>
> I'm looking at
The usrlib for YARN only works for 1.15.0 and later versions. Refer to the
ticket[1] for more information.
[1]. https://issues.apache.org/jira/browse/FLINK-24897
Best,
Yang
Pavel Penkov 于2022年5月16日周一 22:59写道:
> I can't manage to run an application on YARN because of classpath issues.
> Flink
e were no more logs there.
>
> How can I reproduce the issue ?
>
> On Thu, May 12, 2022 at 10:35 AM Yang Wang wrote:
>
>> The SUSPENDED state is usually caused by lost leadership. Maybe you could
>> find more information about leader in the JobManager and TaskManager logs.
The SUSPENDED state is usually caused by lost leadership. Maybe you could
find more information about leader in the JobManager and TaskManager logs.
Best,
Yang
Xiaolong Wang 于2022年5月11日周三 19:18写道:
> Hello,
>
> Recently our Flink jobs on Native K8s encountered failing in the
> `SUSPENDED`
可以临时通过-D "$internal.pipeline.job-id="来自定义job id,但是个内部参数
你可以看下[1],了解更多讨论的信息
[1]. https://issues.apache.org/jira/browse/FLINK-19358
Best,
Yang
谭家良 于2022年5月11日周三 22:17写道:
>
>
> 我使用的Application模式:Kubernetes
> 我使用的HA模式:Kubernetes HA
>
>
> 目前Application + HA发现所有的Job
Currently, the flink-kubernetes-operator is using Flink native K8s
integration[1], which means Flink ResourceManager will dynamically allocate
TaskManager on demand.
So the users do not need to specify the replicas of TaskManager.
Just like Gyula said, one possible solution to make "kubectl
rvers"
>> - "my.kafka.host:9093"
>> - "--kafka.sasl.username"
>> - "$(KAFKA_SASL_USERNAME)"
>> - "--kafka.sasl.password"
>> - "$(KAFKA_SASL_PASSWORD)"
>>
>>
>> It would be a great addition, simpli
Flink could not support environment replacement in the args. I think you
could access the env via "*System.getenv()*" in the user main method.
It should work since the user main method is executed in the JobManager
side.
Best,
Yang
Őrhidi Mátyás 于2022年4月28日周四 19:27写道:
> Also,
>
> just
I am afraid we do not handle the scenario that the JobManager deployment is
deleted externally.
Best,
Yang
Őrhidi Mátyás 于2022年5月2日周一 16:52写道:
> I filed a Jira for tracking this issue:
> https://issues.apache.org/jira/browse/FLINK-27468
>
> On Mon, May 2, 2022 at 10:31 AM Őrhidi Mátyás
>
* 使用flink run命令来提交任务到running的Session集群的话,只能是本地的jar
* 也可以使用rest接口来提交,先上传到JobManager端[1],然后运行上传的jar[2]
* 最后可以尝试一下flink-kubernetes-operator项目,目前Session job是支持远程jar的[3],项目还在不断完善
[1].
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jars-upload
[2].
Using the pod template to configure the local SSD(via host-path or local
PV) is the correct way.
After that, either "java.io.tmpdir" or "process.taskmanager.working-dir" in
CR should take effect.
Maybe you need to share the complete pod yaml and logs of failed
TaskManager.
nit: if the
目前Application模式确实不能支持已经生成好的JobGraph运行,我能想到一个work around的办法是就先写一个user
jar直接把JobGraph提交到local的集群里面
就像下面这样
public class JobGraphRunner {
private static final Logger LOG =
LoggerFactory.getLogger(JobGraphRunner.class);
public static void main(String[] args) throws Exception {
final
track this issue.
>
>
>
> BRs,
>
> Chenyu
>
>
>
> *From: *"Zheng, Chenyu"
> *Date: *Friday, April 22, 2022 at 6:26 PM
> *To: *Yang Wang
> *Cc: *"u...@flink.apache.org" , "
> user-zh@flink.apache.org"
> *Subject: *Re: JobManager d
track this issue.
>
>
>
> BRs,
>
> Chenyu
>
>
>
> *From: *"Zheng, Chenyu"
> *Date: *Friday, April 22, 2022 at 6:26 PM
> *To: *Yang Wang
> *Cc: *"user@flink.apache.org" , "
> user...@flink.apache.org"
> *Subject: *Re: JobManager d
The root cause might be you APIServer is overloaded or not running
normally. And then all the pods events of
taskmanager-1-9 and taskmanager-1-10 are not delivered to the watch in
FlinkResourceManager.
So the two taskmanagers are not recognized by ResourceManager and then
registration are
The root cause might be you APIServer is overloaded or not running
normally. And then all the pods events of
taskmanager-1-9 and taskmanager-1-10 are not delivered to the watch in
FlinkResourceManager.
So the two taskmanagers are not recognized by ResourceManager and then
registration are
Could you please configure a bigger memory to avoid OOM and use
NMTracker[1] to figure out the memory usage categories?
[1].
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
Best,
Yang
Dan Hill 于2022年4月21日周四 07:42写道:
> Hi.
>
> I upgraded to Flink v1.14.4
你这个报错主要原因还是访问外部的一些镜像源失败导致的,你可以使用一些云厂商提供的代理来解决拉镜像失败的问题
或者使用--set webhook.create=false来关闭webhook功能
Best,
Yang
casel.chen 于2022年4月14日周四 15:46写道:
> The deployment 'cert-manager-webhook' shows
> Failed to pull image "quay.io/jetstack/cert-manager-webhook:v1.7.1": rpc
> error: code = Unknown desc =
If you are trying to submit a job to an already-running application via
"flink run", then it will not succeed. Because this is the by-design
behavior.
Please note that triggering a savepoint will also update the checkpoint
information in HA ConfigMap, so deleting the deployment(with HA ConfigMap
It seems that you have a typo when specifying the pipeline classpath.
"file:///flink-jar/flink-connector-rabbitmq_2.12-1.14.4.jar" ->
"file:///flink-jars/flink-connector-rabbitmq_2.12-1.14.4.jar"
If this is not the root cause, maybe you could have a try with downloading
the connector jars to
Thanks for the interest on the flink-kubernetes-operator project. I believe
you could leave a comment in the ticket FLINK-27049.
If the reporter has not start working on this ticket, then you could be
assigned.
Best,
Yang
Hao t Chang 于2022年4月6日周三 06:30写道:
> Hi Gyula
>
>
>
> Thanks for the
@Gyula Fóra is trying to prepare the preview
release(0.1) for flink-kubernetes-operator. It now is fully functional for
application mode.
You could have a try and share more feedback with the community.
The release-1.0 aims for production ready. And we still miss some important
pieces(e.g.
ubernetes.config.file=/home/devuser/.kube/config
> examples/batch/WordCount.jar
>
>
>
> Best regards,
>
> Burcu
>
>
>
> *From:* Yang Wang [mailto:danrtsey...@gmail.com]
> *Sent:* Saturday, March 26, 2022 7:48 AM
> *To:* Burcu Gul POLAT EGRI
> *Cc:* user@flink.a
Could you please verify whether the JobManager is going through a long full
GC or the Kubernetes APIServer is working well at that moment?
We are using Kubernetes HA service in the production and it seems stable
without your issue.
Best,
Yang
marco andreas 于2022年3月27日周日 18:35写道:
>
> Hello,
>
> In the example, we can pass args in the command, is there a way to do it
by using the flink-conf.yaml?
Yes. All the changes in the $FLINK_HOME/conf/flink-conf.yaml at the client
side will also be picked up when deploying a native K8s cluster.
For your use case, I am also suggesting the
Could you please share the result of "kubectl describe pods" when getting
stuck? It will be very useful to help to figure out the root cause.
I guess it might be related to insufficient resources for minikube.
Best,
Yang
Őrhidi Mátyás 于2022年3月26日周六 03:12写道:
> It's worth checking the
The root cause might be the LoadBalancer could not really work in your
environment. We already have a ticket to track this[1] and will try to get
it resolved in the next release.
For now, could you please have a try by adding
"-Dkubernetes.rest-service.exposed.type=NodePort" to your session and
This log means the Flink internal leader elector failed to renew the leader
ConfigMap to keep its leadership. It might be caused by a network issue,
long fullGC or the K8s APIServer internal error.
This blog[1] could help you to know how the Kubernetes HA works.
[1].
你用新版本试一下,看着是已经修复了
https://issues.apache.org/jira/browse/FLINK-19212
Best,
Yang
崔深圳 于2022年3月9日周三 10:31写道:
>
>
>
> web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server
> side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException:
> Failed to serialize the result for RPC call :
>
If you want to use the RestClusterClient to do the job submission and
lifecycle management, the implementation in the
flink-kubernetes-operator[1] project may give you some insights.
You could also use /jars/:jarid/run[2] to run a Flink job. It is a pure
HTTP interface.
[1].
Standalone Flink on K8s 和 native K8s都会有你说的这个问题
主要原因是标准输出打印到的pod console了,所以通过kubectl logs可以查看stdout日志,但webUI上就没有
你可以参考这个commit[1]自己编译一个Flink binary来实现
[1].
https://github.com/wangyang0918/flink/commit/2454b6daa2978f9ea9669435f92a9f2e78de357a
Best,
Yang
xinzhuxiansheng 于2022年3月2日周三 15:00写道:
This might be related with FLINK-21928 and seems already fixed in 1.14.0.
But it will have some limitations and users need to manually clean up the
HA entries.
Best,
Yang
Parag Somani 于2022年2月24日周四 13:42写道:
> Hello,
>
> Recently due to log4j vulnerabilities, we have upgraded to Apache Flink
>
last resort but didn't
> think to do it together with "--fromSavepoint".
>
> Thanks again!
>
> On Sun, Feb 20, 2022 at 9:49 PM Yang Wang wrote:
>
>> By design, we should support arbitrary config keys via the CLI when using
>> generic CLI mode.
>>
&g
> The only one i'm having problems with is
> "execution.savepoint.ignore-unclaimed-state".
>
> On Fri, Feb 18, 2022 at 3:42 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> Hi Andrey,
>>
>> It's unclear to me from the docs[1] if the flink
ge prior cluster creation; all logs' files are there.
> once the cluster is deployed, they are missing. (bug?)
>
> Best,
> Tamir.
> --
> *From:* Tamir Sagi
> *Sent:* Friday, January 21, 2022 7:19 PM
> *To:* Yang Wang
> *Cc:* user@flink.apache.org
> *
ePathList
> "$FLINK_TM_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" ${CLASS_TO_RUN}
> "${ARGS[@]}" "${FLINK_ENV_JAVA_OPTS}"
>
> Unless there are system configurations which not supposed to be overridden
> by user(And then having dedicated env variables
1 - 100 of 640 matches
Mail list logo