Re: Flink HA with Zookeeper and Docker Compose: unable to startup a working setup.

2024-01-15 Thread Yang Wang
wrote: > Hello, > I'm trying to set up a testing environment using: > > - Flink HA with Zookeeper > - Docker Compose > > While starting, the TaskManager generates an exception and then after some > restarts it fails. > > The exception is: > "Caused by: org.apache.flink.

Flink HA with Zookeeper and Docker Compose: unable to startup a working setup.

2023-12-29 Thread Alessio Bernesco Làvore
Hello, I'm trying to set up a testing environment using: - Flink HA with Zookeeper - Docker Compose While starting, the TaskManager generates an exception and then after some restarts it fails. The exception is: "Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fe

Flink HA on Kubernetes - RPC port

2023-01-20 Thread bastien dine
Hello, We are migrating our HA setup from ZK to K8S, and we have a question regarding the RPC port. Previously with ZK, the RPC connection config was the : high-availability.jobmanager.port We were expecting that the config will be the same with K8S HA, as the doc says : "The port (range) used

Re: Activate Flink HA without checkpoints on k8S

2022-10-19 Thread Yang Wang
Add some more information to Gyula's comment. For application mode without checkpoint, you do not need to activate the HA since it will not take any effect and the Flink job will be submitted again after the JobManager restarted. Because the job submission happens on the JobManager side. For

Re: Activate Flink HA without checkpoints on k8S

2022-10-13 Thread Gyula Fóra
Without HA, if the jobmanager goes down, job information is lost so the job won’t be restarted after the JM comes back up. Gyula On Thu, 13 Oct 2022 at 19:07, marco andreas wrote: > > > Hello, > > Can someone explain to me what is the point of using HA when deploying an > application cluster

Activate Flink HA without checkpoints on k8S

2022-10-13 Thread marco andreas
Hello, Can someone explain to me what the point of using HA is when deploying an application cluster with a single JM and checkpoints are not activated? AFAIK, when the pod of the JM goes down, Kubernetes will restart it anyway, so we don't need to activate HA in this case. Maybe there's

??????k8s??????application????flink????HA??????????????

2022-08-28 Thread hjw
1.K8s??HAconfigMap. configMap??: cluster-id--config-map.jobgraph??checkpointhigh-availability.storageDir??

Re: Need help of deploying Flink HA on kubernetes cluster

2021-08-02 Thread Yang Wang
Could you please check that the allocated load balancer can be accessed locally (on the Flink client side)? Best, Yang Fabian Paul wrote on Thu, Jul 29, 2021 at 7:45 PM: > Hi Dhiru, > > Sorry for the late reply. Once the cluster is successfully started the web > UI should be reachable if you somehow forward

Re: Need help of deploying Flink HA on kubernetes cluster

2021-07-29 Thread Fabian Paul
Hi Dhiru, Sorry for the late reply. Once the cluster is successfully started the web UI should be reachable if you somehow forward the port of the running pod. Although with the exception you have shared I suspect the cluster never fully runs (or not long enough). Can you share the full

Re: Need help of deploying Flink HA on kubernetes cluster

2021-07-22 Thread Fabian Paul
Hi Dhiru, No worries I completely understand your point. Usually all the executable scripts from Flink can be found in the main repository [1]. We also provide a community edition of our commercial product [2] which manages the lifecycle of the cluster and you do not have to use these scripts

Re: Need help of deploying Flink HA on kubernetes cluster

2021-07-22 Thread Fabian Paul
Hi Dhirendra, Thanks for reaching out. A good way to start is to have a look at [1] and [2]. Once you have everything setup it should be possible to delete the pod of the JobManager while an application is running and the job successfully recovers. You can use one of the example Flink

Need help of deploying Flink HA on kubernetes cluster

2021-07-21 Thread Dhiru
Hi, I am very new to Flink. I am planning to install a Flink HA setup on an EKS cluster with 5 worker nodes. Can someone please point me to the right materials or direction on how to install it, as well as any sample job that I can run just for testing to confirm that everything is working as expected

Re: Incomplete data in the Flink HA directory causes JobManager startup to fail.

2020-12-08 Thread 赵一旦
0.1 >> INFO [] - Loading configuration property: >> taskmanager.memory.network.min, 1gb >> INFO [] - Loading configuration property: >> taskmanager.memory.network.max, 8gb >> INFO [] - Loading configuration property: >> taskmanager.memory.framework.off-heap.size, 1gb >> INFO

Re: Incomplete data in the Flink HA directory causes JobManager startup to fail.

2020-12-08 Thread 赵一旦
: > taskmanager.memory.framework.heap.size, 1gb > INFO [] - Loading configuration property: high-availability, zookeeper > INFO [] - Loading configuration property: high-availability.storageDir, > bos://flink-bucket/flink/ha > INFO [] - Loading configuration property: > high-availabilit

Incomplete data in the Flink HA directory causes JobManager startup to fail.

2020-12-08 Thread 赵一旦
: taskmanager.memory.framework.heap.size, 1gb INFO [] - Loading configuration property: high-availability, zookeeper INFO [] - Loading configuration property: high-availability.storageDir, bos://flink-bucket/flink/ha INFO [] - Loading configuration property: high-availability.zookeeper.quorum, bjhw-aisecurity-cassandra01.bjhw:9681

Re: Uploading job jar via web UI in flink HA mode

2020-12-02 Thread sidhant gupta
andler.channelRead0(FileUploadHandler.java:159) >> [flink-dist_2.11-1.11.2.jar:1.11.2] >> >> at >> org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:68) >> [flink-dist_2.11-1.11.2.jar:1.11.2] >> >> at >> org.apache.

Re: Uploading job jar via web UI in flink HA mode

2020-12-02 Thread Till Rohrmann
-for-aws Cheers, Till On Wed, Dec 2, 2020 at 11:31 AM sidhant gupta wrote: > Hi All, > > I have 2 job managers in flink HA mode cluster setup. I have a load > balancer forwarding request to both (leader and stand by) the job managers > in default round-robin fashion. While upload

Uploading job jar via web UI in flink HA mode

2020-12-02 Thread sidhant gupta
Hi All, I have 2 job managers in a Flink HA mode cluster setup. I have a load balancer forwarding requests to both job managers (leader and standby) in the default round-robin fashion. While uploading the job jar, the Web UI fluctuates between the leader and standby page. It's difficult to upload

Re: Flink HA for Job Cluster

2020-02-10 Thread KristoffSC
Thank you both for the answers. I just want to get this right: can I achieve HA for the Job Cluster Docker config by having the ZooKeeper quorum configured as mentioned in [1] (with S3 and ZooKeeper)? I assume I need to modify the default Job Cluster config to match the [1] setup. [1]

Re: Flink HA for Job Cluster

2020-02-09 Thread KristoffSC
Thank you both for the answers. I just want to get this right: can I achieve HA for the Job Cluster Docker config by having the ZooKeeper quorum configured as mentioned in [1] (with S3 and ZooKeeper)? I assume I need to modify the default Job Cluster config to match the [1] setup. [1]

Re: Flink HA for Job Cluster

2020-02-09 Thread Yang Wang
Just like tison has said, you could use a deployment to restart the jobmanager pod. However, if you want all jobs to be able to recover from the checkpoint, you also need to use ZooKeeper and HDFS/S3 to store the high-availability data. Also some Kubernetes native HA support is in

Re: Flink HA for Job Cluster

2020-02-09 Thread tison
Hi Krzysztof, Flink doesn't provide JM HA itself yet. For YARN deployment, you can rely on yarn.application-attempts configuration[1]; for Kubernetes deployment, Flink uses Kubernetes deployment to restart a failed JM. Though, such standalone mode doesn't tolerate JM failure and strategies

Flink HA for Job Cluster

2020-02-07 Thread KristoffSC
Hi, In [1] we can find the setup for Standalone and YARN clusters to achieve Job Manager HA. Is Standalone Cluster High Availability with ZooKeeper the same approach as for Docker's Job Cluster approach with Kubernetes? [1]

Re: Multiple Job Managers in Flink HA Setup

2019-09-26 Thread Yang Wang
f09a356c73938@%3Cdev.flink.apache.org%3E > > On Fri, Sep 20, 2019 at 10:57 PM Steven Nelson > wrote: > >> Hello! >> >> I am having some difficulty with multiple job managers in an HA setup >> using Flink 1.9.0. >> >> I have 2 job managers and have setup

Re: Multiple Job Managers in Flink HA Setup

2019-09-25 Thread Gary Yao
ilability.cluster-id: /imet-enhance > high-availability.storageDir: hdfs:///flink/ha/ > high-availability.zookeeper.quorum: > flink-state-hdfs-zookeeper-1.flink-state-hdfs-zookeeper-headless.default.svc.cluster.local:2181,flink-state-hdfs-zookeeper-2.flink-state-hdfs-zookeeper-headless.default.

Multiple Job Managers in Flink HA Setup

2019-09-20 Thread Steven Nelson
Hello! I am having some difficulty with multiple job managers in an HA setup using Flink 1.9.0. I have 2 job managers and have setup the HA setup with the following config high-availability: zookeeper high-availability.cluster-id: /imet-enhance high-availability.storageDir: hdfs:///flink/ha

Re: Flink HA cluster on YARN is restarted more than yarn.application-attempts value

2019-06-02 Thread Kazunori Shinhira
test, I set “yarn.application-attempts” to 5, but Flink cluster >> was recovered more than 5 times. >> >> >> Does anyone know what “yarn.application-attempts” mean, and when Flink >> cluster’s attempts time will be incremented ? >> >> >> I asked sa

Re: Flink HA cluster on YARN is restarted more than yarn.application-attempts value

2019-06-02 Thread Shuyi Chen
t get it. > > > > https://stackoverflow.com/questions/56225088/why-is-flink-ha-cluster-on-yarn-recovered-more-than-the-maximum-number-of-attemp > > > > Best, > -- > Kazunori Shinhira > Mail : k.shinhira.1...@gmail.com >

Flink HA cluster on YARN is restarted more than yarn.application-attempts value

2019-06-02 Thread 新平和礼
/56225088/why-is-flink-ha-cluster-on-yarn-recovered-more-than-the-maximum-number-of-attemp Best, -- Kazunori Shinhira Mail : k.shinhira.1...@gmail.com

Re: flink ha HDFS directory permission problem

2019-04-01 Thread Yun Tang
-putting-files-on-hdfs-from-a-remote-machine From: 孙森 Sent: Monday, April 1, 2019 16:16 To: user-zh@flink.apache.org Subject: Re: flink ha HDFS directory permission problem Changing the directory permissions takes effect for the existing files, but newly created directories still have no write permission. [root@hdp1 ~]# hadoop fs -ls /flink/ha Found 15 items drwxrwxrwx - hdfs hdfs

Re: flink ha HDFS directory permission problem

2019-04-01 Thread 孙森
Changing the directory permissions takes effect for the existing files, but newly created directories still have no write permission. [root@hdp1 ~]# hadoop fs -ls /flink/ha Found 15 items drwxrwxrwx - hdfs hdfs 0 2019-04-01 15:13 /flink/ha/0e950900-c00e-4f24-a0bd-880ba9029a92 drwxrwxrwx - hdfs hdfs 0 2019-04-01 15:13 /flink/ha/42e61028-e063-4257-864b-05f46e121a4e

Re: flink ha HDFS directory permission problem

2019-04-01 Thread Yun Tang
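Hi 孙森, adding the submitting user root to Hadoop's hdfs user group, or submitting the program as Hadoop's hdfs user [1], or changing the permissions of the whole directory HDFS:///flink/ha [2] to open it to any user should solve the problem. Remember to add -R so that the change also applies to all subdirectories. [1] https://stackoverflow.com/questions/11371134/how-to-specify-username-when-putting-files-on-hdfs-from-a-remote-machine [2] https://hadoop.apache.org/docs/r2.4.1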

flink ha HDFS directory permission problem

2019-04-01 Thread 孙森
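Hi all: I start Flink in flink on yarn mode, with high availability configured. When I submit a job to the flink cluster, a permission denied exception occurs. The reason is that the folders created under HDFS:///flink/ha all have permission 755, with no write permission. So each time a new flink cluster is started, a new directory is created, for example /flink/ha/application_1553766783203_0026, and I have to change the permissions of /flink/ha/application_1553766783203_0026 before a job can be submitted successfully. How should this problem be solved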

Re: Re: flink ha mode process hangs!!!

2019-03-26 Thread Han Xiao
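Thank you very much for your answer. The problem was that ZooKeeper still held the jobGraph of a failed job, so every cluster start tried to retrieve it; deleting the leftovers in ZooKeeper and restarting solved it. Thank you for your reply! From: baiyg25...@hundsun.com Sent: 2019-03-26 09:40 To: user-zh Subject: Re: Re: flink ha mode process hangs!!! Could this be related to this access control setting? high-availability.zookeeper.client.acl: open baiyg25...@hundsun.com From: Han Xiao Sent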

Re: Re: flink ha mode process hangs!!!

2019-03-25 Thread Han Xiao
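This problem was already solved this morning: ZooKeeper held leftover jobGraphs of failed jobs, and deleting them restored the cluster. Thank you very much indeed; I hope to keep learning from you in the future. Thank you for your reply! From: Zili Chen Sent: 2019-03-26 09:46 To: user-zh@flink.apache.org Subject: Re: Re: flink ha mode process hangs!!! If the earlier zk data was not cleaned up, it may be that you previously set high-availability.storageDir to /flink/ha/zookeeper and then cleaned up hdfs, but zk still holds stale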

Re: Re: flink ha mode process hangs!!!

2019-03-25 Thread baiyg25...@hundsun.com
Could this be related to this access control setting? high-availability.zookeeper.client.acl: open baiyg25...@hundsun.com From: Han Xiao Sent: 2019-03-26 09:33 To: user-zh@flink.apache.org Subject: Re: Re: flink ha mode process hangs!!! Hi, good morning, and thank you for your reply. Here are my configuration options and parameters: flink-conf.yaml common: jobmanager.rpc.address: test10 jobmanager.rpc.port: 6123

Re: Re: flink ha mode process hangs!!!

2019-03-25 Thread Zili Chen
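If the earlier zk data was not cleaned up, it may be that you previously set high-availability.storageDir to /flink/ha/zookeeper and then cleaned up hdfs, but zk still holds stale handler information. Best, tison. Han Xiao wrote on Tue, Mar 26, 2019 at 9:33 AM: > Hi, good morning, and thank you for your reply. Here are my configuration options and parameters: > > flink-conf.yaml > common: > jobmanager.rpc.address: test10 > jobmana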

Re: Re: flink ha mode process hangs!!!

2019-03-25 Thread Han Xiao
-availability: zookeeper high-availability.storageDir: hdfs://test10:8020/flink/ha/ ## this directory is created normally, but it contains no jobGraph-related directories; high-availability.zookeeper.quorum: ip1:2181,ip2:2181,ip3:2181,ip4:2181,ip5:2181 high-availability.zookeeper.client.acl: open Fault tolerance and checkpointing: state.backend

Re: flink ha mode process hangs!!!

2019-03-25 Thread Zili Chen
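Can you share your HA configuration? In particular high-availability.storageDir; I suspect it may not be configured. Best, tison. Han Xiao wrote on Mon, Mar 25, 2019 at 7:26 PM: > Hello everyone, I am a Flink beginner and ran into some problems while deploying Flink HA; could you please take a look? > After starting Flink HA, the jobmanager process hangs immediately. I am using Flink 1.7.2, and the log below contains this error: File does > not exist: /flink/ha/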

Flink HA setup on Kubernetes

2018-12-31 Thread Steven Nelson
-availability.cluster-id: /cluster1 high-availability.storageDir: /flink/ha/ high-availability.zookeeper.quorum: flink-state-hdfs-zookeeper-1.flink-state-hdfs-zookeeper-headless.default.svc.cluster.local:2181,flink-state-hdfs-zookeeper-2.flink-state-hdfs-zookeeper-headless.default.svc.cluster.local:2181

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-22 Thread mozer
Thanks for the info, I have managed to launch a HA cluster with adding rpc.address for all job managers. But it did not work with start-cluster.sh, I had to add one by one. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-22 Thread Dawid Wysakowicz
Hi, It will use the HA settings as long as you specify high-availability: zookeeper. The jobmanager.rpc.address is used by the jobmanager as a binding address. You can verify it by starting two jobmanagers and then killing the leader. Best, Dawid On Tue, 21 Aug 2018 at 17:46, mozer wrote: >

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-21 Thread mozer
Yeah, you are right. I have already tried to set jobmanager.rpc.address and it works in that case, but if I use this setting I will not be able to use HA, am I right? How can the job manager register with zookeeper using the right address rather than localhost? -- Sent from:

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-21 Thread Dawid Wysakowicz
Hi, In your case the jobmanager binds itself to localhost and that's what it writes to zookeeper. Try starting the jobmanager manually with jobmanager.rpc.address set to the IP of the machine on which you are running the jobmanager. In other words, make sure the jobmanager binds itself to the right IP.
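
A minimal flink-conf.yaml sketch of the suggestion in this message, under stated assumptions: the IP 192.0.2.10 is a placeholder for the JobManager machine's address and 2181 is ZooKeeper's default client port; neither value appears in the thread. The storageDir value and the Machine1 host name are the ones mozer quotes in the other messages of this thread.

```yaml
# flink-conf.yaml on the JobManager machine (sketch; values are placeholders)
high-availability: zookeeper
# Assumed ZooKeeper address: Machine1 from the thread, default client port
high-availability.zookeeper.quorum: Machine1:2181
# Shared directory that every JobManager must be able to read and write
high-availability.storageDir: file:///shareflink/recovery
# Bind to the machine's routable IP rather than localhost (placeholder address)
jobmanager.rpc.address: 192.0.2.10
```

With HA enabled, TaskManagers look up the current leader's address in ZooKeeper, so the address the JobManager binds to is the one that ends up being published there.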

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-21 Thread mozer
FQDN or full IP; tried all of them, still no changes ... For the ssh connection, I can connect to each machine without passwords. Do you think that the problem can come from: *high-availability.storageDir: file:///shareflink/recovery* ? I don't use HDFS storage but a NAS file system which is

Re: Unable to start Flink HA cluster with Zookeeper

2018-08-21 Thread miki haiat
First of all, try with the FQDN or full IP. Also, in order to run an HA cluster you need to make sure that you have passwordless ssh access between your slaves and master. On Tue, Aug 21, 2018 at 4:15 PM mozer wrote: > I am trying to install a Flink HA cluster (Zookeeper mode) but the t

Unable to start Flink HA cluster with Zookeeper

2018-08-21 Thread mozer
I am trying to install a Flink HA cluster (Zookeeper mode) but the task manager cannot find the job manager. Here I give you the architecture; - Machine 1 : Job Manager + Zookeeper - Machine 2 : Task Manager masters: Machine1 slaves : Machine2 flink-conf.yaml

Zookeeper DR backup needed for Flink HA mode?

2018-05-15 Thread David Corley
We're looking at DR scenarios for our Flink cluster. We already use Zookeeper for JM HA. We use an HDFS cluster that's replicated off-site, and our high-availability.zookeeper.storageDir property is configured to use HDFS. However, in the event of a site-failure, is it also essential that we have a

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Rohil Surana
ctory. Also uploaded job jars are >>>stored in the directory if not overridden. By default, the temporary >>>directory is used. >>>- >>> >>>jobmanager.web.upload.dir: The config parameter defining the >>>directory for uploading th

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Sampath Bhat
- >> >>jobmanager.web.upload.dir: The config parameter defining the >>directory for uploading the job jars. If not specified a dynamic directory >>will be used under the directory specified by jobmanager.web.tmpdir. >> >> >> Regards, >> >&

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Rohil Surana
rs. If not specified a dynamic directory will be >used under the directory specified by jobmanager.web.tmpdir. > > > Regards, > > Chirag > > > > On Sunday, 6 May, 2018, 12:29:43 AM IST, Rohil Surana > <rohilsuran...@gmail.com> <rohilsuran...@gmail.com> w

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Rohil Surana
:29:43 AM IST, Rohil Surana < > rohilsuran...@gmail.com> wrote: > > > Hi, > > I have a very basic Flink HA setup on Kubernetes and wanted to retain job > jars on JobManager Restarts. > > For HA I am using a Zookeeper and a NFS drive mounted on all pods > (JobManager a

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Chirag Dewan
der the directory specified by jobmanager.web.tmpdir. Regards, Chirag On Sunday, 6 May, 2018, 12:29:43 AM IST, Rohil Surana <rohilsuran...@gmail.com> wrote: Hi, I have a very basic Flink HA setup on Kubernetes and wanted to retain job jars on JobManager Restarts. For HA I am using a

Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-05 Thread Rohil Surana
Hi, I have a very basic Flink HA setup on Kubernetes and wanted to retain job jars on JobManager Restarts. For HA I am using a Zookeeper and a NFS drive mounted on all pods (JobManager and TaskManagers), that is being used for checkpoints and have also set the `web.upload.dir: /data/flink

Re: accessing flink HA cluster with scala shell/zeppelin notebook

2018-03-01 Thread santoshg
Hi Alexis, Were you able to make this work? I am also looking for Zeppelin integration with Flink and this might be helpful. Thanks Santosh -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink HA Zookeeper Connection Timeout

2017-11-13 Thread Nico Kruber
) wrote: > Hi – We’re currently testing Flink HA and running into a zookeeper timeout > issue. Error log below. > Is there a production checklist or any information on parameters that are > related to flink HA that I need to pay attention to? > Any pointers would really help. Ple

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-23 Thread Ufuk Celebi
ugh the PATCH API combined with update if condition. >> >> If you don’t want to actually rip way into the code for the Job Manager >> the ETCD Operator would be a good way to bring up an ETCD cluster that is >> separate from the core Kubernetes ETCD database. Combined with zetcd yo

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-22 Thread Hao Sun
th zetcd you could probably have that > up and running quickly. > > Thanks, > James Bucher > > From: Hao Sun <ha...@zendesk.com> > Date: Monday, August 21, 2017 at 9:45 AM > To: Stephan Ewen <se...@apache.org>, Shannon Carey <sca...@expedia.com> > Cc: &qu

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-22 Thread James Bucher
: "user@flink.apache.org" <user@flink.apache.org> Subject: Re: Flink HA with Kubernetes, without Zookeeper Thanks Shannon for the https://github.com/coreos/zetcd tips, I will check that out and share my results if

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Hao Sun
on top of Kubernetes' etcd cluster so that we don't have to rely >> on a separate Zookeeper cluster. However, we haven't tried it yet. >> >> -Shannon >> >> From: Hao Sun <ha...@zendesk.com> >> Date: Sunday, August 20, 2017 at 9:04 PM >> To: "user@flink.

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Stephan Ewen
ugust 20, 2017 at 9:04 PM > To: "user@flink.apache.org" <user@flink.apache.org> > Subject: Flink HA with Kubernetes, without Zookeeper > > Hi, I am new to Flink and trying to bring up a Flink cluster on top of > Kubernetes. > > For HA setup, with kubernetes,

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Shannon Carey
rg> Subject: Flink HA with Kubernetes, without Zookeeper Hi, I am new to Flink and trying to bring up a Flink cluster on top of Kubernetes. For HA setup, with kubernetes, I think I just need one job manager and do not need Zookeeper? I will store all sta

Flink HA with Kubernetes, without Zookeeper

2017-08-20 Thread Hao Sun
Hi, I am new to Flink and trying to bring up a Flink cluster on top of Kubernetes. For HA setup, with kubernetes, I think I just need one job manager and do not need Zookeeper? I will store all states to S3 buckets. So in case of failure, kubernetes can just bring up a new job manager without

Re: accessing flink HA cluster with scala shell/zeppelin notebook

2017-03-27 Thread Alexis Gendronneau
Hi Robert, Hi Till, I tried to set up the high-availability options in Zeppelin, but I guess it's just a matter of Flink version compatibility on the Zeppelin side. I'll try to compile Zeppelin with 1.2 and add the needed parameter to see if it's better. Thanks for your help! 2017-03-27 15:09 GMT+02:00 Till

Re: accessing flink HA cluster with scala shell/zeppelin notebook

2017-03-20 Thread Alexis Gendronneau
Hello users, As Maciek, I'm currently trying to make apache Zeppelin 0.7 working with Flink. I have two versions of flink available (1.1.2 and 1.2.0). Each one is running in High-availability mode. When running jobs from Zeppelin in Flink local mode, everything works fine. But when trying to

Re: Starting flink HA cluster with start-cluster.sh

2017-03-08 Thread Ufuk Celebi
Shouldn't the else branch ``` else HIGH_AVAILABILITY=${DEPRECATED_HA} fi ``` set it to `zookeeper`? Of course, the truth is whatever the script execution prints out. ;-) PS Emails like this should either go to the dev list or it's also fine to open an issue and discuss there (and potentially

Starting flink HA cluster with start-cluster.sh

2017-03-08 Thread Dawid Wysakowicz
Hi, I've tried to start cluster with HA mode as described in the doc, but with a current state of bin/config.sh I failed. I think there is a bug with configuring the HIGH_AVAILABILITY variable in block (bin/config.sh): if [ -z "${HIGH_AVAILABILITY}" ]; then

Re: accessing flink HA cluster with scala shell/zeppelin notebook

2017-01-24 Thread Aljoscha Krettek
+Till Rohrmann , do you know what can be used to access a HA cluster from that setting. Adding Till since he probably knows the HA stuff best. On Sun, 22 Jan 2017 at 15:58 Maciek Próchniak wrote: > Hi, > > I have standalone Flink cluster configured with HA

accessing flink HA cluster with scala shell/zeppelin notebook

2017-01-22 Thread Maciek Próchniak
Hi, I have standalone Flink cluster configured with HA setting (i.e. with zookeeper recovery). How should I access it remotely, e.g. with Zeppelin notebook or scala shell? There are settings for host/port, but with HA setting they are not fixed - if I check which is *current leader* host

Re: Flink HA

2016-02-22 Thread Robert Metzger
Hi Thomas, To avoid having jobs forever restarting, you have to cancel them manually (from the web interface or the /bin/flink client). Also, you can set an appropriate restart strategy (in 1.0-SNAPSHOT), which limits the number of retries. This way the retrying will eventually stop. On Fri, Feb
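
As a hedged illustration of the restart strategy mentioned in this message, a flink-conf.yaml fragment might look like the sketch below. The key names are the fixed-delay restart options as documented for Flink 1.x; the attempt count and delay are illustrative values, not taken from the thread.

```yaml
# flink-conf.yaml (sketch; the numbers are illustrative assumptions)
restart-strategy: fixed-delay
# Give up after this many restart attempts, so retrying eventually stops
restart-strategy.fixed-delay.attempts: 3
# Wait between two consecutive restart attempts
restart-strategy.fixed-delay.delay: 10 s
```

Once the attempts are exhausted the job fails for good instead of restarting forever, which is the behavior described above.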

Re: Flink HA

2016-02-18 Thread Ufuk Celebi
On Thu, Feb 18, 2016 at 6:59 PM, Thomas Lamirault wrote: > We are trying flink in HA mode. Great to hear! > We set in the flink yaml : > > state.backend: filesystem > > recovery.mode: zookeeper > recovery.zookeeper.quorum: > > recovery.zookeeper.path.root: > >

Flink HA

2016-02-18 Thread Thomas Lamirault
Hi ! We are trying flink in HA mode. Our application is a streaming application with windowing mechanism. We set in the flink yaml : state.backend: filesystem recovery.mode: zookeeper recovery.zookeeper.quorum: recovery.zookeeper.path.root: recovery.zookeeper.storageDir:

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-16 Thread Stephan Ewen
org> >>> wrote: >>> >>> >>> >>> Hi Stefano, >>> >>> >>> >>> The Job should stop temporarily but then be resumed by the new >>> >>> JobManager. Have you increased the number of execution retries? >

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-16 Thread Stefano Baghino
> The Job should stop temporarily but then be resumed by the new >> >>> JobManager. Have you increased the number of execution retries? AFAIK, >> >>> it is set to 0 by default. This will not re-run the job, even in HA >> >>> mode. You can enable it on the StreamExecutio

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Ufuk Celebi
> On 15 Feb 2016, at 13:40, Stefano Baghino > wrote: > > Hi Ufuk, thanks for replying. > > Regarding the masters file: yes, I've specified all the masters and checked > out that they were actually running after the start-cluster.sh. I'll gladly > share the

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
, Stefano Baghino >> <stefano.bagh...@radicalbit.io> wrote: >> > Hello everyone, >> > >> > last week I've ran some tests with Apache ZooKeeper to get a grip on >> > Flink >> > HA features. My tests went bad so far and I can't sort out the re

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
, 2016 at 12:35 PM, Stefano Baghino > <stefano.bagh...@radicalbit.io> wrote: > > Hello everyone, > > > > last week I've ran some tests with Apache ZooKeeper to get a grip on > Flink > > HA features. My tests went bad so far and I can't sort out the reason. > > &g

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
n, Feb 15, 2016 at 12:35 PM, Stefano Baghino > <stefano.bagh...@radicalbit.io> wrote: > > Hello everyone, > > > > last week I've ran some tests with Apache ZooKeeper to get a grip on > Flink > > HA features. My tests went bad so far and I can't sort out the reason. >

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
k I've ran some tests with Apache ZooKeeper to get a grip on Flink > HA features. My tests went bad so far and I can't sort out the reason. > > My latest tests involved Flink 0.10.2, ran as a standalone cluster with 3 > masters and 4 slaves. The 3 masters are also the ZooKeeper (3.4.

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Ufuk Celebi
. Can you please share the job manager logs of all started job managers? – Ufuk On Mon, Feb 15, 2016 at 12:35 PM, Stefano Baghino <stefano.bagh...@radicalbit.io> wrote: > Hello everyone, > > last week I've ran some tests with Apache ZooKeeper to get a grip on Flink > HA featu

Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hello everyone, last week I ran some tests with Apache ZooKeeper to get a grip on Flink HA features. My tests have gone badly so far and I can't sort out the reason. My latest tests involved Flink 0.10.2, run as a standalone cluster with 3 masters and 4 slaves. The 3 masters are also the ZooKeeper

Re: Flink HA mode

2015-09-09 Thread Ufuk Celebi
> On 09 Sep 2015, at 04:48, Emmanuel wrote: > > my questions is: how critical is the bootstrap ip list in masters? Hey Emmanuel, good questions. I read over the docs for this again [1] and you are right that we should make this clearer. The “masters" file is only relevant

Re: Flink HA mode

2015-09-09 Thread Till Rohrmann
The only necessary information for the JobManager and TaskManager is to know where to find the ZooKeeper quorum to do leader election and retrieve the leader address from. This will be configured via the config parameter `ha.zookeeper.quorum`. On Wed, Sep 9, 2015 at 10:15 AM, Stephan Ewen

RE: Flink HA mode

2015-09-09 Thread Fabian Hueske
ing 0.9.1 right now > > > -- > From: ele...@msn.com > To: user@flink.apache.org > Subject: RE: Flink HA mode > Date: Wed, 9 Sep 2015 16:11:38 -0700 > > Been playing with the HA... > I find the UIs confusing here: > in the dashboard on one side I se

Flink HA mode

2015-09-08 Thread Emmanuel
Looking at Flink HA mode. Why do you need to have the list of masters in the config if zookeeper is used to keep track of them? In an environment like Google Cloud or Container Engine, the JM may come back up but will likely have another IP address. Is the masters config file only

re: Flink HA mode

2015-09-08 Thread Zhangrucong
[mailto:ele...@msn.com] Sent: September 9, 2015 7:59 To: user@flink.apache.org Subject: Flink HA mode Looking at Flink HA mode. Why do you need to have the list of masters in the config if zookeeper is used to keep track of them? In an environment like Google Cloud or Container Engine, the JM may come back up