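Sorry, I forgot to attach the screenshot. My Flink standalone cluster went down; when I checked the logs, I saw the error shown in the screenshot. I can't make sense of it myself, and Google turns up nothing that matches. I hope you can help. Thanks!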

Problem description: my Flink cluster, which was running jobs, went down after two days. The cause is that all the TaskManager processes died; only one JobManager is still up.

Cluster environment: 5 CentOS 7 machines (32 cores, 256 GB memory), 2 JobManagers, 5 TaskManagers, 32 slots per machine. The JobManagers use ZooKeeper for high availability.

Preliminary analysis of the cause: a ZooKeeper problem.

Also: I accidentally cleaned up the logs, so I can't paste them as text.

On Mon, Apr 22, 2019 at 1:27 PM Xintong Song <tonysong...@gmail.com> wrote:

> Hi naisili,
>
> This is the user-zh mailing list, so if you speak Chinese you can ask
> questions in Chinese. If you prefer using English, you can send emails to
> u...@flink.apache.org. Hope that helps you.
>
> BTW, I think you forgot to attach the screenshot.
>
> Thank you~
>
> Xintong Song
>
> On Mon, Apr 22, 2019 at 10:53 AM naisili Yuan <yuanlong1...@gmail.com>
> wrote:
>
> > I use a Flink standalone cluster, and I use ZooKeeper for JobManager
> > HA.
> > The screenshot is the log from my TaskManager process going down; it
> > shows an error.
> > I don't know why it failed, even after googling the error.
> > Asking for help, thanks.