Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread 周思华
/1) (3e9374d1bf5fdb359e3a624a4d5d659b) switched from RUNNING to FAILED. java.lang.Exception: TaskManager was lost/killed: c51d3879b6244828eb9fc78c943007ad @ kens-mbp.hsd1.ca.comcast.net (dataPort=63782) — Ken On Apr 9, 2018, at 12:48 PM, Chesnay Schepler <ches...@apache.org> wrote: We wil

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Ted Yu
ction refused: >>> kens-mbp.hsd1.ca.comcast.net/192.168.3.177:63780 >>> 2018-04-07 21:59:21,049 WARN akka.remote.ReliableDeliverySupervisor >>> - Association with remote system [akka.tcp:// >>> fl...@kens-mbp.hsd1.ca.comcast.net:63780] has failed, addres

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Lasse Nedergaard
> Date: 4/10/18 12:25 AM (GMT-08:00) > To: Ken Krugler <kkrugler_li...@transpac.com> > Cc: user <user@flink.apache.org>, Chesnay Schepler <ches...@apache.org> > Subject: Re: java.lang.Exception: TaskManager was lost/killed > > > This graph shows Non-Heap . If the s

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Ted Yu
ay Schepler <ches...@apache.org> Subject: Re: java.lang.Exception: TaskManager was lost/killed This graph shows Non-Heap. If the same pattern exists it makes sense that it will try to allocate more memory and then exceed the limit. I can see the trend for all other containers tha

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Lasse Nedergaard
...@kens-mbp.hsd1.ca.comcast.net:63780] has failed, address is now >> gated for [5000] ms. Reason: [Association failed with [akka.tcp:// >> fl...@kens-mbp.hsd1.ca.comcast.net:63780]] Caused by: [Connection >> refused: kens-mbp.hsd1.ca.comcast.net/192.168.3.177:63780] >> 2018-04-07 21:59:21,056 WARN akka.remote.Rem

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Lasse Nedergaard
hsd1.ca.comcast.net:63780] > 2018-04-07 21:59:21,063 INFO org.apache.flink.runtime.jobmanager.JobManager >- Task manager akka.tcp://flink@kens-mbp. > hsd1.ca.comcast.net:63780/user/taskmanager terminated. > 2018-04-07 21:59:21,064 INFO > org.apache.fli

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Ken Krugler
G to FAILED. java.lang.Exception: TaskManager was lost/killed: c51d3879b6244828eb9fc78c943007ad @ kens-mbp.hsd1.ca.comcast.net (dataPort=63782) — Ken > On Apr 9, 2018, at 12:48 PM, Chesnay Schepler <ches...@apache.org> wrote: > > We will need more information to offer any solution. The

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Hao Sun
Same story here, 1.3.2 on K8s. It is very hard to find out why a TM is killed. It is not likely caused by a memory leak. If there is a logger I should turn on, please let me know. On Mon, Apr 9, 2018, 13:41 Lasse Nedergaard wrote: > We see the same running 1.4.2 on Yarn hosted

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Lasse Nedergaard
We see the same running 1.4.2 on YARN hosted on an AWS EMR cluster. The only thing I can find in the logs is SIGTERM with the code 15 or -100. Today our simple job reading from Kinesis and writing to Cassandra was killed. The other day in another job I identified a map state.remove command

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Chesnay Schepler
We will need more information to offer any solution. The exception simply means that a TaskManager shut down, for which there are a myriad of possible explanations. Please have a look at the TaskManager logs, they may contain a hint as to why it shut down. On 09.04.2018 16:01, Javier Lopez
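A minimal sketch of the log check suggested above, assuming the TaskManager log is available as plain text lines. The function name `find_shutdown_hints` and the marker list are hypothetical; the markers are only strings mentioned in this thread (SIGTERM, "Connection refused") plus the JVM's standard `OutOfMemoryError` name, and a real log may word things differently.

```python
# Hypothetical marker strings; extend this mapping to match your own logs.
HINT_MARKERS = {
    "signal": "SIGTERM",
    "out_of_memory": "OutOfMemoryError",
    "connection_refused": "Connection refused",
}


def find_shutdown_hints(lines):
    """Return (line_number, label, text) for each log line containing a marker."""
    hints = []
    for number, line in enumerate(lines, start=1):
        for label, marker in HINT_MARKERS.items():
            if marker in line:
                hints.append((number, label, line.strip()))
    return hints
```

To use it, pass the lines of a TaskManager log file (for example, the result of `open(path).readlines()`) and inspect the entries closest to the time the TaskManager was reported lost.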

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Javier Lopez
Hi, "are you moving the job jar to the ~/flink-1.4.2/lib path ? " -> Yes, to every node in the cluster. On 9 April 2018 at 15:37, miki haiat wrote: > Javier > "adding the jar file to the /lib path of every task manager" > are you moving the job jar to the*

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread miki haiat
Javier "adding the jar file to the /lib path of every task manager" are you moving the job jar to the ~/flink-1.4.2/lib path? On Mon, Apr 9, 2018 at 12:23 PM, Javier Lopez wrote: > Hi, > > We had the same metaspace problem, it was solved by adding the jar file to >

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Javier Lopez
Hi, We had the same metaspace problem; it was solved by adding the jar file to the /lib path of every task manager, as explained here: https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#avoiding-dynamic-classloading. We also added these java

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Alexander Smirnov
I've seen a similar problem, but it was not heap size, it was Metaspace. It was caused by a job restarting in a loop. It looks like for each restart, Flink loads a new instance of the classes and very soon it runs out of metaspace. I've created a JIRA issue for this problem, but got no response from the

Re:Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread 王凯
Thanks a lot, I will try it. On 2018-04-09 00:06:02, "TechnoMage" wrote: I have seen this when my task manager ran out of RAM. Increase the heap size. flink-conf.yaml: taskmanager.heap.mb jobmanager.heap.mb Michael On Apr 8, 2018, at 2:36 AM, 王凯 wrote:

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-08 Thread TechnoMage
I have seen this when my task manager ran out of RAM. Increase the heap size. flink-conf.yaml: taskmanager.heap.mb jobmanager.heap.mb Michael > On Apr 8, 2018, at 2:36 AM, 王凯 wrote: > > > hi all, recently, i found a problem,it runs well when
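For illustration, the two keys named in the message above might appear in flink-conf.yaml roughly as follows. The values are placeholders chosen for this sketch, not recommendations from the thread; suitable sizes depend on the job and the memory available to each container or host.

```yaml
# flink-conf.yaml (keys as named in the message; values are assumed placeholders)
taskmanager.heap.mb: 4096   # heap size for each TaskManager JVM, in MB
jobmanager.heap.mb: 1024    # heap size for the JobManager JVM, in MB
```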

java.lang.Exception: TaskManager was lost/killed

2018-04-08 Thread 王凯
Hi all, recently I ran into a problem: the job runs well when it starts, but after running for a long time the exception shown above appears. How can I resolve it?