Hi Till,

I will have to test it with Flink 1.7.1 and get back to you. Thanks!

Best,
Ethan

> On Feb 15, 2019, at 4:01 AM, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi Ethan,
>
> can you observe a similar behaviour with Flink 1.7.1? Flink 1.4.2 is no
> longer supported by the community.
>
> Cheers,
> Till
>
> On Thu, Feb 14, 2019 at 5:06 PM Ethan Li <ethanopensou...@gmail.com> wrote:
> The related job manager log is
> https://gist.github.com/Ethanlm/86a10e786ad9025ddaa27c113c536da8
>
>> On Feb 14, 2019, at 9:40 AM, Ethan Li <ethanopensou...@gmail.com> wrote:
>>
>> Hello,
>>
>> I have a standalone flink-1.4.2 cluster with one JobManager, one
>> TaskManager, and ZooKeeper. I first started the JM and TM and waited for
>> them to be stable. Then I restarted the JM. That's when the TM got confused.
>>
>> The TM got notified that the leader node had changed and it tried to
>> register with the new leader (the new rpc port is 34561). Then it got an
>> acknowledgement saying it was already registered. It then kept trying to
>> associate with the old JM rpc port (35213) and failing.
>>
>> 2019-02-14 14:56:54,059 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager (attempt 1, timeout: 500 milliseconds)
>> 2019-02-14 14:56:54,157 DEBUG org.apache.flink.shaded.akka.org.jboss.netty.handler.ssl.SslHandler - [id: 0x77ac93ae, /10.215.68.243:46796 => openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:34561] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA
>> 2019-02-14 14:56:54,276 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager), starting network stack and library cache.
>> 2019-02-14 14:56:54,276 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Determined BLOB server address to be openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:50100. Starting BLOB cache.
>> 2019-02-14 14:56:54,278 INFO  org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-927b523f-f3ff-4ccc-83a0-362e09a3b858
>> 2019-02-14 14:56:54,279 INFO  org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-8492465e-0e94-4792-a346-66e6da299f7a
>> 2019-02-14 14:56:54,572 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - TaskManager was triggered to register at JobManager, but is already registered
>> 2019-02-14 14:56:56,359 WARN  akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:35213
>> 2019-02-14 14:56:56,360 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - The association error event's root cause is not of type InvalidAssociationException.
>>
>> Full TaskManager log:
>> https://gist.github.com/Ethanlm/e6f1b29d27d26813f5f8f40cd2c12643
>>
>> Is this expected or is this a bug?
>>
>> Thank you!
>>
>> Ethan