Hello,

I have a standalone Flink 1.4.2 cluster with one JobManager (JM), one
TaskManager (TM), and ZooKeeper. I first started the JM and TM and waited for
them to become stable. Then I restarted the JM. That is when the TM got confused.

The TM was notified that the leader had changed, and it tried to register with
the new leader (the new RPC port is 34561). It then logged that it was already
registered. After that, it kept trying to associate with the old JM RPC port
(35213) and kept failing.

2019-02-14 14:56:54,059 INFO  org.apache.flink.runtime.taskmanager.TaskManager  - Trying to register at JobManager akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager (attempt 1, timeout: 500 milliseconds)
2019-02-14 14:56:54,157 DEBUG org.apache.flink.shaded.akka.org.jboss.netty.handler.ssl.SslHandler  - [id: 0x77ac93ae, /10.215.68.243:46796 => openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:34561] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA
2019-02-14 14:56:54,276 INFO  org.apache.flink.runtime.taskmanager.TaskManager  - Successful registration at JobManager (akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager), starting network stack and library cache.
2019-02-14 14:56:54,276 INFO  org.apache.flink.runtime.taskmanager.TaskManager  - Determined BLOB server address to be openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:50100. Starting BLOB cache.
2019-02-14 14:56:54,278 INFO  org.apache.flink.runtime.blob.PermanentBlobCache  - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-927b523f-f3ff-4ccc-83a0-362e09a3b858
2019-02-14 14:56:54,279 INFO  org.apache.flink.runtime.blob.TransientBlobCache  - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-8492465e-0e94-4792-a346-66e6da299f7a
2019-02-14 14:56:54,572 DEBUG org.apache.flink.runtime.taskmanager.TaskManager  - TaskManager was triggered to register at JobManager, but is already registered
2019-02-14 14:56:56,359 WARN  akka.remote.transport.netty.NettyTransport  - Remote connection to [null] failed with java.net.ConnectException: Connection refused: openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:35213
2019-02-14 14:56:56,360 DEBUG org.apache.flink.runtime.taskmanager.TaskManager  - The association error event's root cause is not of type InvalidAssociationException.
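
To check the same pattern across the full log, here is a rough Python sketch that collects the JM ports the TM tried to register at and the ports that refused connections. The function name `summarize_ports` and both regular expressions are my own illustration based only on the message formats in the excerpt above; they are not part of Flink and may need adjusting for other log layouts.

```python
import re

# Patterns derived from the log excerpt above (an assumption, not a Flink API).
# \s+ tolerates entries that were hard-wrapped across lines.
REGISTER = re.compile(
    r"Trying to register at JobManager\s+\S+?:(\d+)/user/jobmanager"
)
REFUSED = re.compile(r"Connection refused:\s+\S+?:(\d+)")


def summarize_ports(log_text):
    """Return (ports the TM tried to register at, ports that refused)."""
    registered = sorted({int(p) for p in REGISTER.findall(log_text)})
    refused = sorted({int(p) for p in REFUSED.findall(log_text)})
    return registered, refused
```

If a port shows up in the refused list after a later registration at a different port, the TM is still contacting the old JM address, which is the behaviour described above.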



Full TaskManager log:
https://gist.github.com/Ethanlm/e6f1b29d27d26813f5f8f40cd2c12643 


Is this expected or is this a bug? 

Thank you!

Ethan
