Based on your description, you have two clusters:

- old cluster B, with Mesos 0.25, master IP 10.90.12.29
- new cluster A, with Mesos 0.22, master IP 10.88.169.195

Also, you have a slave S (10.90.5.19) that was originally in cluster B. You reconfigured it to join cluster A but forgot to clean up the slave work dir.

From the logs, S is now registered with cluster A (which is what you intended), but S still shows up in the slaves list of cluster B (which is confusing), and the master of cluster B is still sending messages to S:

```
W0105 19:05:38.207882 6450 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
```

What's in the master logs of clusters A and B? That could help others understand the problem.

On Fri, Jan 15, 2016 at 12:27 PM, X Brick <ngdoc...@gmail.com> wrote:

> sorry for the wrong api response of cluster A
>
>> {
>>   "active": true,
>>   "attributes": {
>>     "apps": "logstash",
>>     "colo": "cn5",
>>     "type": "prod"
>>   },
>>   "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
>>   "id": "20151230-034049-3282655242-5050-1802-S7",
>>   "pid": "slave(1)@10.90.5.19:5051",
>>   "registered_time": 1452094227.39161,
>>   "reregistered_time": 1452831994.32924,
>>   "resources": {
>>     "cpus": 32,
>>     "disk": 2728919,
>>     "mem": 128126,
>>     "ports": "[8100-10000, 31000-32000]"
>>   }
>> }
>>
>
> 2016-01-15 12:22 GMT+08:00 X Brick <ngdoc...@gmail.com>:
>
>> Hi folks,
>>
>> I meet a very strange issue when I migrated two nodes from one cluster to
>> another about one week ago.
>>
>> Two nodes:
>>
>> l-bu128g3-10k10.ops.cn2
>> l-bu128g5-10k10.ops.cn2
>>
>> I did not clean the mesos data dir before they join the another cluster,
>> then I found the nodes live in two cluster at the same time.
>>
>> Cluster A (Mesos 0.22):
>>
>>
>> Cluster B (Mesos 0.25):
>>
>>
>> I thought maybe the old data make these happened, so I clear up these two
>> nodes data dir and rejoin the cluster A. But nothing changed, they still
>> come back to the old cluster(Cluster B).
>>
>> Here is the "/master/slaves" response:
>>
>> Cluster A:
>>
>>> {
>>>   "slaves": [
>>>     {
>>>       "active": true,
>>>       "attributes": {
>>>         "apps": "logstash",
>>>         "colo": "cn5",
>>>         "type": "prod"
>>>       },
>>>       "hostname": "l-bu128g9-10k10.ops.cn2.qunar.com",
>>>       "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S5",
>>>       "pid": "slave(1)@10.90.5.23:5051",
>>>       "registered_time": 1451990379.49813,
>>>       "reregistered_time": 1452093251.39516,
>>>       "resources": {
>>>         "cpus": 32,
>>>         "disk": 2728919,
>>>         "mem": 128126,
>>>         "ports": "[8100-10000, 31000-32000]"
>>>       }
>>>     },
>>>
>>
>> Cluster B:
>>
>>> {
>>>   "slaves": [
>>>     {
>>>       "active": false,
>>>       "attributes": {
>>>         "apps": "logstash",
>>>         "colo": "cn5",
>>>         "type": "prod"
>>>       },
>>>       "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
>>>       "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
>>>       "offered_resources": {
>>>         "cpus": 0,
>>>         "disk": 0,
>>>         "mem": 0
>>>       },
>>>       "pid": "slave(1)@10.90.5.19:5051",
>>>       "registered_time": 1451988622.66323,
>>>       "reserved_resources": {},
>>>       "resources": {
>>>         "cpus": 32.0,
>>>         "disk": 2728919.0,
>>>         "mem": 128126.0,
>>>         "ports": "[8100-10000, 31000-32000]"
>>>       },
>>>       "unreserved_resources": {
>>>         "cpus": 32.0,
>>>         "disk": 2728919.0,
>>>         "mem": 128126.0,
>>>         "ports": "[8100-10000, 31000-32000]"
>>>       },
>>>       "used_resources": {
>>>         "cpus": 0,
>>>         "disk": 0,
>>>         "mem": 0
>>>       }
>>>     },
>>>     .....
>>>
>>
>> I found some useful logs:
>>
>>> I0105 18:36:22.683724 6452 slave.cpp:2248] Updated checkpointed resources from to
>>> I0105 18:37:09.900497 6459 slave.cpp:3926] Current disk usage 0.06%. Max allowed age: 1.798706758587755days
>>> I0105 18:37:22.678374 6453 slave.cpp:3146] Master marked the slave as disconnected but the slave considers itself registered! Forcing re-registration.
>>> I0105 18:37:22.678699 6453 slave.cpp:694] Re-detecting master
>>> I0105 18:37:22.678715 6471 status_update_manager.cpp:176] Pausing sending status updates
>>> I0105 18:37:22.678753 6453 slave.cpp:741] Detecting new master
>>> I0105 18:37:22.678977 6456 status_update_manager.cpp:176] Pausing sending status updates
>>> I0105 18:37:22.679047 6455 slave.cpp:705] New master detected at master@10.88.169.195:5050
>>> I0105 18:37:22.679108 6455 slave.cpp:768] Authenticating with master master@10.88.169.195:5050
>>> I0105 18:37:22.679136 6455 slave.cpp:773] Using default CRAM-MD5 authenticatee
>>> I0105 18:37:22.679239 6455 slave.cpp:741] Detecting new master
>>> I0105 18:37:22.679354 6464 authenticatee.cpp:115] Creating new client SASL connection
>>> I0105 18:37:22.680883 6461 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5
>>> I0105 18:37:22.680946 6461 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5'
>>> I0105 18:37:22.681759 6455 authenticatee.cpp:252] Received SASL authentication step
>>> I0105 18:37:22.682874 6454 authenticatee.cpp:292] Authentication success
>>> I0105 18:37:22.682986 6441 slave.cpp:836] Successfully authenticated with master master@10.88.169.195:5050
>>> I0105 18:37:22.684303 6454 slave.cpp:980] Re-registered with master master@10.88.169.195:5050
>>> I0105 18:37:22.684455 6454 slave.cpp:1016] Forwarding total oversubscribed resources
>>> I0105 18:37:22.684471 6468 status_update_manager.cpp:183] Resuming sending status updates
>>> I0105 18:37:22.684649 6454 slave.cpp:2152] Updating framework 20150610-204949-3299432458-5050-25057-0000 pid to scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708
>>> I0105 18:37:22.685025 6452 status_update_manager.cpp:183] Resuming sending status updates
>>> I0105 18:37:22.685117 6454 slave.cpp:2248] Updated checkpointed resources from to
>>> I0105 18:38:09.901587 6464 slave.cpp:3926] Current disk usage 0.06%. Max allowed age: 1.798706755730266days
>>> I0105 18:38:22.679468 6451 slave.cpp:3146] Master marked the slave as disconnected but the slave considers itself registered! Forcing re-registration.
>>> I0105 18:38:22.679739 6451 slave.cpp:694] Re-detecting master
>>> I0105 18:38:22.679754 6453 status_update_manager.cpp:176] Pausing sending status updates
>>> I0105 18:38:22.679785 6451 slave.cpp:741] Detecting new master
>>> I0105 18:38:22.680054 6461 slave.cpp:705] New master detected at master@10.88.169.195:5050
>>> I0105 18:38:22.680106 6470 status_update_manager.cpp:176] Pausing sending status updates
>>> I0105 18:38:22.680107 6461 slave.cpp:768] Authenticating with master master@10.88.169.195:5050
>>> I0105 18:38:22.680197 6461 slave.cpp:773] Using default CRAM-MD5 authenticatee
>>> I0105 18:38:22.680271 6461 slave.cpp:741] Detecting new master
>>>
>>> .................
>>>
>>> W0105 19:05:38.207882 6450 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 09:12:38.666767 6468 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0002 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:13:35.782218 6441 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0117 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:23:22.348956 6444 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0118 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:35:36.660111 6443 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0119 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:40:43.735994 6461 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0121 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:42:09.539126 6456 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0120 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:52:40.787961 6465 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0122 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 12:58:10.425287 6461 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0123 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 13:03:32.236495 6456 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0125 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 13:10:58.501510 6472 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0126 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 13:16:04.233232 6460 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0127 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 14:17:24.198786 6472 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0115 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 14:18:57.036814 6464 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0005 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 14:36:19.755764 6460 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0112 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>> W0106 14:46:54.420217 6462 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0129 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050)
>>>
>>
>> Did you meet this issue before ?
>>
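
In case it is useful while collecting the master logs, here is a minimal Python sketch of two checks discussed in this thread: counting the "Ignoring shutdown framework message" warnings per sending master, and finding slaves (matched by `pid`) that appear in the "/master/slaves" output of both clusters. The function names `summarize_ignored` and `listed_in_both` are made up for illustration, the regex is inferred only from the log lines pasted in this thread, and the sample inputs are trimmed copies of data quoted here. Treat it as a sketch under those assumptions, not as anything Mesos ships.

```python
import re
from collections import Counter

# Matches the slave-side warning quoted in this thread. Whitespace is
# matched loosely so that line-wrapped copies of the log still match.
IGNORED = re.compile(
    r"Ignoring\s+shutdown\s+framework\s+message\s+for\s+(?P<framework>\S+)\s+"
    r"from\s+(?P<sender>master@\S+)\s+because\s+it\s+is\s+not\s+from\s+the\s+"
    r"registered\s+master\s+\(\s*(?P<registered>master@[^)\s]+)\)"
)


def summarize_ignored(lines):
    """Count ignored messages per (sending master, registered master) pair."""
    counts = Counter()
    for line in lines:
        match = IGNORED.search(line)
        if match:
            counts[(match.group("sender"), match.group("registered"))] += 1
    return counts


def listed_in_both(slaves_a, slaves_b):
    """Return {pid: (id in list A, id in list B)} for slaves in both lists."""
    ids_a = {s["pid"]: s["id"] for s in slaves_a}
    ids_b = {s["pid"]: s["id"] for s in slaves_b}
    return {pid: (ids_a[pid], ids_b[pid]) for pid in ids_a.keys() & ids_b.keys()}


# Sample inputs trimmed from the messages in this thread.
LOG = [
    "W0105 19:05:38.207882 6450 slave.cpp:1973] Ignoring shutdown framework "
    "message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from "
    "master@10.90.12.29:5050 because it is not from the registered master "
    "(master@10.88.169.195:5050)",
    "I0105 18:37:22.684303 6454 slave.cpp:980] Re-registered with master "
    "master@10.88.169.195:5050",
]
CLUSTER_A = [
    {"id": "20151230-034049-3282655242-5050-1802-S7",
     "pid": "slave(1)@10.90.5.19:5051"},
]
CLUSTER_B = [
    {"id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
     "pid": "slave(1)@10.90.5.19:5051"},
]

if __name__ == "__main__":
    print(summarize_ignored(LOG))
    print(listed_in_both(CLUSTER_A, CLUSTER_B))
```

To use it on real data, you would pass the lines of the slave log to `summarize_ignored` and the "slaves" arrays from each cluster's "/master/slaves" response to `listed_in_both`.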