On 18/11/2019 21:31, Jean-Francois Malouin wrote:
Hi,

Maybe not directly a pacemaker question but maybe some of you have seen this
problem:

A 2 node pacemaker cluster running corosync-3.0.1 with dual communication ring
sometimes reports errors like this in the corosync log file:

[KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
[KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
[KNET  ] pmtud: Global data MTU changed to: 1366
[CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at 
run-time
[CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at 
run-time

Those do not happen very frequently, once a week or so...


Those messages are caused by a config file reload (corosync-cfgtool -R) being triggered by something. If they're happening once a week then check your cron jobs.
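One quick way to hunt for that trigger is to search the usual cron locations for anything that invokes a config reload. The paths below are common Debian/RHEL defaults and are an assumption; adjust for your distro:

```shell
# Look for any cron entry that runs a corosync config reload.
# Searched paths are common defaults, not exhaustive.
grep -rn "corosync-cfgtool" /etc/cron* /var/spool/cron 2>/dev/null \
    || echo "no cron entry invoking corosync-cfgtool found"
```

Also worth checking: configuration-management tools (Ansible, Puppet, etc.) that may rewrite corosync.conf and reload on a schedule.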

However the system log on the nodes reports those much more frequently, a few
times a day:

Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is down
Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best 
link: 0 (pri: 0)
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best 
link: 1 (pri: 1)


Those don't look good. Having a link down for 6 seconds looks like a serious network outage that needs looking into, especially if outages are that frequent; or it could be a bug. You don't say which version of libknet you have installed, but make sure it's the latest one.
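For what it's worth, the outage length can be read straight off the syslog timestamps. A small awk sketch (fed here with the two events quoted above; on a real node, pipe in /var/log/syslog or the output of `journalctl -u corosync` instead):

```shell
# Pair each knet "is down" event with the next "is up" for the same
# host/link and print how long the link was out.
# Assumes same-day timestamps and the default syslog line layout,
# where $3 is HH:MM:SS, $10 is the host id and $12 the link number.
awk '
    { split($3, t, ":"); secs = t[1]*3600 + t[2]*60 + t[3] }
    /is down$/ { down[$10 " " $12] = secs }
    /is up$/   { k = $10 " " $12
                 if (k in down)
                     print "host " $10 " link " $12 " out for " (secs - down[k]) "s" }
' <<'EOF'
Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is down
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up
EOF
# prints: host 2 link 1 out for 6s
```

If the outages cluster at particular times of day, that again points at something scheduled on the network or the hosts.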

The fencing event in your other message happened because both links were down at the same time, which is a worrying coincidence. Changing the token timeout won't make any difference to the knet link events, but if the knet links are down for long enough, that will trigger a token timeout and a fence event.

Definitely look for something odd in your networking - the corosync.conf file looks sane (though having knet_transport in the top-level totem stanza is doing nothing), so it's not that.

It's hard to make a judgement with just that info, but look for dropped packets on the interfaces, slow response to other network services or very high load on one of the nodes. If you can't see anything on the systems then enable debug logging and get back to us. If it is a bug we want it fixed!
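If you do enable debug logging, the knob lives in the logging stanza you already have; something like the fragment below (my sketch, not a drop-in replacement; corosync 3 also accepts debug: trace for even more knet detail, but check corosync.conf(5) on your version):

```
logging {
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync/corosync.log
    timestamp: on
    debug: on
}
```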

Chrissie


Are those to be dismissed, or are they indicative of a network misconfiguration/problem?
I tried setting 'knet_transport: udpu' in the totem section (the default value),
but it didn't seem to make a difference... Hard-coding netmtu to 1500 and
allowing a longer (10s) token timeout also didn't seem to affect the issue.


Corosync config follows:

/etc/corosync/corosync.conf

totem {
     version: 2
     cluster_name: bicha
     transport: knet
     link_mode: passive
     ip_version: ipv4
     token: 10000
     netmtu: 1500
     knet_transport: sctp
     crypto_model: openssl
     crypto_hash: sha256
     crypto_cipher: aes256
     keyfile: /etc/corosync/authkey
     interface {
         linknumber: 0
         knet_transport: udp
         knet_link_priority: 0
     }
     interface {
         linknumber: 1
         knet_transport: udp
         knet_link_priority: 1
     }
}
quorum {
     provider: corosync_votequorum
     two_node: 1
#    expected_votes: 2
}
nodelist {
     node {
         ring0_addr: xxx.xxx.xxx.xxx
         ring1_addr: zzz.zzz.zzz.zzx
         name: node1
         nodeid: 1
     }
     node {
         ring0_addr: xxx.xxx.xxx.xxy
         ring1_addr: zzz.zzz.zzz.zzy
         name: node2
         nodeid: 2
     }
}
logging {
     to_logfile: yes
     to_syslog: yes
     logfile: /var/log/corosync/corosync.log
     syslog_facility: daemon
     debug: off
     timestamp: on
     logger_subsys {
         subsys: QUORUM
         debug: off
     }
}
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

