I can also confirm that the attribute path differs between the two kernels:
3.13.0-123-generic: /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold
4.4.0-81-generic: /sys/kernel/config/cluster/ocfs2/heartbeat/threshold
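
A tool that has to cope with both kernels could probe for either attribute name. The following is only a minimal sketch of that idea, not what ocfs2-tools actually does; the helper name find_threshold_attr and the lookup order are assumptions made for illustration, while the directory and the two file names are the ones listed above.

```python
import os

# Directory and attribute names as observed on the two kernels above.
HEARTBEAT_DIR = "/sys/kernel/config/cluster/ocfs2/heartbeat"
CANDIDATES = ("dead_threshold", "threshold")


def find_threshold_attr(heartbeat_dir=HEARTBEAT_DIR):
    """Return the path of the heartbeat threshold attribute, or None.

    Hypothetical helper: tries the historical name first, then the
    name seen on 4.4.0-81-generic.
    """
    for name in CANDIDATES:
        path = os.path.join(heartbeat_dir, name)
        if os.path.exists(path):
            return path
    return None


if __name__ == "__main__":
    # Prints None when the cluster stack is not loaded on this machine.
    print(find_threshold_attr())
```

Passing a different directory to find_threshold_attr makes the lookup easy to try out without a running cluster.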

So where does the change come from, and what needs to adapt (the kernel or
ocfs2-tools)?
It was reported that the upstream kernel still uses dead_threshold.

I compared:
Trusty: git://kernel.ubuntu.com/ubuntu/ubuntu-trusty.git
Xenial: git://kernel.ubuntu.com/ubuntu/ubuntu-xenial.git
Upstream: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

I found that, at that level, upstream agrees with Trusty and uses
dead_threshold.
So why does Xenial differ? I found an upstream commit [1] that broke the
attribute name upstream.
That broken state has been present since 4.4 and is only now being fixed in
4.12 with commit [2].

There is also stable kernel activity on this: [3] for 4.11, [4] for 4.9 and
[5] for 4.4.

Given that pre-analysis, I think it is the kernel team that needs to look into
including the fixes.
I hope my analysis helps with that; I have reassigned the bug by adapting the
bug tasks accordingly.

[1]: https://github.com/torvalds/linux/commit/45b997737a80
[2]: https://github.com/torvalds/linux/commit/33496c3c3d7b
[3]: https://www.spinics.net/lists/stable/msg179361.html
[4]: https://www.spinics.net/lists/stable/msg179433.html
[5]: https://www.spinics.net/lists/stable/msg179582.html

** Package changed: ocfs2-tools (Ubuntu) => linux (Ubuntu)

** Changed in: linux (Ubuntu)
       Status: Triaged => New

** Tags removed: server-next

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to ocfs2-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1614038

Title:
  o2cb configuration options ignored in 16.04

Status in linux package in Ubuntu:
  New

Bug description:
  We've been trying to add a 16.04 node (ocfs2-tools 1.6.4-3.1) to our
  existing OCFS2 filesystem based on Ubuntu 13.04 (ocfs2-tools
  1.6.4-2ubuntu1) and Ubuntu 14.04 (ocfs2-tools 1.6.4-3ubuntu1).

   * Node1: Ubuntu 16.04, Slot 1, 10.22.44.21
   * Node2: Ubuntu 13.04, Slot 2, 10.22.44.22
   * Node3: Ubuntu 14.04, Slot 6, 10.22.44.23
   * Node4: Ubuntu 14.04, Slot 7, 10.22.44.24

  
  The existing system has an O2CB_HEARTBEAT_THRESHOLD=61 setting, but when
  adding the new system this tweak seems to be ignored. Here's the syslog
  section:

  
  Aug 16 15:58:02 node1 kernel: [  936.294820] (o2hb-37AAEB0304,5741,7):o2hb_check_slot:895 ERROR: Node 2 on device sdc has a dead count of 122000 ms, but our count is 62000 ms.
  Aug 16 15:58:02 node1 kernel: [  936.294820] Please double check your configuration values for 'O2CB_HEARTBEAT_THRESHOLD'
  Aug 16 15:58:02 node1 kernel: [  936.294949] (o2hb-37AAEB0304,5741,7):o2hb_check_slot:895 ERROR: Node 6 on device sdc has a dead count of 122000 ms, but our count is 62000 ms.
  Aug 16 15:58:02 node1 kernel: [  936.294949] Please double check your configuration values for 'O2CB_HEARTBEAT_THRESHOLD'
  Aug 16 15:58:02 node1 kernel: [  936.295071] (o2hb-37AAEB0304,5741,7):o2hb_check_slot:895 ERROR: Node 7 on device sdc has a dead count of 122000 ms, but our count is 62000 ms.
  Aug 16 15:58:02 node1 kernel: [  936.295071] Please double check your configuration values for 'O2CB_HEARTBEAT_THRESHOLD'
  Aug 16 15:58:03 node1 kernel: [  937.123350] o2net: node node3 (num 6) at 10.22.44.23:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:58:03 node1 kernel: [  937.393608] o2net: node node2 (num 2) at 10.22.44.22:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:58:04 node1 kernel: [  938.055983] o2net: node node4 (num 7) at 10.22.44.24:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:58:29 node1 kernel: [  963.213554] o2net: node node3 (num 6) at 10.22.44.23:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:58:30 node1 kernel: [  964.057995] o2net: node node4 (num 7) at 10.22.44.24:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:58:32 node1 kernel: [  966.404380] o2net: No connection established with node 2 after 30.0 seconds, check network and cluster configuration.
  Aug 16 15:58:32 node1 kernel: [  966.404390] o2net: No connection established with node 6 after 30.0 seconds, check network and cluster configuration.
  Aug 16 15:58:32 node1 kernel: [  966.404393] o2net: No connection established with node 7 after 30.0 seconds, check network and cluster configuration.
  Aug 16 15:58:59 node1 kernel: [  993.296012] o2net: node node3 (num 6) at 10.22.44.23:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:59:00 node1 kernel: [  994.060435] o2net: node node4 (num 7) at 10.22.44.24:7777 uses a heartbeat timeout of 120000 ms, but we use 60000 ms locally. Disconnecting.
  Aug 16 15:59:02 node1 kernel: [  996.486396] o2net: No connection established with node 2 after 30.0 seconds, check network and cluster configuration.
  Aug 16 15:59:02 node1 kernel: [  996.486405] o2net: No connection established with node 6 after 30.0 seconds, check network and cluster configuration.
  Aug 16 15:59:02 node1 kernel: [  996.486409] o2net: No connection established with node 7 after 30.0 seconds, check network and cluster configuration.
  Aug 16 15:59:05 node1 kernel: [  999.582560] o2cb: This node could not connect to nodes: 2 6 7.
  Aug 16 15:59:05 node1 kernel: [  999.582607] o2cb: Cluster check failed. Fix errors before retrying.
  Aug 16 15:59:05 node1 kernel: [  999.582647] (mount.ocfs2,5740,1):ocfs2_dlm_init:3025 ERROR: status = -107
  Aug 16 15:59:05 node1 kernel: [  999.582814] (mount.ocfs2,5740,1):ocfs2_mount_volume:1863 ERROR: status = -107
  Aug 16 15:59:05 node1 kernel: [  999.582895] ocfs2: Unmounting device (8,32) on (node 0)
  Aug 16 15:59:05 node1 kernel: [  999.582905] (mount.ocfs2,5740,1):ocfs2_fill_super:1219 ERROR: status = -107
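
  The figures in the log are what one would expect if the new node fell back to a default threshold instead of the configured 61. The sketch below only reproduces that arithmetic; the 2000 ms heartbeat interval and the default threshold of 31 are assumptions that are not stated anywhere in this report, chosen because they match the logged values, and the function names are made up for illustration.

```python
# Assumed constants: a 2000 ms heartbeat interval and a default
# threshold of 31. Neither is stated in this report; they are used
# here because they reproduce the figures in the log.
HEARTBEAT_INTERVAL_MS = 2000
DEFAULT_THRESHOLD = 31


def dead_count_ms(threshold):
    """Disk heartbeat 'dead count' reported by o2hb_check_slot."""
    return threshold * HEARTBEAT_INTERVAL_MS


def net_heartbeat_timeout_ms(threshold):
    """Heartbeat timeout compared by o2net when nodes connect."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


if __name__ == "__main__":
    for label, threshold in (("configured", 61), ("default", DEFAULT_THRESHOLD)):
        print(label, threshold,
              dead_count_ms(threshold),
              net_heartbeat_timeout_ms(threshold))
    # configured 61 122000 120000
    # default 31 62000 60000
```

  Under those assumptions the existing nodes (122000 ms / 120000 ms) run with threshold 61, while node1 (62000 ms / 60000 ms) behaves as if the setting had never been applied.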
