** Description changed:

+ [Description]
  
- I have a pair of DHCP serevrs running in a cluster on ubuntu 20.04, All 
worked perfectly until recently, when they started stopping with code=killed, 
status=6/ABRT.
- This is being fixed by 
+ isc-dhcp-server uses libisc-export (coming from bin9-libs package) for 
handling the socket event(s) when configured in peer mode (master/secondary). 
It's possible that a sequence of messages dispatched by the master that 
requires acknowledgment from its peers holds a socket
+ in a pending to send state, a timer or a subsequent write request can be 
scheduled into this socket and the !sock->pending_send assertion
+ will be raised when trying to write again at the time data hasn't been 
flushed entirely and the pending_send flag hasn't been reset to 0 state.
  
- https://bugs.launchpad.net/bugs/1870729
+ If this race condition happens, the following stacktrace will be
+ hit:
  
- However now one stops after a few hours with the following errors. One
- can stay on line but not both.
+ (gdb) info threads
+   Id Target Id Frame
+ * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:50
+   2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait 
(futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
+   3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, 
processes_to_wake=1, futex_word=<optimized out>) at 
../sysdeps/nptl/futex-internal.h:364
+   4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable 
(private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at 
../sysdeps/nptl/futex-internal.h:183
+ 
+ (gdb) frame 2
+ #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
+     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
+ (gdb) bt
+ #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
+ #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
+     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
+ #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
+ #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
+ #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
+ #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
+ #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
+ #8 0x00007fb4deba4103 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
+ 
+ (gdb) frame 3
+ #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
+ 4041 in ../../../../lib/isc/unix/socket.c
+ (gdb) p sock->pending_send
+ $2 = 1
+ 
+ [TEST CASE]
+ 
+ 1) Install isc-dhcp-server in 2 focal machine(s).
+ 2) Configure peer/cluster mode as follows:
+    Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
+    Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
+ 2) Run dhcpd as follows in both machine(s)
+ 
+ # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4
+ 
+ 3) Leave the cluster running for a long (2h) period until the crash/race
+ condition is reproduced.
  
  
+ [REGRESSION POTENTIAL]
  
- Syslog shows 
- Apr 10 17:20:15 dhcp-primary sh[6828]: 
../../../../lib/isc/unix/socket.c:3361: INSIST(!sock->pending_send) failed, 
back trace
- Apr 10 17:20:15 dhcp-primary sh[6828]: #0 0x7fbe78702a4a in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #1 0x7fbe78702980 in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #2 0x7fbe7873e7e1 in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #3 0x7fbe784e5609 in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #4 0x7fbe78621103 in ??
- 
- 
- nothing in kern.log
- 
- 
- apport.log shows
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: called for pid 6828, 
signal 6, core limit 0, dump mode 2
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: not creating core for pid 
with dump mode of 2
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: executable: 
/usr/sbin/dhcpd (command line "dhcpd -user dhcpd -group dhcpd -f -4 -pf 
/run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf")
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: is_closing_session(): no 
DBUS_SESSION_BUS_ADDRESS in environment
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: wrote report 
/var/crash/_usr_sbin_dhcpd.0.crash
- 
- 
- /var/crash/_usr_sbin_dhcpd.0.crash shows
- 
- ProblemType: Crash
- Architecture: amd64
- CrashCounter: 1
- Date: Fri Apr 10 17:20:15 2020
- DistroRelease: Ubuntu 20.04
- ExecutablePath: /usr/sbin/dhcpd
- ExecutableTimestamp: 1586210315
- ProcCmdline: dhcpd -user dhcpd -group dhcpd -f -4 -pf 
/run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf
- ProcEnviron: Error: [Errno 13] Permission denied: 'environ'
- ProcMaps: Error: [Errno 13] Permission denied: 'maps'
- ProcStatus:
-  Name:  dhcpd
-  Umask: 0022
-  State: D (disk sleep)
-  Tgid:  6828
-  Ngid:  0
-  Pid:   6828
-  PPid:  1
-  TracerPid:     0
-  Uid:   113     113     113     113
-  Gid:   118     118     118     118
-  FDSize:        128
-  Groups:
-  NStgid:        6828
-  NSpid: 6828
-  NSpgid:        6828
-  NSsid: 6828
-  VmPeak:          236244 kB
-  VmSize:          170764 kB
-  VmLck:        0 kB
-  VmPin:        0 kB
-  VmHWM:    12064 kB
-  VmRSS:    12064 kB
-  RssAnon:           5940 kB
-  RssFile:           6124 kB
-  RssShmem:             0 kB
-  VmData:           30792 kB
-  VmStk:      132 kB
-  VmExe:      592 kB
-  VmLib:     5424 kB
-  VmPTE:       76 kB
-  VmSwap:               0 kB
-  HugetlbPages:         0 kB
-  CoreDumping:   1
-  THP_enabled:   1
-  Threads:       4
-  SigQ:  0/7609
-  SigPnd:        0000000000000000
-  ShdPnd:        0000000000000000
-  SigBlk:        0000000000000000
-  SigIgn:        0000000000001000
-  SigCgt:        0000000180000000
-  CapInh:        0000000000000000
-  CapPrm:        0000000000000000
-  CapEff:        0000000000000000
-  CapBnd:        0000003fffffffff
-  CapAmb:        0000000000000000
-  NoNewPrivs:    0
-  Seccomp:       0
-  Speculation_Store_Bypass:      thread vulnerable
-  Cpus_allowed:  3
-  Cpus_allowed_list:     0-1
-  Mems_allowed:  
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000
- 0000,00000000,00000000,00000001
-  Mems_allowed_list:     0
-  voluntary_ctxt_switches:       111
-  nonvoluntary_ctxt_switches:    144
- Signal: 6
- Uname: Linux 5.4.0-21-generic x86_64
- UserGroups:
+ * The fix will prevent the assertion to happen in the dispatch_send
+ path, later versions of isch-dhcp upstream lack this logic and entirely 
removed the existence of this flag. Therefore, removing the need for this
+ assertion at process_fd shouldn't be problematic.

** Description changed:

  [Description]
  
- isc-dhcp-server uses libisc-export (coming from bin9-libs package) for 
handling the socket event(s) when configured in peer mode (master/secondary). 
It's possible that a sequence of messages dispatched by the master that 
requires acknowledgment from its peers holds a socket
+ isc-dhcp-server uses libisc-export (coming from bind9-libs package) for 
handling the socket event(s) when configured in peer mode (master/secondary). 
It's possible that a sequence of messages dispatched by the master that 
requires acknowledgment from its peers holds a socket
  in a pending to send state, a timer or a subsequent write request can be 
scheduled into this socket and the !sock->pending_send assertion
  will be raised when trying to write again at the time data hasn't been 
flushed entirely and the pending_send flag hasn't been reset to 0 state.
  
  If this race condition happens, the following stacktrace will be
  hit:
  
  (gdb) info threads
-   Id Target Id Frame
+   Id Target Id Frame
  * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:50
-   2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait 
(futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
-   3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, 
processes_to_wake=1, futex_word=<optimized out>) at 
../sysdeps/nptl/futex-internal.h:364
-   4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable 
(private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at 
../sysdeps/nptl/futex-internal.h:183
+   2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait 
(futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
+   3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, 
processes_to_wake=1, futex_word=<optimized out>) at 
../sysdeps/nptl/futex-internal.h:364
+   4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable 
(private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at 
../sysdeps/nptl/futex-internal.h:183
  
  (gdb) frame 2
  #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
-     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
+     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
  (gdb) bt
  #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
  #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
-     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
+     cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
  #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
  #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
  #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
  #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
  #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
  #8 0x00007fb4deba4103 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  
  (gdb) frame 3
  #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
  4041 in ../../../../lib/isc/unix/socket.c
  (gdb) p sock->pending_send
  $2 = 1
  
  [TEST CASE]
  
  1) Install isc-dhcp-server in 2 focal machine(s).
  2) Configure peer/cluster mode as follows:
-    Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
-    Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
+    Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
+    Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
  2) Run dhcpd as follows in both machine(s)
  
  # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4
  
  3) Leave the cluster running for a long (2h) period until the crash/race
  condition is reproduced.
  
- 
  [REGRESSION POTENTIAL]
  
  * The fix will prevent the assertion to happen in the dispatch_send
  path, later versions of isch-dhcp upstream lack this logic and entirely 
removed the existence of this flag. Therefore, removing the need for this
  assertion at process_fd shouldn't be problematic.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1872118

Title:
  [SRU] DHCP Cluster crashes after a few hours

Status in DHCP:
  New
Status in bind9-libs package in Ubuntu:
  In Progress
Status in isc-dhcp package in Ubuntu:
  In Progress
Status in bind9-libs source package in Focal:
  In Progress
Status in isc-dhcp source package in Focal:
  In Progress
Status in bind9-libs source package in Groovy:
  In Progress
Status in isc-dhcp source package in Groovy:
  In Progress

Bug description:
  [Description]

  isc-dhcp-server uses libisc-export (coming from bind9-libs package) for 
handling the socket event(s) when configured in peer mode (master/secondary). 
It's possible that a sequence of messages dispatched by the master that 
requires acknowledgment from its peers holds a socket
  in a pending to send state, a timer or a subsequent write request can be 
scheduled into this socket and the !sock->pending_send assertion
  will be raised when trying to write again at the time data hasn't been 
flushed entirely and the pending_send flag hasn't been reset to 0 state.

  If this race condition happens, the following stacktrace will be
  hit:

  (gdb) info threads
    Id Target Id Frame
  * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:50
    2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait 
(futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
    3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, 
processes_to_wake=1, futex_word=<optimized out>) at 
../sysdeps/nptl/futex-internal.h:364
    4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable 
(private=<optimized out>, expected=0, futex_word=0x7fb4de6cd0d0) at 
../sysdeps/nptl/futex-internal.h:183

  (gdb) frame 2
  #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
      cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
  (gdb) bt
  #1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
  #2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
      cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
  #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
  #4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
  #5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
  #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
  #7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
  #8 0x00007fb4deba4103 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  (gdb) frame 3
  #3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
  4041 in ../../../../lib/isc/unix/socket.c
  (gdb) p sock->pending_send
  $2 = 1

  [TEST CASE]

  1) Install isc-dhcp-server in 2 focal machine(s).
  2) Configure peer/cluster mode as follows:
     Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
     Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
  2) Run dhcpd as follows in both machine(s)

  # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4

  3) Leave the cluster running for a long (2h) period until the
  crash/race condition is reproduced.

  [REGRESSION POTENTIAL]

  * The fix will prevent the assertion to happen in the dispatch_send
  path, later versions of isch-dhcp upstream lack this logic and entirely 
removed the existence of this flag. Therefore, removing the need for this
  assertion at process_fd shouldn't be problematic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/dhcp/+bug/1872118/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to