Hello,

I checked the backtrace of a crashed dhcpd running on 4.4.1-2.1ubuntu5.

(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:50
  2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait 
(futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
  3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=<optimized out>, 
processes_to_wake=1, futex_word=<optimized out>) at 
../sysdeps/nptl/futex-internal.h:364
  4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=<optimized 
out>, expected=0, futex_word=0x7fb4de6cd0d0) at 
../sysdeps/nptl/futex-internal.h:183

(gdb) frame 2
#2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
    cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
(gdb) bt
#1 0x00007fb4deaa7859 in __GI_abort () at abort.c:79
#2 0x00007fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 
"../../../../lib/isc/unix/socket.c", line=line@entry=3361, 
type=type@entry=isc_assertiontype_insist,
    cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at 
../../../lib/isc/assertions.c:52
#3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
#4 process_fd (writeable=<optimized out>, readable=<optimized out>, fd=11, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
#5 process_fds (writefds=<optimized out>, readfds=0x7fb4de6d1090, maxfd=13, 
manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
#6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
#7 0x00007fb4dea68609 in start_thread (arg=<optimized out>) at 
pthread_create.c:477
#8 0x00007fb4deba4103 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) frame 3
#3 0x00007fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at 
../../../../lib/isc/unix/socket.c:4041
4041 in ../../../../lib/isc/unix/socket.c
(gdb) p sock->pending_send
$2 = 1

The code is crashing on this assertion: 
https://gitlab.isc.org/isc-projects/bind9/-/blob/v9_11_3/lib/isc/unix/socket.c#L3364
This was already reported and marked as fixed in debian (?) via [0]

""Now if a wakeup event occurres the socket would be dispatched for
processing regardless which kind of event (timer?) triggered the wakeup.
At least I did not find any sanity checks in process_fds() except
SOCK_DEAD(sock).

This leads to the following situation: The sock is not dead yet but it
is still pending when it is dispatched again.

I would now check sock->pending_send before calling dispatch_send().This
 would at least prevent the assertion failure - well knowing that the
situation described above ( not dead but still pending and alerting ) is
not a very pleasant one - until someone comes up with a better solution.
"""

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430065#20

** Follow up questions:

0) The reproducer doesn't seems consistent and seems to be related to a race
condition associated with a internal timer/futex.
1) Can anyone confirm that a pristine upstream 4.4.1 doesn't reproduces the 
issue?

** Bug watch added: Debian Bug tracker #430065
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430065

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1872118

Title:
  DHCP Cluster crashes after a few hours

Status in DHCP:
  New
Status in bind9-libs package in Ubuntu:
  New
Status in isc-dhcp package in Ubuntu:
  Confirmed
Status in bind9-libs source package in Focal:
  New
Status in isc-dhcp source package in Focal:
  New
Status in bind9-libs source package in Groovy:
  New
Status in isc-dhcp source package in Groovy:
  Confirmed

Bug description:
  
  I have a pair of DHCP serevrs running in a cluster on ubuntu 20.04, All 
worked perfectly until recently, when they started stopping with code=killed, 
status=6/ABRT.
  This is being fixed by 

  https://bugs.launchpad.net/bugs/1870729

  However now one stops after a few hours with the following errors. One
  can stay on line but not both.


  
  Syslog shows 
  Apr 10 17:20:15 dhcp-primary sh[6828]: 
../../../../lib/isc/unix/socket.c:3361: INSIST(!sock->pending_send) failed, 
back trace
  Apr 10 17:20:15 dhcp-primary sh[6828]: #0 0x7fbe78702a4a in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #1 0x7fbe78702980 in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #2 0x7fbe7873e7e1 in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #3 0x7fbe784e5609 in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #4 0x7fbe78621103 in ??

  
  nothing in kern.log

  
  apport.log shows
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: called for pid 6828, 
signal 6, core limit 0, dump mode 2
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: not creating core for pid 
with dump mode of 2
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: executable: 
/usr/sbin/dhcpd (command line "dhcpd -user dhcpd -group dhcpd -f -4 -pf 
/run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf")
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: is_closing_session(): no 
DBUS_SESSION_BUS_ADDRESS in environment
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: wrote report 
/var/crash/_usr_sbin_dhcpd.0.crash


  /var/crash/_usr_sbin_dhcpd.0.crash shows

  ProblemType: Crash
  Architecture: amd64
  CrashCounter: 1
  Date: Fri Apr 10 17:20:15 2020
  DistroRelease: Ubuntu 20.04
  ExecutablePath: /usr/sbin/dhcpd
  ExecutableTimestamp: 1586210315
  ProcCmdline: dhcpd -user dhcpd -group dhcpd -f -4 -pf 
/run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf
  ProcEnviron: Error: [Errno 13] Permission denied: 'environ'
  ProcMaps: Error: [Errno 13] Permission denied: 'maps'
  ProcStatus:
   Name:  dhcpd
   Umask: 0022
   State: D (disk sleep)
   Tgid:  6828
   Ngid:  0
   Pid:   6828
   PPid:  1
   TracerPid:     0
   Uid:   113     113     113     113
   Gid:   118     118     118     118
   FDSize:        128
   Groups:
   NStgid:        6828
   NSpid: 6828
   NSpgid:        6828
   NSsid: 6828
   VmPeak:          236244 kB
   VmSize:          170764 kB
   VmLck:        0 kB
   VmPin:        0 kB
   VmHWM:    12064 kB
   VmRSS:    12064 kB
   RssAnon:           5940 kB
   RssFile:           6124 kB
   RssShmem:             0 kB
   VmData:           30792 kB
   VmStk:      132 kB
   VmExe:      592 kB
   VmLib:     5424 kB
   VmPTE:       76 kB
   VmSwap:               0 kB
   HugetlbPages:         0 kB
   CoreDumping:   1
   THP_enabled:   1
   Threads:       4
   SigQ:  0/7609
   SigPnd:        0000000000000000
   ShdPnd:        0000000000000000
   SigBlk:        0000000000000000
   SigIgn:        0000000000001000
   SigCgt:        0000000180000000
   CapInh:        0000000000000000
   CapPrm:        0000000000000000
   CapEff:        0000000000000000
   CapBnd:        0000003fffffffff
   CapAmb:        0000000000000000
   NoNewPrivs:    0
   Seccomp:       0
   Speculation_Store_Bypass:      thread vulnerable
   Cpus_allowed:  3
   Cpus_allowed_list:     0-1
   Mems_allowed:  
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000
  0000,00000000,00000000,00000001
   Mems_allowed_list:     0
   voluntary_ctxt_switches:       111
   nonvoluntary_ctxt_switches:    144
  Signal: 6
  Uname: Linux 5.4.0-21-generic x86_64
  UserGroups:

To manage notifications about this bug go to:
https://bugs.launchpad.net/dhcp/+bug/1872118/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to