[I caused nfsd to have things shifted in memory some, to see if it tracked content vs. page boundary for where the zeros stop. Non-nfsd examples omitted.]
> . . .
>> nfsd hit an assert, failing ret == sz_size2index_compute(size)
>
> [Correction: That should have referenced sz_index2size_lookup(index).]
>
>> (also, but a different caller of sz_size2index):
>
> [Correction: The "also" comment should be ignored:
> sz_index2size_lookup(index) is referenced below.]
>
>>
>> (gdb) bt
>> #0  thr_kill () at thr_kill.S:4
>> #1  0x502b2170 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
>> #2  0x50211cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> #3  0x50206104 in sz_index2size_lookup (index=<optimized out>) at
>>     /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> #4  sz_index2size (index=<optimized out>) at
>>     /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
>> #5  ifree (tsd=0x50094018, ptr=0x50041028, tcache=0x50094138,
>>     slow_path=<optimized out>) at jemalloc_jemalloc.c:2583
>> #6  0x50205cac in __je_free_default (ptr=0x50041028) at jemalloc_jemalloc.c:2784
>> #7  0x50206294 in __free (ptr=0x50041028) at jemalloc_jemalloc.c:2852
>> #8  0x50287ec8 in ns_src_free (src=0x50329004, srclistsize=<optimized out>)
>>     at /usr/src/lib/libc/net/nsdispatch.c:452
>> #9  ns_dbt_free (dbt=0x50329000) at /usr/src/lib/libc/net/nsdispatch.c:436
>> #10 vector_free (vec=0x50329000, count=<optimized out>, esize=12,
>>     free_elem=<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:253
>> #11 nss_atexit () at /usr/src/lib/libc/net/nsdispatch.c:578
>> #12 0x5028d958 in __cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:240
>> #13 0x502117f8 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:74
>> #14 0x10013f9c in child_cleanup (signo=<optimized out>) at /usr/src/usr.sbin/nfsd/nfsd.c:969
>> #15 <signal handler called>
>> #16 0x00000000 in ?? ()
>>
>> (gdb) up 3
>> #3  0x50206104 in sz_index2size_lookup (index=<optimized out>) at
>>     /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> 200             assert(ret == sz_index2size_compute(index));
>>
>> (ret is optimized out.)
>>
>> 197 JEMALLOC_ALWAYS_INLINE size_t
>> 198 sz_index2size_lookup(szind_t index) {
>> 199         size_t ret = (size_t)sz_index2size_tab[index];
>> 200         assert(ret == sz_index2size_compute(index));
>> 201         return ret;
>> 202 }
>
> (gdb) print/x __je_sz_index2size_tab
> $3 = {0x0 <repeats 104 times>}
>
> Also:
>
> (gdb) x/4x __je_arenas+16368/4
> 0x5030cab0 <__je_arenas+16368>: 0x00000000      0x00000000      0x00000000      0x00000000
> (gdb) print/x __je_arenas_lock
>
> $8 = {{{prof_data = {tot_wait_time = {ns = 0x0}, max_wait_time = {ns = 0x0},
>        n_wait_times = 0x0, n_spin_acquired = 0x0, max_n_thds = 0x0,
>        n_waiting_thds = {repr = 0x0}, n_owner_switches = 0x0,
>        prev_owner = 0x0, n_lock_ops = 0x0}, lock = 0x0, postponed_next = 0x0,
>      locked = {repr = 0x0}}}, witness = {name = 0x0, rank = 0x0, comp = 0x0,
>    opaque = 0x0, link = {qre_next = 0x0, qre_prev = 0x0}}, lock_order = 0x0}
> (gdb) print/x __je_narenas_auto
> $9 = 0x0
> (gdb) print/x malloc_conf
> $10 = 0x0
> (gdb) print/x __je_ncpus
> $11 = 0x0
> (gdb) print/x __je_manual_arena_base
> $12 = 0x0
> (gdb) print/x __je_sz_pind2sz_tab
> $13 = {0x0 <repeats 72 times>}
> (gdb) print/x __je_sz_size2index_tab
> $1 = {0x0 <repeats 384 times>, 0x1a, 0x1b <repeats 64 times>, 0x1c <repeats 64 times>}
>
>> Booting and immediately trying something like:
>>
>> service nfsd stop
>>
>> did not lead to a failure. But maybe after
>> a while it would, and be less drastic than a
>> reboot or power down.
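[A minimal standalone sketch, not jemalloc's actual code or identifiers, of why an all-zeros table trips that assert: it only mimics the lookup-then-recompute shape of sz_index2size_lookup() quoted above, with made-up table names and values.]

/* cc -o lookup_sketch lookup_sketch.c && ./lookup_sketch */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* stand-in for __je_sz_index2size_tab (illustrative values only) */
static size_t index2size_tab[4] = {8, 16, 32, 48};

/* stand-in for sz_index2size_compute(): derives the size without the table */
static size_t
index2size_compute(int index)
{
	static const size_t sizes[4] = {8, 16, 32, 48};
	return sizes[index];
}

/* same lookup-then-assert shape as the sz.h:197-202 listing above */
static size_t
index2size_lookup(int index)
{
	size_t ret = index2size_tab[index];
	assert(ret == index2size_compute(index));	/* analogue of sz.h:200 */
	return ret;
}

int
main(void)
{
	(void)index2size_lookup(2);	/* fine while the table is intact */

	/* emulate the observed stomping: the table becomes all zeros */
	memset(index2size_tab, 0, sizeof(index2size_tab));

	/* now the cached 0 no longer matches the recomputed 32, so assert()
	 * aborts, matching the abort()/raise() frames in the backtrace above */
	(void)index2size_lookup(2);
	return 0;
}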
> More detail:
>
> So, for rpcbind and nfsd at some point a large part of
> __je_sz_size2index_tab is being stomped on, as is all of
> __je_sz_index2size_tab and more.
>
> . . .
>
> For nfsd, it is similar (again showing the partially
> non-zero live process context instead of the all-zeros
> from the .core file):
>
> 0x5030cab0 <__je_arenas+16368>:   0x00000000  0x00000000  0x00000000  0x00000009
> 0x5030cac0 <__je_arenas_lock>:    0x00000000  0x00000000  0x00000000  0x00000000
> 0x5030cad0 <__je_arenas_lock+16>: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x5030cae0 <__je_arenas_lock+32>: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x5030caf0 <__je_arenas_lock+48>: 0x00000000  0x00000000  0x00000000  0x00000000
> 0x5030cb00 <__je_arenas_lock+64>: 0x00000000  0x502ff070  0x00000000  0x00000000
> 0x5030cb10 <__je_arenas_lock+80>: 0x500ebb04  0x00000003  0x00000000  0x00000000
> 0x5030cb20 <__je_arenas_lock+96>: 0x5030cb10  0x5030cb10  0x00000000  0x00000000
>
> Then the memory in the crash continues to be zero until:
>
> 0x5030d000 <__je_sz_size2index_tab+384>: 0x1a1b1b1b  0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b
>
> Notice the interesting page boundary for where non-zero
> is first available again!
>
> Between __je_arenas_lock and __je_sz_size2index_tab are:
>
> 0x5030cb30 __je_narenas_auto
> 0x5030cb38 malloc_conf
> 0x5030cb3c __je_ncpus
> 0x5030cb40 __je_manual_arena_base
> 0x5030cb80 __je_sz_pind2sz_tab
> 0x5030ccc0 __je_sz_index2size_tab
> 0x5030ce80 __je_sz_size2index_tab
>
> Note: because __je_arenas is normally
> mostly zero for these contexts, I can
> not tell where the memory trashing
> started, only where it replaced non-zero
> values with zeros.
> . . .

I caused the memory content to be shifted some in nfsd. The
resultant zeros-stop-at point from the failure looks like:

(gdb) x/128x __je_sz_size2index_tab
0x5030cf00 <__je_sz_size2index_tab>:     0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf10 <__je_sz_size2index_tab+16>:  0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf20 <__je_sz_size2index_tab+32>:  0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf30 <__je_sz_size2index_tab+48>:  0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf40 <__je_sz_size2index_tab+64>:  0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf50 <__je_sz_size2index_tab+80>:  0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf60 <__je_sz_size2index_tab+96>:  0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf70 <__je_sz_size2index_tab+112>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf80 <__je_sz_size2index_tab+128>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cf90 <__je_sz_size2index_tab+144>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cfa0 <__je_sz_size2index_tab+160>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cfb0 <__je_sz_size2index_tab+176>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cfc0 <__je_sz_size2index_tab+192>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cfd0 <__je_sz_size2index_tab+208>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cfe0 <__je_sz_size2index_tab+224>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030cff0 <__je_sz_size2index_tab+240>: 0x00000000  0x00000000  0x00000000  0x00000000
0x5030d000 <__je_sz_size2index_tab+256>: 0x18191919  0x19191919  0x19191919  0x19191919
0x5030d010 <__je_sz_size2index_tab+272>: 0x19191919  0x19191919  0x19191919  0x19191919
0x5030d020 <__je_sz_size2index_tab+288>: 0x19191919  0x19191919  0x19191919  0x19191919
0x5030d030 <__je_sz_size2index_tab+304>: 0x19191919  0x19191919  0x19191919  0x19191919
0x5030d040 <__je_sz_size2index_tab+320>: 0x191a1a1a  0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a
0x5030d050 <__je_sz_size2index_tab+336>: 0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a
0x5030d060 <__je_sz_size2index_tab+352>: 0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a
0x5030d070 <__je_sz_size2index_tab+368>: 0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a  0x1a1a1a1a
0x5030d080 <__je_sz_size2index_tab+384>: 0x1a1b1b1b  0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b
0x5030d090 <__je_sz_size2index_tab+400>: 0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b
0x5030d0a0 <__je_sz_size2index_tab+416>: 0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b
0x5030d0b0 <__je_sz_size2index_tab+432>: 0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b  0x1b1b1b1b
0x5030d0c0 <__je_sz_size2index_tab+448>: 0x1b1c1c1c  0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c
0x5030d0d0 <__je_sz_size2index_tab+464>: 0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c
0x5030d0e0 <__je_sz_size2index_tab+480>: 0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c
0x5030d0f0 <__je_sz_size2index_tab+496>: 0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c  0x1c1c1c1c

So, it is the page boundary that it tracks, not the detailed
placement of the memory contents. (A small alignment-arithmetic
sketch is appended after the signature.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
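[A small arithmetic sketch of the page-boundary observation, assuming 4 KiB pages on this 32-bit context; only the addresses are taken from the two dumps above, everything else is illustrative.]

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u	/* assumed page size for this 32-bit context */

int
main(void)
{
	const uint32_t resume  = 0x5030d000u;	/* first non-zero word in both dumps */
	const uint32_t tab_old = 0x5030ce80u;	/* __je_sz_size2index_tab, earlier layout */
	const uint32_t tab_new = 0x5030cf00u;	/* __je_sz_size2index_tab, shifted layout */

	/* the resume point is page aligned ... */
	printf("resume %% PAGE_SIZE = %u\n", (unsigned)(resume % PAGE_SIZE));	/* 0 */
	/* ... while its offset inside the table differs between the layouts */
	printf("offset, earlier layout: %u bytes\n", (unsigned)(resume - tab_old));	/* 384 */
	printf("offset, shifted layout: %u bytes\n", (unsigned)(resume - tab_new));	/* 256 */
	return 0;
}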