[I caused nfsd to have things shifted in memory somewhat, to
see whether the point where the zeros stop tracks the content
or the page boundary. Non-nfsd examples omitted.]

> . . .
>> nfsd hit an assert, failing ret == sz_size2index_compute(size)
> 
> [Correction: That should have referenced sz_index2size_lookup(index).]
> 
>> (also, but a different caller of sz_size2index):
> 
> [Correction: The "also" comment should be ignored:
> sz_index2size_lookup(index) is referenced below.]
> 
>> 
>> (gdb) bt
>> #0  thr_kill () at thr_kill.S:4
>> #1  0x502b2170 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
>> #2  0x50211cc0 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
>> #3  0x50206104 in sz_index2size_lookup (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> #4  sz_index2size (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
>> #5  ifree (tsd=0x50094018, ptr=0x50041028, tcache=0x50094138, slow_path=<optimized out>) at jemalloc_jemalloc.c:2583
>> #6  0x50205cac in __je_free_default (ptr=0x50041028) at jemalloc_jemalloc.c:2784
>> #7  0x50206294 in __free (ptr=0x50041028) at jemalloc_jemalloc.c:2852
>> #8  0x50287ec8 in ns_src_free (src=0x50329004, srclistsize=<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:452
>> #9  ns_dbt_free (dbt=0x50329000) at /usr/src/lib/libc/net/nsdispatch.c:436
>> #10 vector_free (vec=0x50329000, count=<optimized out>, esize=12, free_elem=<optimized out>) at /usr/src/lib/libc/net/nsdispatch.c:253
>> #11 nss_atexit () at /usr/src/lib/libc/net/nsdispatch.c:578
>> #12 0x5028d958 in __cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:240
>> #13 0x502117f8 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:74
>> #14 0x10013f9c in child_cleanup (signo=<optimized out>) at /usr/src/usr.sbin/nfsd/nfsd.c:969
>> #15 <signal handler called>
>> #16 0x00000000 in ?? ()
>> 
>> (gdb) up 3
>> #3  0x50206104 in sz_index2size_lookup (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
>> 200          assert(ret == sz_index2size_compute(index));
>> 
>> (ret is optimized out.)
>> 
>> 197  JEMALLOC_ALWAYS_INLINE size_t
>> 198  sz_index2size_lookup(szind_t index) {
>> 199          size_t ret = (size_t)sz_index2size_tab[index];
>> 200          assert(ret == sz_index2size_compute(index));
>> 201          return ret;
>> 202  }
> 
> (gdb) print/x __je_sz_index2size_tab
> $3 = {0x0 <repeats 104 times>}
> 
> Also:
> 
> (gdb) x/4x __je_arenas+16368/4
> 0x5030cab0 <__je_arenas+16368>:       0x00000000      0x00000000      0x00000000      0x00000000
> (gdb) print/x __je_arenas_lock
> $8 = {{{prof_data = {tot_wait_time = {ns = 0x0}, max_wait_time = {ns = 0x0}, 
> n_wait_times = 0x0, n_spin_acquired = 0x0, max_n_thds = 0x0, n_waiting_thds = 
> {repr = 0x0}, n_owner_switches = 0x0, 
>       prev_owner = 0x0, n_lock_ops = 0x0}, lock = 0x0, postponed_next = 0x0, 
> locked = {repr = 0x0}}}, witness = {name = 0x0, rank = 0x0, comp = 0x0, 
> opaque = 0x0, link = {qre_next = 0x0, 
>     qre_prev = 0x0}}, lock_order = 0x0}
> (gdb) print/x __je_narenas_auto
> $9 = 0x0
> (gdb) print/x malloc_conf      
> $10 = 0x0
> (gdb) print/x __je_ncpus 
> $11 = 0x0
> (gdb) print/x __je_manual_arena_base
> $12 = 0x0
> (gdb) print/x __je_sz_pind2sz_tab   
> $13 = {0x0 <repeats 72 times>}
> (gdb) print/x __je_sz_size2index_tab
> $1 = {0x0 <repeats 384 times>, 0x1a, 0x1b <repeats 64 times>, 0x1c <repeats 64 times>}
> 
>> Booting and immediately trying something like:
>> 
>> service nfsd stop
>> 
>> did not lead to a failure. But maybe after
>> a while it would, and that would be less
>> drastic than a reboot or power down.
> 
> More detail:
> 
> So, for rpcbind and nfsd, at some point a large part of
> __je_sz_size2index_tab is being stomped on, as is all of
> __je_sz_index2size_tab and more.
> 
> . . .
> 
> For nfsd, it is similar (again showing the partially
> non-zero live process context instead of the all-zeros
> from the .core file):
> 
> 0x5030cab0 <__je_arenas+16368>:       0x00000000      0x00000000      0x00000000      0x00000009
> 0x5030cac0 <__je_arenas_lock>:        0x00000000      0x00000000      0x00000000      0x00000000
> 0x5030cad0 <__je_arenas_lock+16>:     0x00000000      0x00000000      0x00000000      0x00000000
> 0x5030cae0 <__je_arenas_lock+32>:     0x00000000      0x00000000      0x00000000      0x00000000
> 0x5030caf0 <__je_arenas_lock+48>:     0x00000000      0x00000000      0x00000000      0x00000000
> 0x5030cb00 <__je_arenas_lock+64>:     0x00000000      0x502ff070      0x00000000      0x00000000
> 0x5030cb10 <__je_arenas_lock+80>:     0x500ebb04      0x00000003      0x00000000      0x00000000
> 0x5030cb20 <__je_arenas_lock+96>:     0x5030cb10      0x5030cb10      0x00000000      0x00000000
> 
> Then the memory in the crash continues to be zero until:
> 
> 0x5030d000 <__je_sz_size2index_tab+384>:      0x1a1b1b1b      0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b
> 
> Notice the interesting page boundary for where non-zero
> is first available again!
> 
> Between __je_arenas_lock and __je_sz_size2index_tab are:
> 
> 0x5030cb30 __je_narenas_auto
> 0x5030cb38 malloc_conf
> 0x5030cb3c __je_ncpus
> 0x5030cb40 __je_manual_arena_base
> 0x5030cb80 __je_sz_pind2sz_tab
> 0x5030ccc0 __je_sz_index2size_tab
> 0x5030ce80 __je_sz_size2index_tab
> 
> 
> Note: because __je_arenas is normally
> mostly zero for these contexts, I cannot
> tell where the memory trashing
> started, only where it replaced non-zero
> values with zeros.
> . . .

I caused the memory contents to shift somewhat in nfsd.
The resulting point where the zeros stop in the failure looks like:

(gdb) x/128x __je_sz_size2index_tab
0x5030cf00 <__je_sz_size2index_tab>:    0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf10 <__je_sz_size2index_tab+16>: 0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf20 <__je_sz_size2index_tab+32>: 0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf30 <__je_sz_size2index_tab+48>: 0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf40 <__je_sz_size2index_tab+64>: 0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf50 <__je_sz_size2index_tab+80>: 0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf60 <__je_sz_size2index_tab+96>: 0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf70 <__je_sz_size2index_tab+112>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf80 <__je_sz_size2index_tab+128>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cf90 <__je_sz_size2index_tab+144>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cfa0 <__je_sz_size2index_tab+160>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cfb0 <__je_sz_size2index_tab+176>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cfc0 <__je_sz_size2index_tab+192>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cfd0 <__je_sz_size2index_tab+208>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cfe0 <__je_sz_size2index_tab+224>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030cff0 <__je_sz_size2index_tab+240>:        0x00000000      0x00000000      0x00000000      0x00000000
0x5030d000 <__je_sz_size2index_tab+256>:        0x18191919      0x19191919      0x19191919      0x19191919
0x5030d010 <__je_sz_size2index_tab+272>:        0x19191919      0x19191919      0x19191919      0x19191919
0x5030d020 <__je_sz_size2index_tab+288>:        0x19191919      0x19191919      0x19191919      0x19191919
0x5030d030 <__je_sz_size2index_tab+304>:        0x19191919      0x19191919      0x19191919      0x19191919
0x5030d040 <__je_sz_size2index_tab+320>:        0x191a1a1a      0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a
0x5030d050 <__je_sz_size2index_tab+336>:        0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a
0x5030d060 <__je_sz_size2index_tab+352>:        0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a
0x5030d070 <__je_sz_size2index_tab+368>:        0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a      0x1a1a1a1a
0x5030d080 <__je_sz_size2index_tab+384>:        0x1a1b1b1b      0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b
0x5030d090 <__je_sz_size2index_tab+400>:        0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b
0x5030d0a0 <__je_sz_size2index_tab+416>:        0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b
0x5030d0b0 <__je_sz_size2index_tab+432>:        0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b      0x1b1b1b1b
0x5030d0c0 <__je_sz_size2index_tab+448>:        0x1b1c1c1c      0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c
0x5030d0d0 <__je_sz_size2index_tab+464>:        0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c
0x5030d0e0 <__je_sz_size2index_tab+480>:        0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c
0x5030d0f0 <__je_sz_size2index_tab+496>:        0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c      0x1c1c1c1c

So the point where the zeros stop tracks the page boundary
(0x5030d000 in both runs), not the detailed placement of the
memory contents.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
