[stress alone is sufficient to make jemalloc's asserts fail; no need for a multi-socket G4 either, and no need to involve nfsd, mountd, rpcbind, or the like. This is not a claim that I know all the problems to be the same, just that a jemalloc-reported failure happens in this simpler context and zeroed pages are involved.]
Reminder: head -r360311 based context.

First I show a single-CPU/core PowerMac G4 context failing in stress. (I actually did this later, but it is the simpler context.) I simply moved the media from the 2-socket G4 to this slower, single-CPU/core one.

cpu0: Motorola PowerPC 7400 revision 2.9, 466.42 MHz
cpu0: Features 9c000000<PPC32,ALTIVEC,FPU,MMU>
cpu0: HID0 8094c0a4<EMCP,DOZE,DPM,EIEC,ICE,DCE,SGE,BTIC,BHT>
real memory  = 1577857024 (1504 MB)
avail memory = 1527508992 (1456 MB)

# stress -m 1 --vm-bytes 1792M
stress: info: [1024] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
stress: FAIL: [1024] (415) <-- worker 1025 got signal 6
stress: WARN: [1024] (417) now reaping child worker processes
stress: FAIL: [1024] (451) failed run completed in 243s

(Note: 1792M is the biggest it allowed with the M suffix.)

The following still pages in and out and fails:

# stress -m 1 --vm-bytes 1290M
stress: info: [1163] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
. . .

By contrast, the following had no problem for as long as I let it run, and did not page in or out:

# stress -m 1 --vm-bytes 1280M
stress: info: [1181] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

The 2-socket PowerMac G4 context with 2048 MiByte of RAM:

stress -m 1 --vm-bytes 1792M

did not (quickly?) fail or page. 1792M is as large as it would allow with the M suffix.

The following also did not (quickly?) fail (and were not paging):

stress -m 2 --vm-bytes 896M
stress -m 4 --vm-bytes 448M
stress -m 8 --vm-bytes 224M

(Only 1 example was run at a time.)

But the following all did quickly fail (and were paging):

stress -m 8 --vm-bytes 225M
stress -m 4 --vm-bytes 449M
stress -m 2 --vm-bytes 897M

(Only 1 example was run at a time.)
I'll note that when I exited an su process I ended up with:

<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200: Failed assertion: "ret == sz_index2size_compute(index)"
Abort trap (core dumped)

and a matching su.core file. It appears that stress's activity leads to other processes also seeing examples of the zeroed-page(s) problem (probably su had paged some, or had been fully swapped out):

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x503821d0 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x502e1d20 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x502d6144 in sz_index2size_lookup (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
#4  sz_index2size (index=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
#5  ifree (tsd=0x5008b018, ptr=0x50041460, tcache=0x5008b138, slow_path=<optimized out>) at jemalloc_jemalloc.c:2583
#6  0x502d5cec in __je_free_default (ptr=0x50041460) at jemalloc_jemalloc.c:2784
#7  0x502d62d4 in __free (ptr=0x50041460) at jemalloc_jemalloc.c:2852
#8  0x501050cc in openpam_destroy_chain (chain=0x50041480) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:113
#9  0x50105094 in openpam_destroy_chain (chain=0x500413c0) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#10 0x50105094 in openpam_destroy_chain (chain=0x50041320) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#11 0x50105094 in openpam_destroy_chain (chain=0x50041220) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#12 0x50105094 in openpam_destroy_chain (chain=0x50041120) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#13 0x50105094 in openpam_destroy_chain (chain=0x50041100) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#14 0x50105014 in openpam_clear_chains (policy=0x50600004) at /usr/src/contrib/openpam/lib/libpam/openpam_load.c:130
#15 0x50101230 in pam_end (pamh=0x50600000, status=<optimized out>) at /usr/src/contrib/openpam/lib/libpam/pam_end.c:83
#16 0x1001225c in main (argc=<optimized out>, argv=0x0) at /usr/src/usr.bin/su/su.c:477

(gdb) print/x __je_sz_size2index_tab
$1 = {0x0 <repeats 513 times>}

Notes:

Given that the original problem did not involve paging to the swap partition, it may be that merely making it to the laundry list (or some such) is sufficient; that is also involved when swap space is partially in use (according to top). Or sitting in the inactive list for a long time, if that has some special status.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"