On 2018-02-25 22:56, Allan Jude wrote:
> On 2017-03-17 08:34, Steven Hartland wrote:
>> Author: smh
>> Date: Fri Mar 17 12:34:57 2017
>> New Revision: 315449
>> URL: https://svnweb.freebsd.org/changeset/base/315449
>>
>> Log:
>>   Reduce ARC fragmentation threshold
>>
>>   As ZFS can request up to SPA_MAXBLOCKSIZE memory blocks, e.g. during
>>   zfs recv, update the threshold at which we start aggressive
>>   reclamation to use SPA_MAXBLOCKSIZE (16M) instead of the lower
>>   zfs_max_recordsize, which defaults to 1M.
>>
>>   PR:           194513
>>   Reviewed by:  avg, mav
>>   MFC after:    1 month
>>   Sponsored by: Multiplay
>>   Differential Revision: https://reviews.freebsd.org/D10012
>>
>> Modified:
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>>
>> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> ==============================================================================
>> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Fri Mar 17 12:34:56 2017   (r315448)
>> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Fri Mar 17 12:34:57 2017   (r315449)
>> @@ -3978,7 +3978,7 @@ arc_available_memory(void)
>>           * Start aggressive reclamation if too little sequential KVA left.
>>           */
>>          if (lowest > 0) {
>> -                n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
>> +                n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
>>                      -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
>>                      INT64_MAX;
>>                  if (n < lowest) {
>
> I have some users reporting excessive ARC shrinking in 11.1 vs 11.0 due
> to this change.
>
> Memory seems quite fragmented, and this change makes it much more
> sensitive to that, but the problem seems to be that it can get too
> aggressive.
>
> In the most recent case, the machine has 128 GB of RAM and no other
> major processes running, just ZFS zvols being served over iSCSI by
> ctld.
>
> arc_max is set to 85 GB, which is rather conservative.
> After running for a few days, fragmentation seems to trip this line:
> when there are no 16 MB contiguous blocks, it shrinks the ARC by 1/16th
> of memory, but this does not result in a 16 MB contiguous chunk, so it
> shrinks the ARC by another 1/16th, and again, until it hits arc_min.
> Apparently the ARC does eventually regrow, but then it crashes again
> later.
>
> You can see the ARC oscillating between arc_max and arc_min, with some
> long periods pinned at arc_min: https://imgur.com/a/emztF
>
> [root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0;
> cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used
> += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c}
> END { print total, "TOTAL", used, cache } ' | sort -n | perl -a -p -e
> 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print
> $_, " "} print "\n"' | column -t | tail
> TOTAL            NAME                  USED             Cache
> 1,723,367,424    zio_data_buf_49152    1,722,875,904    491,520
> 1,827,057,664    zio_buf_4096          1,826,848,768    208,896
> 2,289,459,200    zio_data_buf_40960    2,289,090,560    368,640
> 3,642,736,640    zio_data_buf_81920    3,642,408,960    327,680
> 6,713,180,160    zio_data_buf_98304    6,712,688,640    491,520
> 9,388,195,840    zio_buf_8192          9,388,064,768    131,072
> 11,170,152,448   zio_data_buf_114688   11,168,890,880   1,261,568
> 29,607,329,792   zio_data_buf_131072   29,606,674,432   655,360
> 32,944,750,592   zio_buf_65536         32,943,833,088   917,504
> 114,235,296,752  TOTAL                 111,787,212,900  2,448,083,852
>
> [root@ZFS-AF ~]# vmstat -z | tail +3 | awk -F '[:,] *' 'BEGIN { total=0;
> cache=0; used=0 } {u = $2 * $4; c = $2 * $5; t = u + c; cache += c; used
> += u; total += t; name=$1; gsub(" ", "_", name); print t, name, u, c}
> END { print total, "TOTAL", used, cache } ' | sort -n +3 | perl -a -p -e
> 'while (($j, $_) = each(@F)) { 1 while s/^(-?\d+)(\d{3})/$1,$2/; print
> $_, " "} print "\n"' | column -t | tail
>
> Sorted by cache (waste):
>
> TOTAL            NAME                  USED             Cache
> 71,565,312       cblk15                0                71,565,312
> 72,220,672       cblk16                0                72,220,672
> 72,351,744       cblk18                131,072          72,220,672
> 72,744,960       cblk3                 0                72,744,960
> 75,497,472       cblk8                 0                75,497,472
> 76,283,904       cblk22                0                76,283,904
> 403,696,384      128                   286,225,792      117,470,592
> 229,519,360      mbuf_jumbo_page       67,043,328       162,476,032
> 1,196,795,160    arc_buf_hdr_t_l2only  601,620,624      595,174,536
> 114,220,354,544  TOTAL                 111,778,349,508  2,442,005,036
>
> Maybe the right thing to do is to call the new kmem_cache_reap_soon()
> or other functions that might actually reduce fragmentation, or to
> rate limit how quickly the ARC will shrink?
>
> What kind of tools do we have to look at why memory is so fragmented
> that ZFS feels the need to tank the ARC?
>
> I know this block and the FMR_ZIO_FRAG reason have been removed from
> -CURRENT as part of the NUMA work, but I am worried about addressing
> this issue for the upcoming 11.2-RELEASE.
>
> Does anyone have any thoughts on this?
The 11.2 code slush starts in 1 week, so we really need to decide what
to do here.

-- 
Allan Jude