On 2018-02-25 22:56, Allan Jude wrote:
> On 2017-03-17 08:34, Steven Hartland wrote:
>> Author: smh
>> Date: Fri Mar 17 12:34:57 2017
>> New Revision: 315449
>> URL: https://svnweb.freebsd.org/changeset/base/315449
>>
>> Log:
>>   Reduce ARC fragmentation threshold
>>   
>>   As ZFS can request a memory block of up to SPA_MAXBLOCKSIZE, e.g. during
>>   zfs recv, update the threshold at which we start aggressive reclamation
>>   to use SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize,
>>   which defaults to 1M.
>>   
>>   PR:                194513
>>   Reviewed by:       avg, mav
>>   MFC after: 1 month
>>   Sponsored by:      Multiplay
>>   Differential Revision:     https://reviews.freebsd.org/D10012
>>
>> Modified:
>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>>
>> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> ==============================================================================
>> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:56 2017	(r315448)
>> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Fri Mar 17 12:34:57 2017	(r315449)
>> @@ -3978,7 +3978,7 @@ arc_available_memory(void)
>>       * Start aggressive reclamation if too little sequential KVA left.
>>       */
>>      if (lowest > 0) {
>> -            n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
>> +            n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
>>                  -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
>>                  INT64_MAX;
>>              if (n < lowest) {
>>
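For reference, here is the check as it now stands in 11-stable, with the
tail of the block (which the diff context cuts off) reconstructed from the
surrounding arc.c code. As I understand it, vmem_size(heap_arena,
VMEM_MAXFREE) returns the size of the largest contiguous free segment of
kernel heap KVA:

    /*
     * If the largest contiguous chunk of free heap KVA is smaller than
     * SPA_MAXBLOCKSIZE (16M), report a deficit of 1/16th of all
     * allocated heap KVA; the reclaim thread treats a negative return
     * from arc_available_memory() as severe pressure.
     */
    if (lowest > 0) {
            n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
                -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
                INT64_MAX;
            if (n < lowest) {
                    lowest = n;
                    r = FMR_ZIO_FRAG;
            }
    }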
> 
> I have some users reporting excessive ARC shrinking in 11.1 vs 11.0 due
> to this change.
> 
> Memory seems quite fragmented, and this change makes the ARC much more
> sensitive to that, but the problem seems to be that it can get too
> aggressive.
> 
> In the most recent case, the machine has 128 GB of RAM and no other major
> processes running, just ZFS zvols being served over iSCSI by ctld.
> 
> arc_max is set to 85 GB, which is rather conservative. After running for a
> few days, fragmentation seems to trip this line: once there is no 16 MB
> contiguous block of free KVA, the ARC is shrunk by 1/16th of allocated
> kernel memory, but that does not produce a 16 MB contiguous chunk, so the
> ARC is shrunk by another 1/16th, and again, until it hits arc_min.
> Eventually the ARC does regrow, but then collapses again later.
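To make the arithmetic concrete, here is a rough userland model of that
feedback loop (a sketch, not kernel code: the 1/16th factor is from the
check above, while the arc_min and the 120 GB of allocated heap KVA are
assumptions for this machine):

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            int64_t arc = 85LL << 30;        /* arc_max on this box: 85 GB */
            int64_t arc_min = 16LL << 30;    /* assumed arc_min, illustrative */
            int64_t kva_alloc = 120LL << 30; /* assumed allocated heap KVA */
            int pass = 0;

            /*
             * Each time the check fires it reports a deficit of 1/16th
             * of allocated KVA (~7.5 GB here), and the ARC shrinks by
             * about that much.  Freeing ARC buffers does not coalesce
             * into a 16 MB contiguous chunk, so the check fires again.
             * (Simplification: kva_alloc would drop as the ARC shrinks,
             * but the fragmented arena keeps failing the MAXFREE test.)
             */
            while (arc > arc_min) {
                    arc -= kva_alloc >> 4;
                    if (arc < arc_min)
                            arc = arc_min;
                    printf("pass %2d: arc ~ %5.1f GB\n", ++pass,
                        arc / (double)(1LL << 30));
            }
            return (0);
    }

About ten passes and the ARC is pinned at arc_min, which matches the
sawtooth in the graph linked below.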
> 
> You can see the ARC oscillating between arc_max and arc_min, with some
> long periods pinned at arc_min: https://imgur.com/a/emztF
> 
> 
> [root@ZFS-AF ~]# vmstat -z | tail +3 |
>     awk -F '[:,] *' '
>         BEGIN { total = 0; cache = 0; used = 0 }
>         { u = $2 * $4; c = $2 * $5; t = u + c;
>           cache += c; used += u; total += t;
>           name = $1; gsub(" ", "_", name); print t, name, u, c }
>         END { print total, "TOTAL", used, cache }' |
>     sort -n |
>     perl -a -p -e 'while (($j, $_) = each(@F)) {
>         1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " " } print "\n"' |
>     column -t | tail

Sorted by total size; each zone's item size is multiplied by its used and
free counts, so all figures are bytes:

TOTAL              NAME                   USED             CACHE

> 1,723,367,424    zio_data_buf_49152     1,722,875,904    491,520
> 1,827,057,664    zio_buf_4096           1,826,848,768    208,896
> 2,289,459,200    zio_data_buf_40960     2,289,090,560    368,640
> 3,642,736,640    zio_data_buf_81920     3,642,408,960    327,680
> 6,713,180,160    zio_data_buf_98304     6,712,688,640    491,520
> 9,388,195,840    zio_buf_8192           9,388,064,768    131,072
> 11,170,152,448   zio_data_buf_114688    11,168,890,880   1,261,568
> 29,607,329,792   zio_data_buf_131072    29,606,674,432   655,360
> 32,944,750,592   zio_buf_65536          32,943,833,088   917,504
> 114,235,296,752  TOTAL                  111,787,212,900  2,448,083,852
> 
> 
> [root@ZFS-AF ~]# vmstat -z | tail +3 |
>     awk -F '[:,] *' '
>         BEGIN { total = 0; cache = 0; used = 0 }
>         { u = $2 * $4; c = $2 * $5; t = u + c;
>           cache += c; used += u; total += t;
>           name = $1; gsub(" ", "_", name); print t, name, u, c }
>         END { print total, "TOTAL", used, cache }' |
>     sort -n +3 |
>     perl -a -p -e 'while (($j, $_) = each(@F)) {
>         1 while s/^(-?\d+)(\d{3})/$1,$2/; print $_, " " } print "\n"' |
>     column -t | tail

Sorted by the cache column, i.e. by memory held by a zone but currently
unused (waste):

TOTAL              NAME                   USED             CACHE

> 71,565,312       cblk15                 0                71,565,312
> 72,220,672       cblk16                 0                72,220,672
> 72,351,744       cblk18                 131,072          72,220,672
> 72,744,960       cblk3                  0                72,744,960
> 75,497,472       cblk8                  0                75,497,472
> 76,283,904       cblk22                 0                76,283,904
> 403,696,384      128                    286,225,792      117,470,592
> 229,519,360      mbuf_jumbo_page        67,043,328       162,476,032
> 1,196,795,160    arc_buf_hdr_t_l2only   601,620,624      595,174,536
> 114,220,354,544  TOTAL                  111,778,349,508  2,442,005,036
> 
> 
> Maybe the right thing to do is to call the new kmem_cache_reap_soon(), or
> other functions that might actually reduce fragmentation, or to rate-limit
> how quickly the ARC will shrink?
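Something like the following is what I mean by rate-limiting: an untested
sketch against the 11-stable block quoted above, where
zfs_arc_frag_interval is a tunable I am inventing for illustration and
time_uptime is the kernel's seconds-since-boot counter:

    /*
     * Hypothetical: let FMR_ZIO_FRAG force a shrink at most once per
     * zfs_arc_frag_interval seconds, so a single fragmentation event
     * cannot walk the ARC all the way down to arc_min.
     */
    static int zfs_arc_frag_interval = 10;  /* invented tunable, seconds */
    static time_t arc_last_frag_shrink;

    if (lowest > 0) {
            if (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE &&
                time_uptime - arc_last_frag_shrink >= zfs_arc_frag_interval) {
                    arc_last_frag_shrink = time_uptime;
                    n = -((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4);
            } else {
                    n = INT64_MAX;
            }
            if (n < lowest) {
                    lowest = n;
                    r = FMR_ZIO_FRAG;
            }
    }

Calling arc_kmem_reap_now() (or kmem_cache_reap_soon() on the zio caches)
before charging the deficit might also give UMA a chance to return pages
that can actually coalesce.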
> 
> What kind of tools do we have to look at why memory is so fragmented
> that ZFS feels the need to tank the ARC?
> 
> I know this block and the FMR_ZIO_FRAG reason have been removed from
> -CURRENT as part of the NUMA work, but I am worried about how we address
> this issue for the upcoming 11.2-RELEASE.
> 

Does anyone have any thoughts on this? The 11.2 code slush starts in 1
week, so we really need to decide what to do here.

-- 
Allan Jude
