On Friday 29 August 2014 21:42:15 Steven Hartland wrote:
> ----- Original Message -----
> From: "Peter Wemm" <pe...@wemm.org>
>
> > On Friday 29 August 2014 20:51:03 Steven Hartland wrote:
> snip..
>
> > > Does Karl's explanation as to why this doesn't work above change
> > > your mind?
> >
> > Actually no, I would expect the code as committed would *cause* the
> > undesirable behavior that Karl described.
> >
> > ie: access a few large files and cause them to reside in cache. Say
> > 50GB or so on a 200G ram machine. We now have the state where:
> >
> > v_cache = 50GB
> > v_free = 1MB
> >
> > The rest of the vm system looks at vm_paging_needed(), which is: do
> > we have enough "v_cache + v_free"? Since there's 50.001GB free, the
> > answer is no. It'll let v_free run right down to v_free_min because of
> > the giant pool of v_cache just sitting there, waiting to be used.
> >
> > The zfs change, as committed, will ignore all the free memory in the
> > form of v_cache.. and will be freaking out about how low v_free is
> > getting and will be sacrificing ARC in order to put more memory into
> > the v_free pool.
> >
> > As long as ARC keeps sacrificing itself this way, the free pages in
> > the v_cache pool won't get used. When ARC finally runs out of pages to
> > give up to v_free, the kernel will start using the free pages from
> > v_cache. Eventually it'll run down that v_cache free pool and arc will
> > be in a bare minimum state while this is happening.
> >
> > Meanwhile, ZFS ARC will be crippled. This has consequences - it does
> > RCU like things from ARC to keep fragmentation under control. With ARC
> > crippled, fragmentation will increase because there's less
> > opportunistic gathering of data from ARC.
> >
> > Granted, you have to get things freed from active/inactive to the
> > cache state, but once it's there, depending on the workload, it'll
> > mess with ARC.
>
> There's already a vm_paging_needed() check in there below so this will
> already be dealt with will it not?

No. If you read the code that you changed, you won't get that far. The
v_free test comes before vm_paging_needed(), and if the v_free test
triggers then ARC will return pages and not look at the rest of the
function.

If this function returns non-zero, ARC is given back:

	static int
	arc_reclaim_needed(void)
	{

		if (kmem_free_count() < zfs_arc_free_target) {
			return (1);
		}

		/*
		 * Cooperate with pagedaemon when it's time for it to scan
		 * and reclaim some pages.
		 */
		if (vm_paging_needed()) {
			return (1);
		}

ie: if v_free (ignoring v_cache free pages) gets below the threshold,
stop everything and discard ARC pages.

The vm_paging_needed() code is a NO-OP at this point. It can never
return true. Consider:

	vm_cnt.v_free_target = 4 * vm_cnt.v_free_min + vm_cnt.v_free_reserved;

vs

	vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;

zfs_arc_free_target defaults to vm_cnt.v_free_target, which is 400% of
v_free_min plus v_free_reserved, and compares it against the smaller
v_free pool. vm_paging_needed() compares the total free pool (v_free +
v_cache) against the smaller wakeup threshold - 110% of v_free_min.

Comparing a larger value against a smaller target than the previous test
will never succeed unless you manually change the arc_free_target sysctl.
If we got past the first test, v_free alone is already at least 400% of
v_free_min, so v_free + v_cache can't possibly be below 110% of
v_free_min.

Also, what about the magic numbers here:

	u_int zfs_arc_free_target = (1 << 19); /* default before pagedaemon init only */

That's half a million pages, or 2GB of physical ram on a 4K page size
system. How is this going to work on early boot in the machines in the
cluster with less than 2GB of ram?

-- 
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…