[zfs-discuss] Interaction between ZFS intent log and mmap'd files
I'm interested in some more detail on how the ZFS intent log behaves for updates done via a memory-mapped file - i.e. will the ZIL log updates done to an mmap'd file or not?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
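Not an authoritative answer, but one way to probe this empirically on a test box: stores into an mmap'd region only become candidates for synchronous logging when something like msync(3C) or fsync(3C) is called, so you can watch whether zil_commit fires while exercising the file. This is a sketch - it assumes the fbt provider exposes zil_commit under that name on your ZFS version, and it needs root:

```shell
# Count ZIL commits per process while you msync(3C) an mmap'd file
# from another shell. fbt probe names are not a stable interface and
# can differ across releases.
dtrace -n 'fbt::zil_commit:entry { @commits[execname] = count(); }'
```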
Re: [zfs-discuss] Remedies for suboptimal mmap performance on zfs
On 06/01/2012 02:33 PM, Jeff Bacon wrote:
>> I'd be interested in the results of such tests. You can change the
>> primarycache parameter on the fly, so you could test it in less time
>> than it takes for me to type this email :-)
>> -- Richard
>
> Tried that. Performance headed south like a cat with its tail on fire.
> We didn't bother quantifying, it was just that hideous. (You know, us
> northern-hemisphere people always use "south" as a "down" direction. Is
> it different for people in the southern hemisphere? :) )
>
> There are just too many _other_ little things running around a normal
> system for which NOT having primarycache is too painful to contemplate
> (even with L2ARC) that, while I can envisage situations where one might
> want to do that, they're very few and far between.

Thanks for the valuable feedback Jeff, though I think you might have misunderstood: the idea is to make a zfs filesystem just for the files being mmap'd by mongo, i.e. to disable the ARC only where double caching is involved (the mmap'd files), leaving the rest of the system with the ARC and taking the ARC out of the picture only for MongoDB.
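For the record, the per-filesystem setup being proposed can be sketched as below ("tank" and the mount point are placeholder names, not from the thread):

```shell
# Dedicated filesystem for the mmap'd mongo files; cache only metadata
# in the ARC so the page cache is the single data cache for these files,
# while every other filesystem keeps the default primarycache=all.
zfs create -o primarycache=metadata -o mountpoint=/data/mongodb tank/mongodb

# Confirm the property (it can also be changed on the fly with zfs set).
zfs get primarycache tank/mongodb
```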
Re: [zfs-discuss] Remedies for suboptimal mmap performance on zfs
On 05/29/2012 03:29 AM, Daniel Carosone wrote:
> For the mmap case: does the ARC keep a separate copy, or does the vm
> system map the same page into the process's address space? If a separate
> copy is made, that seems like a potential source of many kinds of
> problems - if it's the same page then the whole premise is essentially
> moot and there's no "double caching".

As far as I understand, for the mmap case the page cache is distinct from the ARC (i.e. the simplified flow for reading from disk with mmap is DSK -> ARC -> page cache), and only the page cache gets mapped into the process's address space - which is what results in the double caching.

I have two other general questions regarding the page cache with ZFS + Solaris:
- Does anything other than mmap still use the page cache?
- Is there a parameter similar to /proc/sys/vm/swappiness that controls how long unused pages in the page cache stay in physical ram when there is no shortage of physical ram? And if not, how long will unused pages stay in the page cache given there is no shortage of physical ram?
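On Solaris the split between the ARC and other memory consumers can at least be observed directly, which helps confirm the double-caching hypothesis; a quick root-only check is:

```shell
# ::memstat breaks physical memory down by consumer - compare the
# "ZFS File Data" (ARC) line against anon and page-cache usage.
echo ::memstat | mdb -k
```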
[zfs-discuss] Remedies for suboptimal mmap performance on zfs
On 05/28/2012 10:12 PM, Andrew Gabriel wrote:
> On 05/28/12 20:06, Iwan Aucamp wrote:
>> I'm thinking of doing the following:
>> - relocating mmaped (mongo) data to a zfs filesystem with only metadata cache
>> - reducing zfs arc cache to 16 GB
>> Are there any other recommendations - and is the above likely to improve performance?
>
> 1. Upgrade to S10 Update 10 - this has various performance improvements,
> in particular related to database-type loads (but I don't know anything
> about mongodb).
>
> 2. Reduce the ARC size so that RSS + ARC + other memory users < RAM size.
> I assume the RSS includes whatever caching the database does. In theory,
> a database should be able to work out what's worth caching better than
> any filesystem can guess from underneath it, so you want to configure
> more memory in the DB's cache than in the ARC. (The default ARC tuning
> is unsuitable for a database server.)
>
> 3. If the database has some concept of blocksize or recordsize that it
> uses to perform i/o, make sure the filesystems it is using are
> configured with the same recordsize. The ZFS default recordsize (128kB)
> is usually much bigger than database blocksizes. This is probably going
> to have less impact with an mmap'd database than a read(2)/write(2)
> database, where it may prove better to match the filesystem's recordsize
> to the system's page size (4kB, unless it's using some type of large
> pages). I haven't tried playing with recordsize for memory-mapped i/o,
> so I'm speculating here. Blocksize or recordsize may apply to the log
> file writer too, and it may be that this needs a different recordsize
> and therefore has to be in a different filesystem. If it uses write(2)
> or some variant rather than mmap(2) and doesn't document this in detail,
> DTrace is your friend.
>
> 4. Keep plenty of free space in the zpool if you want good database
> performance. If you're more than 60% full (S10U9) or 80% full (S10U10),
> that could be a factor.
>
> Anyway, there are a few things to think about.
Thanks for the feedback. I cannot really do 1, but will look into points 3 and 4 - in addition to 2, which is what I want to achieve with my second point. I would still like to know whether it is recommended to do only metadata caching for mmap'd files (the mongodb data files) - the way I see it, this should get rid of the double caching which is being done for mmap'd files.
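A rough sketch of points 2-4 in shell/config form; the pool and filesystem names are placeholders, "mongod" is assumed to be the database process name, and the 4k recordsize follows the speculative page-size suggestion above rather than tested advice:

```shell
# Point 2: cap the ARC so RSS + ARC + other users < RAM.
# Add to /etc/system and reboot (0x400000000 bytes = 16 GB):
#   set zfs:zfs_arc_max=0x400000000

# Point 3: match recordsize to the i/o size. Only affects files written
# after the change, so existing data must be copied onto the filesystem.
zfs set recordsize=4k tank/mongodb

# Point 3 (cont.): check whether the app writes via write(2) at all,
# and at what sizes, before tuning for it.
dtrace -n 'syscall::write:entry /execname == "mongod"/ { @sizes = quantize(arg2); }'

# Point 4: check pool occupancy against the 60%/80% guideline.
zpool list
```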
[zfs-discuss] Remedies for suboptimal mmap performance on zfs
I'm getting sub-optimal performance with an mmap-based database (mongodb) running on zfs on Solaris 10u9.

System is a Sun-Fire X4270-M2 with 2 x X5680 CPUs and 72GB (6 * 8GB + 6 * 4GB) ram (installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks:
- a few mongodb instances are running with moderate IO and a total rss of 50 GB
- a service which logs quite excessively (5GB every 20 mins) is also running (max 2GB ram use) - log files are compressed to bzip2 after some time

Database performance is quite horrid though - it seems that zfs does not know how to manage allocation between the page cache and the ARC - and it seems the ARC wins most of the time.

I'm thinking of doing the following:
- relocating mmaped (mongo) data to a zfs filesystem with only metadata cache
- reducing zfs arc cache to 16 GB

Are there any other recommendations - and is the above likely to improve performance?

--
Iwan Aucamp
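To put numbers on "the ARC wins most of the time", the ARC's current and target sizes can be watched while the database runs, using the standard Solaris arcstats kstat names:

```shell
# Print ARC current size and target size (bytes) every 5 seconds;
# a target ("c") that stays pinned high under page-cache pressure
# supports the double-caching theory.
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c 5
```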