[zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-02 Thread Iwan Aucamp
I'm interested in some more detail on how the ZFS intent log behaves 
for updates done via a memory-mapped file - i.e. will the ZIL log 
updates made to an mmap'd file or not?
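
One rough way to check empirically is to count ZIL commits per process 
while a test program dirties an mmap'd file and calls msync(MS_SYNC). 
This is only a sketch - it assumes DTrace's fbt provider exposes the 
kernel's zil_commit() entry point (a private function, so the probe 
name may differ by release):

    # count zil_commit() calls per process while exercising the mmap'd file
    # (fbt probe on a private kernel function - name taken from the ZFS source)
    dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'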



Re: [zfs-discuss] Remedies for suboptimal mmap performance on zfs

2012-06-01 Thread Iwan Aucamp

On 06/01/2012 02:33 PM, Jeff Bacon wrote:

I'd be interested in the results of such tests. You can change the primarycache
parameter on the fly, so you could test it in less time than it
takes for me to type this email :-)
  -- Richard

Tried that. Performance headed south like a cat with its tail on fire. We 
didn't bother quantifying, it was just that hideous.

(You know, us northern-hemisphere people always use "south" as a "down" 
direction. Is it different for people in the southern hemisphere? :) )

There are just too many _other_ little things running around a normal 
system for which NOT having primarycache is too painful to contemplate 
(even with L2ARC). While I can envisage situations where one might want 
to do that, they're very, very few and far between.


Thanks for the valuable feedback, Jeff, though I think you might have 
misunderstood - the idea is to create a zfs filesystem just for the 
files being mmap'd by mongo, i.e. to disable the ARC only where double 
caching is involved (the mmap'd files), leaving the rest of the system 
with the ARC while taking the ARC out of the picture for MongoDB's data 
files.
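
A minimal sketch of that setup, assuming the pool and dataset names 
below are placeholders:

    # dedicated dataset for the mmap'd mongo files, caching only metadata in the ARC
    # (pool/dataset names are placeholders)
    zfs create -o primarycache=metadata tank/mongo-data
    # or, for an existing dataset:
    zfs set primarycache=metadata tank/mongo-data

The rest of the pool keeps the default primarycache=all, so only the 
double-cached data is affected.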





Re: [zfs-discuss] Remedies for suboptimal mmap performance on zfs

2012-05-29 Thread Iwan Aucamp

On 05/29/2012 03:29 AM, Daniel Carosone wrote:
For the mmap case: does the ARC keep a separate copy, or does the vm 
system map the same page into the process's address space? If a 
separate copy is made, that seems like a potential source of many 
kinds of problems - if it's the same page then the whole premise is 
essentially moot and there's no "double caching". 


As far as I understand, in the mmap case the page cache is distinct 
from the ARC (i.e. the simplified flow for reading from disk with mmap 
is DSK -> ARC -> page cache), and only the page cache gets mapped into 
the process's address space - which is what results in the double 
caching.


I have two other general questions regarding the page cache with ZFS + Solaris:
 - Does anything other than mmap still use the page cache?
 - Is there a parameter similar to Linux's /proc/sys/vm/swappiness that 
controls how long unused pages stay in the page cache when there is no 
shortage of physical RAM? And if not, how long will unused pages stay 
in the page cache given there is no shortage of physical RAM?
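
One way to see how physical memory is currently split between the 
kernel (including the ARC), anonymous memory and the page cache is 
mdb's ::memstat dcmd - a rough sketch; the category names vary a little 
between Solaris releases:

    # rough breakdown of physical memory: kernel, anon, page cache, free, etc.
    echo ::memstat | mdb -k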


[zfs-discuss] Remedies for suboptimal mmap performance on zfs

2012-05-28 Thread Iwan Aucamp

On 05/28/2012 10:12 PM, Andrew Gabriel wrote:

  On 05/28/12 20:06, Iwan Aucamp wrote:

I'm thinking of doing the following:
  - relocating mmaped (mongo) data to a zfs filesystem with only
metadata cache
  - reducing zfs arc cache to 16 GB

Are there any other recommendations - and is the above likely to improve
performance?

1. Upgrade to S10 Update 10 - this has various performance improvements,
in particular related to database type loads (but I don't know anything
about mongodb).

2. Reduce the ARC size so that RSS + ARC + other memory users < RAM size.
I assume the RSS includes whatever caching the database does. In
theory, a database should be able to work out what's worth caching
better than any filesystem can guess from underneath it, so you want to
configure more memory in the DB's cache than in the ARC. (The default
ARC tuning is unsuitable for a database server.)

3. If the database has some concept of blocksize or recordsize that it
uses to perform I/O, make sure the filesystems it is using are
configured with the same recordsize (see the sketch below). The ZFS
default recordsize (128kB) is usually much bigger than database
blocksizes. This is probably going to have less impact with an mmap'd
database than a read(2)/write(2) database, where it may prove better to
match the filesystem's recordsize to the system's page size (4kB,
unless it's using some type of large pages). I haven't tried playing
with recordsize for memory-mapped I/O, so I'm speculating here.

Blocksize or recordsize may apply to the log file writer too, and it may
be that this needs a different recordsize and therefore has to be in a
different filesystem. If it uses write(2) or some variant rather than
mmap(2) and doesn't document this in detail, Dtrace is your friend.

4. Keep plenty of free space in the zpool if you want good database
performance. If you're more than 60% full (S10U9) or 80% full (S10U10),
that could be a factor.

Anyway, there are a few things to think about.
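
A rough sketch of point 3, assuming placeholder dataset and process 
names, and assuming the observed I/O size turns out to be around 8kB:

    # quantize write(2) sizes issued by the database (process name assumed)
    dtrace -n 'syscall::write:entry /execname == "mongod"/ { @["bytes per write(2)"] = quantize(arg2); }'

    # then match the dataset's recordsize to what you observe (8k is only an example);
    # note that recordsize only affects files written after the change
    zfs set recordsize=8k tank/mongo-data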


Thanks for the feedback. I cannot really do 1, but I will look into 
points 3 and 4, in addition to 2 - which is what I aim to achieve with 
my second point. I would still like to know, though, whether it is 
recommended to do metadata-only caching for the mmap'd files (the 
mongodb data files) - the way I see it, this should get rid of the 
double caching being done for mmap'd files.





[zfs-discuss] Remedies for suboptimal mmap performance on zfs

2012-05-28 Thread Iwan Aucamp
I'm getting sub-optimal performance with an mmap-based database 
(mongodb) which is running on ZFS on Solaris 10u9.


The system is a Sun-Fire X4270-M2 with 2 x X5680 CPUs and 72GB 
(6 * 8GB + 6 * 4GB) of RAM (installed so it runs at 1333MHz), plus 
2 * 300GB 15K RPM disks.


 - a few mongodb instances are running with moderate IO and a total 
RSS of 50 GB
 - a service which logs quite excessively (5GB every 20 mins) is also 
running (max 2GB RAM use) - log files are compressed to bzip2 after 
some time.


Database performance is quite horrid though - it seems that ZFS does 
not know how to manage the allocation of memory between the page cache 
and the ARC, and the ARC seems to win most of the time.
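
A quick way to watch that tug-of-war while the workload runs is to 
sample the ARC size kstat alongside free memory (just a sketch; how you 
sample and correlate is up to you):

    # current ARC footprint in bytes
    kstat -p zfs:0:arcstats:size
    # free memory and paging activity, sampled every second
    vmstat 1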


I'm thinking of doing the following (sketched below):
 - relocating the mmap'd (mongo) data to a zfs filesystem with 
metadata-only caching

 - reducing the zfs ARC to 16 GB

Are there any other recommendations - and is the above likely to 
improve performance?
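
A rough sketch of the two changes, assuming the mongo data lives in a 
dedicated dataset (the name is a placeholder); the ARC cap goes in 
/etc/system and needs a reboot:

    # /etc/system: cap the ARC at 16 GiB (0x400000000 bytes)
    set zfs:zfs_arc_max = 0x400000000

    # dataset holding the mmap'd mongo files: cache only metadata in the ARC
    zfs set primarycache=metadata tank/mongo-data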


--
Iwan Aucamp