On Fri, 21 Nov 2008 17:20:48 PST, Vincent Kéravec <[EMAIL PROTECTED]> wrote:
> I just tried ZFS on one of our slaves and got some really
> bad performance.
>
> When I started the server yesterday, it was able to keep
> up with the main server without problems, but after two
> days of continuous running the server is crushed by IO.
>
> After running the dtrace script iopattern, I noticed
> that the workload is now 100% random IO. Copying the
> database (140 GB) from one directory to another took
> more than 4 hours without any other tasks running on
> the server, and all the reads on tables that were
> updated were random... Keeping an eye on iopattern and
> zpool iostat, I saw that when the system was accessing
> files that had not been changed, the disk was reading
> sequentially at more than 50 MB/s, but when reading files
> that changed often, the speed dropped to 2-3 MB/s.

Good observation and analysis.

> The server has plenty of disk space, so it should not
> have such a level of file fragmentation in such a short
> time.

My explanation would be this: whenever a block within a file
changes, ZFS writes the new version to a different location
("copy on write") instead of overwriting it in place, so the
previous version isn't immediately lost. ZFS tries to place the
new version of the block close to the original one, but after
several changes to the same database page the layout gets
scrambled, and I/O that is sequential at the logical (file)
level becomes effectively random at the physical (disk) level.

The original blocks will eventually be added to the free list
and reused, so some proximity can be restored, but the file will
never be 100% sequential again. The effect is larger when many
snapshots are kept, because older block versions are not freed,
or when the same block changes very often and free-list updates
have to be postponed. That is the trade-off between "always
consistent" and "fast".

> For information, I'm using Solaris 10/08 with a mirrored
> root pool on two 1 TB SATA hard disks (slow with random
> IO). I'm using MySQL 5.0.67 with the MyISAM engine. The ZFS
> recordsize is 8k, as recommended in the ZFS guide.
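The copy-on-write effect described above can be made a bit more concrete with a toy simulation. This is only a sketch under heavy simplifying assumptions: the function `simulate_cow`, the single-file pool, and the "nearest free block" placement rule are hypothetical and do not model the real ZFS allocator. Every update of a random page is written to the free block closest to its old location; the old copy is freed afterwards unless snapshots are kept. The result is the fraction of logically adjacent blocks that are still physically adjacent.

```python
import random


def simulate_cow(n_blocks, n_updates, pool_size, keep_snapshots=False, seed=0):
    """Toy copy-on-write allocator (hypothetical; NOT the real ZFS allocator).

    Logical block i of one file starts at physical address i. Each update
    writes the new version to the free physical block nearest the old one
    instead of overwriting it in place. The old copy is freed afterwards,
    unless snapshots are kept, in which case it stays allocated.

    Returns the fraction of logically adjacent block pairs that are still
    physically adjacent: 1.0 means a perfectly sequential layout.
    """
    if n_blocks < 2 or pool_size < n_blocks:
        raise ValueError("need n_blocks >= 2 and pool_size >= n_blocks")
    rng = random.Random(seed)
    phys = list(range(n_blocks))      # logical block -> physical address
    used = [False] * pool_size
    for p in phys:
        used[p] = True

    def nearest_free(addr):
        # Scan outward from addr for the closest unused physical block.
        for dist in range(1, pool_size):
            for cand in (addr + dist, addr - dist):
                if 0 <= cand < pool_size and not used[cand]:
                    return cand
        raise RuntimeError("pool is full")

    for _ in range(n_updates):
        i = rng.randrange(n_blocks)   # some database page gets rewritten
        old = phys[i]
        new = nearest_free(old)       # copy on write: never overwrite in place
        used[new] = True
        phys[i] = new
        if not keep_snapshots:
            used[old] = False         # old version goes back to the free pool

    adjacent = sum(1 for a, b in zip(phys, phys[1:]) if b == a + 1)
    return adjacent / (n_blocks - 1)


if __name__ == "__main__":
    for snapshots in (False, True):
        frac = simulate_cow(500, 1000, 2000, keep_snapshots=snapshots)
        print(f"snapshots={snapshots}: {frac:.1%} of adjacent blocks still contiguous")
```

Running it with and without `keep_snapshots` shows how quickly a sequential layout degrades under random page updates; the numbers only describe this toy model, not real disk throughput.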
I would suggest enlarging the MyISAM buffers. The InnoDB engine
does copy on write within its data files, so things might be
different there.
-- 
  (  Kees Nuyt
  )
c[_]
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss