Ray Clark wrote:
> I am [trying to] perform a test prior to moving my data to Solaris and ZFS.  
> Things are going very poorly.  Please suggest what I might do to understand 
> what is going on, report a meaningful bug report, fix it, whatever!
>
> Both to learn what the compression could be, and to induce a heavy load to 
> expose issues, I am running with compress=gzip-9.
>
> I have two machines, both identical 800MHz P3 with 768MB memory.  The disk 
> complement and OS is different.  My current host is Suse Linux 10.2 (2.6.18 
> kernel) running two 120GB drives under LVM.  My test machine is OpenSolaris 2008.11 B2 
> with two 200GB drives on the motherboard secondary IDE, zfs mirroring them, 
> NFS exported.
>
> My "test" is to simply run "cp -rp * /testhome" on the Linux machine, where 
> /testhome is the NFS mounted zfs file system on the Solaris system.
>
> It starts out with "reasonable" throughput.  Although the heavy load makes 
> the Solaris system pretty jerky and unresponsive, it does work.  The Linux 
> system is a little jerky and unresponsive, I assume due to waiting for 
> sluggish network responses.
>
> After about 12 hours, the throughput has slowed to a crawl.  The Solaris 
> machine takes a minute or more to respond to every character typed and mouse 
> click.  The Linux machine is no longer jerky, which makes sense since it has 
> to wait a lot for Solaris.  Stuff is flowing, but throughput is in the range 
> of 100K bytes/second.
>
> The Linux machine (available for tests) "gzip -9"ing a few multi-GB files 
> seems to get 3MB/sec +/- 5% pretty consistently.  Since it is the exact same CPU, 
> RAM (including brand and model), chipset, etc., I would expect similar 
> throughput from ZFS.  This is in the right ballpark of what I saw 
> when the copy first started.  In an hour or two it moved about 17GB.
>
> I am also running a "vmstat" and a "top" to a log file.  Top reports total 
> swap size as 512MB, 510 available.  vmstat for the first few hours reported 
> something reasonable (it never seems to agree with top), but it is now reporting 
> around 570~580MB free, and for a while it was reporting well over 600MB free swap 
> out of the 512MB total!
>
> I have gotten past a top memory leak (opensolaris.com bug 5482) and so am now 
> running top for only one iteration at a time, in a shell for loop with a sleep, 
> instead of letting it repeat.  This was to be my test run to see it work.
>
> What information can I capture and how can I capture it to figure this out?
>
> My goal is to gain confidence in this system.  The idea is that Solaris and 
> ZFS should be more reliable than Linux and LVM.  Although I have never lost 
> data due to Linux problems, I have lost it due to disk failure, and ZFS 
> should cover that!
>
> Thank you ahead for any ideas or suggestions.
>   

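For reference, the setup described would look something like this on the
Solaris side (pool, dataset, device, and host names below are made up):

    # mirrored pool on the two IDE disks
    zpool create tank mirror c0d0 c0d1

    # dataset with gzip-9 compression, shared over NFS
    zfs create -o compression=gzip-9 -o sharenfs=rw tank/testhome

    # on the Linux client
    mount -t nfs solarisbox:/tank/testhome /testhome
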
800 MHz P3 + 768 MBytes of RAM + IDE + ZFS + gzip-9 + NFS = pain
I'm not sure I could dream of a worse combination for performance.  I'm
actually surprised it takes 12 hours to crater -- probably because the client
is also quite slow.

arcstat will show the ARC usage, which should keep increasing until the limit
is reached.  If you compare iostat on the Solaris side against network use (e.g.
with nicstat, or with iostat on the Linux client -- does the Linux version of
iostat track NFS I/O?), then you should see a mismatch, which can likely be
attributed to the time required to gzip-9 the data and commit it to disk.
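
A rough sketch of what I'd watch on the Solaris box, with 10 second samples
(arcstat is a separate perl script, and nicstat is a separate download):

    # ARC size in bytes; run it repeatedly (or use arcstat.pl) to watch it grow
    kstat -p zfs:0:arcstats:size

    # per-disk throughput: extended stats, non-zero devices only
    iostat -xnz 10

    # network throughput per interface
    nicstat 10

    # on the Linux client, NFS client call counts (run it twice and diff)
    nfsstat -c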

When there is plenty of free RAM, or the ARC is full of flushable data, then
performance might be OK.  But the ARC can also contain writable (unflushable)
data which cannot be drained quickly because of IDE + gzip-9 + 800 MHz P3.
Look for a memory shortfall, which we would normally expect under such
conditions, and which is probably best observed via the scan rate column in
vmstat.
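
On Solaris that means watching the "sr" column, something like:

    # "sr" is the page scanner rate; a sustained non-zero value here means
    # the system is short on free memory
    vmstat 10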

You could change any one of the variables and get much better performance.
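
For example, just dropping from gzip-9 to the default lzjb compression is a
one-line change (the dataset name below is made up).  Note it only affects
newly written blocks; everything already on disk stays gzip-compressed.

    # lzjb costs far less CPU than gzip-9
    zfs set compression=lzjb tank/testhome
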
 -- richard
