date:20081129

I am [trying to] perform a test prior to moving my data to Solaris and ZFS.
Things are going very poorly.  Please suggest what I might do to understand
what is going on, file a meaningful bug report, fix it, whatever!

Both to learn what the compression could be, and to induce a heavy load to 
expose issues, I am running with compress=gzip-9.

I have two machines, both identical 800MHz P3s with 768MB of memory.  The disk
complement and OS differ.  My current host is SUSE Linux 10.2 (2.6.18
kernel) running two 120GB drives under LVM.  My test machine is 2008.11 B2 with
two 200GB drives on the motherboard secondary IDE, mirrored with ZFS and
exported over NFS.

My test is to simply run cp -rp * /testhome on the Linux machine, where 
/testhome is the NFS mounted zfs file system on the Solaris system.

It starts out with reasonable throughput.  Although the heavy load makes the 
Solaris system pretty jerky and unresponsive, it does work.  The Linux system 
is a little jerky and unresponsive, I assume due to waiting for sluggish 
network responses.

After about 12 hours, the throughput has slowed to a crawl.  The Solaris
machine takes a minute or more to respond to every character typed and mouse
click.  The Linux machine is no longer jerky, which makes sense since it has
to wait a lot for Solaris.  Stuff is flowing, but throughput is in the range of
100K bytes/second.

The Linux machine (available for tests), gzip -9ing a few multi-GB files, seems
to get 3MB/sec +/- 5% pretty consistently.  Since the Solaris machine has the
exact same CPU, RAM (including brand and model), chipset, etc., I would expect
similar throughput from ZFS.  This is in the right ballpark of what I saw when
the copy first started.  In an hour or two it moved about 17GB.
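
As a rough cross-check of the figures in this paragraph, the small calculation below converts "17GB in an hour or two" into MB/sec and compares the early rate with the later 100K bytes/second. It is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1000 MB) and treats "an hour or two" as the two endpoints of a range.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# Assumption: decimal units (1 GB = 1000 MB, 1 MB = 1000 KB).

def rate_mb_per_s(gigabytes, hours):
    """Average transfer rate in MB/s for `gigabytes` moved in `hours`."""
    return gigabytes * 1000 / (hours * 3600)

fast = rate_mb_per_s(17, 1)   # if the 17GB took one hour
slow = rate_mb_per_s(17, 2)   # if the 17GB took two hours
slowdown = 3.0 / 0.1          # ~3 MB/s early on vs. ~100 KB/s after 12 hours

print(f"17GB in 1h: {fast:.1f} MB/s")
print(f"17GB in 2h: {slow:.1f} MB/s")
print(f"slowdown factor: about {slowdown:.0f}x")
```

Either endpoint lands in the same ballpark as the 3MB/sec measured with gzip -9 on the Linux machine, so the early part of the copy looks CPU-bound as expected; the later rate is lower by more than an order of magnitude.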

I am also running a vmstat and a top to a log file.  Top reports total swap 
size as 512MB, 510 available.  vmstat for the first few hours reported 
something reasonable (it never seems to agree with top), but now is reporting 
around 570~580MB, and for a while was reporting well over 600MB free swap out 
of the 512M total!

I have gotten past a top memory leak (opensolaris.com bug 5482) and so am now 
running top only one iteration, in a shell for loop with a sleep instead of 
letting it repeat.  This was to be my test run to see it work.

What information can I capture and how can I capture it to figure this out?

My goal is to gain confidence in this system.  The idea is that Solaris and ZFS 
should be more reliable than Linux and LVM.  Although I have never lost data 
due to Linux problems, I have lost it due to disk failure, and zfs should cover 
that!

Thank you in advance for any ideas or suggestions.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

We looked at using gzip compression with zfs for a backup solution. The short
version of our conclusion is that it will not work. The gzip -9 compression just
can't sustain a large throughput when copying gigs of files. We ended up using
non-gzip compression and taking the hit on compression ratio (1.2x or so
instead of almost 3x).

In the meantime, we are working on integrating our gzip board into zfs so that
we can try again with hardware compression.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread andrew

Solaris reports virtual memory as the sum of physical memory and page file,
so this is where your strange vmstat output comes from. Running ZFS stress
tests on a system with only 768MB of memory is not a good idea, since ZFS uses
large amounts of memory for its cache. You can limit the size of the ARC
(Adaptive Replacement Cache) using the details here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache

Try limiting the ARC size then run the test again - if this works then memory
contention is the cause of the slowdown.
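
For illustration, the tuning guide linked above describes capping the ARC with a `set zfs:zfs_arc_max` entry in /etc/system, which takes effect after a reboot. The sketch below only formats such a line; the 256 MB figure is an arbitrary value picked for this example and is not a recommendation made anywhere in this thread.

```python
# Sketch: format an /etc/system entry that caps the ZFS ARC.
# Assumption: the 256 MB cap is an arbitrary illustrative value, not a
# figure suggested by anyone in this thread.

def arc_max_line(megabytes):
    """Return the /etc/system line limiting the ARC to `megabytes` MB."""
    size_bytes = megabytes * 1024 * 1024
    return f"set zfs:zfs_arc_max = {size_bytes:#x}"

print(arc_max_line(256))
```

Check the guide for the syntax that matches the build in use before editing /etc/system.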

Also, NFS to ZFS filesystems will run slowly under certain conditions,
including with the default configuration. See this link for more information:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

Cheers

Andrew.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Richard Elling

800 MHz P3 + 768 MBytes of RAM + IDE + ZFS + gzip-9 + NFS = pain
I'm not sure I could dream of a worse combination for performance.  I'm
actually surprised it takes 12 hours to crater -- probably because the
client is also quite slow.

arcstat will show the ARC usage, which should be increasing until the limit
is reached.  If you compare iostat to network use (e.g. nicstat or iostat
(does the Linux version of iostat track NFS I/O?)) then you should see a
mismatch, which can likely be attributed to the time required to gzip-9 and
commit to disk.

When there is plenty of free RAM or the ARC is full of flushable data, then
performance might be ok.  But the ARC can also contain writable (unflushable)
data which cannot be quickly drained because of IDE + gzip-9 + 800 MHz P3.
Look for a memory shortfall, which we would normally expect under such
conditions, and probably best observed via the scan rate column in vmstat.

You could change any one of the variables and get much better performance.
 -- richard


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Please help me understand what you mean.  There is a big difference between 
being unacceptably slow and not working correctly, or between being 
unacceptably slow and having an implementation problem that causes it to 
eventually stop.  I expect it to be slow, but I expect it to work.  Are you 
saying that you found that it did not function correctly, or that it was too 
slow for your purposes?  Thanks for your insights!  (3x would be awesome).

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 11:06 AM, Ray Clark [EMAIL PROTECTED] wrote:

> Please help me understand what you mean.  There is a big difference between
> being unacceptably slow and not working correctly, or between being
> unacceptably slow and having an implementation problem that causes it to
> eventually stop.  I expect it to be slow, but I expect it to work.  Are you
> saying that you found that it did not function correctly, or that it was too
> slow for your purposes?  Thanks for your insights!  (3x would be awesome).

I expect it will go SO SLOW, that some function somewhere is eventually
going to fail/timeout.  That system is barely usable WITHOUT compression.  I
hope at the very least you're disabling every single unnecessary service
before doing any testing, especially the GUI.

ZFS uses ram, and plenty of it.  That's the nature of COW.  Enabling
realtime compression with an 800mhz p3?  Kiss any performance, however poor
it was, goodbye.

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

For us, the machine became increasingly unresponsive until it was
indistinguishable from a complete lockup. A top that was running showed
triple-digit loads when it occasionally updated. We had to hit the reset button
to get the machine back. While this might simply be unacceptably slow, I was not
willing to wait long enough to find out. I did let it run for a few days, so I
don't think I jumped the gun.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

The machine we tested with was a reasonably fast amd64 with 4Gigs of memory. I 
don't know if I had disabled the graphical login yet, but it was probably not 
even logged in via the console. Nothing else was running.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Mario Goebbels

> I expect it will go SO SLOW, that some function somewhere is eventually
> going to fail/timeout.  That system is barely usable WITHOUT
> compression.  I hope at the very least you're disabling every single
> unnecessary service before doing any testing, especially the GUI.
>
> ZFS uses ram, and plenty of it.  That's the nature of COW.  Enabling
> realtime compression with an 800mhz p3?  Kiss any performance, however
> poor it was, goodbye.

Regardless of that, gzip is still heavy on the system. Unbz2ing a 30MB
package (e.g. VirtualBox) in my packages ZFS filesystem with gzip
compression does affect interactivity quite a lot (i.e. 1 sec UI freezes
on transaction commit). This on an Intel Core2 Quad!

I do not notice these effects using lzjb.

People have been clamoring for lzo support for more than a year. That
algorithm gets a decent compression rate not far from gzip and has a
very light footprint similar to lzjb. I think someone even wanted to
start a project for porting alternative compression methods to ZFS,
focusing on BWT though, but nothing came of that (at least publicly).

Regards,
-mg




Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Andrewk8,

Thanks for the information.  I have some questions.

[1] You said zfs uses large amounts of memory for its cache.   If I 
understand correctly, it is not that it uses large amounts, it is that it uses 
all memory available.  If this is an accurate picture, then it should be just 
as happy with 128MB as it is with 4GB.  The result would simply be less of a 
cache/buffer between clients and the physical disk.  It also seems like any 
congestion should show up fairly soon, not gradually over 12 hours!  Certainly 
limiting the ARC cache is something I will try, but it does not make sense to 
me.  Can you help me along?

[2] Regarding zfs vs. nfs, the reference talks about unneeded cache flushes 
dragging down throughput to NVRAM buffered disks.  The flushes were designed 
for physical rotating disks.  I am using physical, rotating disks, so it seems 
like the changes that they suggest for NVRAM buffered disks would not be 
appropriate for me, and that the default behavior designed for physical 
rotating disks would be what I want.  What am I missing?

[3] I also get ~4MB/second throughput NFS to disk with compression disabled, 
and 3MB/sec with gzip-9 for the first hour or two.  This is nothing to brag 
about and I had planned eventually to look into making it faster, but this 
pales compared to the 100KB/second it has degraded to over 12 hours.  Were your 
comments aimed at helping me get faster NFS throughput, or at addressing the 
immediate gross problem?

Thanks again for taking the time to help.
--Ray

Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correctl

2008-11-29 Thread Rob Clark

Bump.

Some of the threads on this were last posted to over a year ago. I checked
6485689 and it is not fixed yet, is there any work being done in this area?

Thanks,
Rob

> There may be some work being done to fix this:
>
> zpool should support raidz of mirrors
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6485689
>
> Discussed in this thread:
> Mirrored Raidz ( Posted: Oct 19, 2006 9:02 PM )
> http://opensolaris.org/jive/thread.jspa?threadID=15854&tstart=0
>
> The suggested solution (by jone
> http://opensolaris.org/jive/thread.jspa?messageID=66279 ) is:
>
> # zpool create a1pool raidz c0t0d0 c0t1d0 c0t2d0 ..
> # zpool create a2pool raidz c1t0d0 c1t1d0 c1t2d0 ..
> # zfs create -V a1pool/vol
> # zfs create -V a2pool/vol
> # zpool create mzdata mirror /dev/zvol/dsk/a1pool/vol /dev/zvol/dsk/a2pool/vol

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, 29 Nov 2008, Ray Clark wrote:

> [1] You said zfs uses large amounts of memory for its cache.  If I
> understand correctly, it is not that it uses large amounts, it is
> that it uses all memory available.  If this is an accurate picture,
> then it should be just as happy with 128MB as it is with 4GB.  The
> result would simply be less of a cache/buffer between clients and
> the physical disk.  It also seems like any congestion should show up

Memory is about 10,000 times faster than disk.  Why should it be just 
as happy with vastly less memory?

> [2] Regarding zfs vs. nfs, the reference talks about unneeded cache
> flushes dragging down throughput to NVRAM buffered disks.  The
> flushes were designed for physical rotating disks.  I am using
> physical, rotating disks, so it seems like the changes that they
> suggest for NVRAM buffered disks would not be appropriate for me,
> and that the default behavior designed for physical rotating disks
> would be what I want.  What am I missing?

Most NVRAM buffered disks do use caching.  The question is how 
reliably unflushed data will be stored after power loss.  FLASH 
devices definitely use a cache buffer since writes to FLASH are 
actually pretty slow (often slower than rotating media) and the FLASH 
blocksize is typically larger than the write blocksize.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

tcook,

You bring up a good point.  Exponentially slow is very different from crashed,
though they may have the same net effect.  Other factors like timeouts would
also come into play.

Regarding services, I am new to administering modern solaris, and that is on 
my learning curve.  My immediate need is simply a dumb file server.  3 or 4 
MB/sec would be adequate for my needs (marginal and at times annoying, but 
adequate).  If you expect it to be slow, it does work quite nicely without 
compression.  I have to use what I have.  In the meantime, perhaps my stress 
tests will also serve to expose issues.

Regarding the GUI, I don't know how to disable it.   There are no virtual 
consoles, and unlike older versions of SunOS and Solaris, it comes up in XDM 
and there is no [apparent] way to get a shell without running gnome.  I am sure 
that there is, but again, I come from the BSD/SunOS/Linux line, and have not 
learned the ins and outs of Nevada/Indiana yet.  I had hoped to put up a simple
installation serving up disks and learn the details later.  There are several
60~90MB gnome apps evidently pre-loaded - even a 45MB clock!   Wow.

Interestingly, the size fields under top add up to 950GB without getting to 
the bottom of the list, yet it shows NO swap being used, and 150MB free out of 
768 of RAM!  So how can the size of the existing processes exceed the size of 
the virtual memory in use by a factor of 2, and the size of total virtual 
memory by a factor of 1.5?  This is not the resident size - this is the total 
size!  

News Flash!  It has come out of it, and is moving along now at 2 MB/sec.  GUI
is responsive with an occasional stutter.  It was going through a directory
structure full of .mp3 and .flac files.  Perhaps the gzip algorithm gets hung
up in the data patterns they create.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Hakimian,

So you had a similar experience to what I had with an 800 MHz P3 and 768MB, all 
the way down to totally unresponsive.  Probably 5 or 6 x the CPU speed 
(assuming single core) and 5 x the memory.  This can only be a real design 
problem or bug, not just expected performance. 

Is there anyone from Sun who can advise me how to file this given the diffuse 
information?

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 12:02 PM, Ray Clark [EMAIL PROTECTED] wrote:

> Regarding the GUI, I don't know how to disable it.   There are no virtual
> consoles, and unlike older versions of SunOS and Solaris, it comes up in XDM
> and there is no [apparent] way to get a shell without running gnome.  I am
> sure that there is, but again, I come from the BSD/SunOS/Linux line, and
> have not learned the ins and outs of Nevada/Indiana yet.  I had hoped to put
> up a simple installation serving up disks and learn the details later.  There
> are several 60~90MB gnome apps evidently pre-loaded - even a 45MB clock!
> Wow.

Assuming you're running opensolaris:
pfexec svcadm disable gdm

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

> Regarding the GUI, I don't know how to disable it.
> There are no virtual consoles, and unlike older
> versions of SunOS and Solaris, it comes up in XDM
> and there is no [apparent] way to get a shell
> without running gnome.  I am sure that there is, but
> again, I come from the BSD/SunOS/Linux line, and
> have not learned the ins and outs of Nevada/Indiana
> yet.

While we are veering a little off topic here, you can disable gdm (for Indiana)
or cde-login (for Nevada) to get to the console mode. In Nevada, you can also
select console login from the GUI and it will close it down for you to log in at
a text console.

> News Flash!  It has come out of it, and is moving
> along now at 2 MB/sec.  GUI is responsive with an
> occasional stutter.  It was going through a directory
> structure full of .mp3 and .flac files.  Perhaps
> the gzip algorithm gets hung up in the data patterns
> they create.

For a few more details, the test I was doing was attempting to copy around 1TB 
of data via ssh and zfs send | zfs receive. My data was all pretty 
compressible, no jpegs, mp3s etc. Lots of source and log files.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

> So you had a similar experience to what I had with an
> 800 MHz P3 and 768MB, all the way down to totally
> unresponsive.  Probably 5 or 6 x the CPU speed
> (assuming single core) and 5 x the memory.  This can
> only be a real design problem or bug, not just
> expected performance.

Our test machine was actually a dual core. I was pretty surprised at the 
results.

Keep in mind, we did our tests around a year ago and things have changed. We 
have not re-visited gzip software compression since, so I do not know how 
things behave today.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Servo / mg,

I *have* noticed these effects on my system with lzjb, but they are minor.  
Things are a little grainy, not smooth.

Eliminating the algorithm that exposes the shortfall in how the compression is 
integrated into the system does not change the shortfall (See opensolaris.com 
bug 5483).  My low-end system resulted in my stress test being extra stressful. 
 Perhaps that is a good thing for exposing problems (Although frustrating for 
me)!

What I do not understand is why things get better and worse by orders of 
magnitude vs. being a relatively steady drain.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

relling,  Thank you, you gave me several things to look at.  The one thing that
sticks out for me is that I don't see why you listed IDE.  Compared to all of
the other factors, it is not the bottleneck by a long shot even if it is a slow
transfer rate (33MB/Sec) by today's standards.  What don't I know?

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

bfriesen,

Ultimately stuff flows in and stuff flows out.  Data is not reused, so a cache 
does not do anything for us.  As a buffer, it is simply a rubber band, a FIFO.  
So if the client wrote something real quick it would complete quickly.  But if 
it is writing an unlimited amount of data (like 200GB) without reading 
anything, it all simply flows through the buffer.  Whether the buffer is 128MB 
or 4GB, once the buffer is full the client will have to wait until something 
flows out to the disk.  So the system runs at the speed of the slowest 
component.  If accesses are done only once, caches don't help.  A buffer helps 
only to smooth out localized chunkiness.
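
The pure FIFO picture described in this paragraph can be sketched as a toy simulation. Everything in it is an assumption made for illustration: the 10 MB/s client rate, the 3 MB/s drain rate and the 200,000 MB total are made-up numbers, and the model deliberately ignores anything a real filesystem does with its memory beyond queuing writes (such as aggregating small writes or caching metadata).

```python
# Toy model of the "buffer as a FIFO" argument: a client writes as fast as
# it can into a bounded buffer that drains to disk at a fixed rate.
# All rates and sizes are made-up illustrative numbers.

def sustained_rate(buffer_mb, total_mb, client_mb_s=10.0, disk_mb_s=3.0):
    """Simulate in 1-second steps; return average MB/s accepted from the client."""
    buffered = 0.0   # MB currently waiting in the buffer
    accepted = 0.0   # MB taken from the client so far
    seconds = 0
    while accepted < total_mb:
        # The disk drains first, then the client fills whatever room is left.
        buffered -= min(buffered, disk_mb_s)
        room = buffer_mb - buffered
        take = min(client_mb_s, room, total_mb - accepted)
        buffered += take
        accepted += take
        seconds += 1
    return accepted / seconds

for size in (128, 4096):
    print(f"{size:5d} MB buffer: {sustained_rate(size, 200_000):.2f} MB/s")
```

In this simplified model both buffer sizes settle very close to the drain rate, which is the point being argued here. Whether ZFS actually behaves like a plain FIFO is a separate question that the model does not answer.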

Regarding the NVRAM discussion, what does this have to do with my situation
with rotating magnetic disks with tiny 8MB embedded volatile caches?  The
behavior of disks or storage subsystems with NVRAM is not pertinent to my
situation!  Or do I have something backwards?

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 12:30 PM, Ray Clark [EMAIL PROTECTED] wrote:

> relling,  Thank you, you gave me several things to look at.  The one thing
> that sticks out for me is that I don't see why you listed IDE.  Compared to
> all of the other factors, it is not the bottleneck by a long shot even if it
> is a slow transfer rate (33MB/Sec) by today's standards.  What don't I know?



Slow transfers over NFS != slow transfers to and from the disk.  Have you
done a zpool iostat to see what kind of traffic is actually going to and
from disk?  If you've got both drives hanging off a single IDE bus, that can
further hurt performance.

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Miles Nordin

>>>>> "t" == Tim  [EMAIL PROTECTED] writes:
>>>>> "a" == andrew  [EMAIL PROTECTED] writes:
>>>>> "re" == Richard Elling [EMAIL PROTECTED] writes:

     t> ZFS uses ram, and plenty of it.  That's the nature of COW.

     a> Running ZFS stress tests on a system with only
     a> 768MB of memory is not a good idea since ZFS uses large
     a> amounts of memory for its cache.

He's watching for memory pressure with top and vmstat and not seeing
any.

    re> 800 MHz P3 + 768 MBytes of RAM + IDE + ZFS + gzip-9 + NFS =
    re> pain I'm not sure I could dream of a worse combination for
    re> performance.  I'm actually surprised it takes 12 hours to
    re> crater

3MB/sec is indeed the expected lousy performance.  He can't even keep
fastethernet full.  But he's not complaining about that part of the
test!  He's complaining about later when it goes down to 100kB/s.

Ray, is there anything in dmesg or 'zpool status'?  Maybe one of the
disks is going bad?  But since Karl also says ``for us, the machine
became increasingly unresponsive until it was indistinguishable from a
complete lockup,'' yeah, it sounds like a zfs-gzip bug to me, too.
If you agree, maybe one of you should file it?



Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, 29 Nov 2008, Ray Clark wrote:

> through the buffer.  Whether the buffer is 128MB or 4GB, once the
> buffer is full the client will have to wait until something flows
> out to the disk.  So the system runs at the speed of the slowest
> component.  If accesses are done only once, caches don't help.  A
> buffer helps only to smooth out localized chunkiness.

You are wrong in assuming that a write only situation does not 
benefit immensely from caching.  ZFS tries to write data 128K at a 
time.  Without caching, the user data aggregation into 128K does not 
work well when NFS provides only 8-32K at a time.  ZFS likes to buffer 
up a number of blocks (if allowed) and then write them to disk in 
optimum order. Also, the filesystem metadata and structures need to be 
cached or else ZFS needs to continually go to disk in order to 
re-obtain this information that it otherwise would already have in 
RAM.  Since disk access is 10,000 times slower than RAM, having to go 
to the disk even one more time is a *huge* performance loss.
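As a rough illustration of the aggregation point above, the following sketch counts how many NFS writes of each size it takes to fill one 128K block. The function name and the loop are illustrative only; the 128K and 8-32K figures are the ones quoted in this message, and nothing here measures a real system.

```python
RECORD_SIZE = 128 * 1024  # the 128K write size mentioned above


def writes_per_record(nfs_write_size):
    """Number of NFS writes of a given size needed to fill one 128K block."""
    return -(-RECORD_SIZE // nfs_write_size)  # ceiling division


for kib in (8, 16, 32):
    count = writes_per_record(kib * 1024)
    print(f"{kib:>2}K NFS writes -> {count} needed per 128K block")
```

Without buffering, each of those small writes would have to be handled on its own rather than being combined into one full-size block.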

Since you have very little RAM, it is quite likely that your kernel 
data memory is becoming fragmented so that acquiring and freeing 
memory is less efficient than normal.

Regardless, ZFS is a very memory-oriented filesystem implementation 
which requires more RAM than most other filesystems.  Good old UFS 
uses less RAM.

> Regarding the NVRAM discussion, what does this have to do with my
> situation with rotating magnetic disks with tiny 8MB embedded
> volatile caches?  The behavior of disks or storage subsystems with
> NVRAM are not pertinent to my situation!  Or do I have something
> backwards?

I am not sure why you brought up NVRAM if it was not pertinent to your 
situation. :-)

Bob
 
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

zpool status -v says "No known data errors" for both the root rpool (separate 
non-mirrored 80GB drive) and my pool (mirrored 200GB drives). 

It is getting very marginal (sluggish/unresponsive) again.  Interestingly, top 
shows 20~30% cpu idle with most of the remainder kernel.  I wonder if everything is 
counted?  Linux top definitely does not show everything... I suspected at one 
point that it did not count time in interrupt servicing.  Does Solaris?  (Off 
topic I guess).

Free memory running 150~175MB.
-- 
This message posted from opensolaris.org

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

bfriesen,

Andrew brought up NVRAM by referring me to the following link:

Also, NFS to ZFS filesystems will run slowly under certain conditions 
-including with the default configuration. See this link for more information:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

This section discusses exclusively how ZFS cache flushes, which can be 
triggered by NFS requests or policies, interact with NVRAMs unproductively, 
and how the flushes can be controlled to improve performance.  Since the NVRAMs 
are Non Volatile, the flushes are not necessary to preserve data integrity 
anyway.

Not worth tracing the chain to see how you and I got tangled in this.  One of 
us made an inappropriate association or didn't follow a sub-thread.  Sorry for 
the confusion.
---

Regarding the cache, right now there is 150MB of free memory not being used by 
ANYBODY, so I don't think there is a shortage of memory for the ZFS cache... 
and 150MB >> 128K, or even a whole slew of 128K blocks.  Also, the yellow light 
that blinks when the disk is accessed is off 90% of the time minimum.  When it 
was almost frozen, the disk almost never blinked (one real quick one every 
minute or two!)  Nothing is accessing the disk to re-obtain anything!  
Otherwise, yes you would have a good point about re-fetching various file 
structure stuff.  (Good thought).

Fragmentation of kernel memory would be a good one.  Wouldn't it get fragmented 
after 6 months or so of everyday use anyway?  It must de-frag itself somehow. 
You bring up an excellent observation.  When it was super-slow, free RAM was 
down to 15MB, although that still seems large compared to 32K or 128K blocks.  
Remember, the system is not doing ANYTHING else.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

tcook, zpool iostat shows 1.15MB/sec.  Is this averaged since boot, or a recent 
running average?

The two drives ARE on a single IDE cable, however again, with a 33MB/sec cable 
rate and 8 or 16MB cache in the disk, 3 or 4 MB/sec should be able to 
time-share the cable without a significant impact on throughput.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

> Interestingly, the size fields under top add up to 950GB without getting
> to the bottom of the list, yet it
> shows NO swap being used, and 150MB free out of 768 of RAM!  So how can the
> size of the existing processes
> exceed the size of the virtual memory in use by a factor of 2, and the size
> of total virtual memory by a factor of 1.5?
> This is not the resident size - this is the total size!

Size is how much address space the process has allocated. Part of that
is executables and shared libraries (they are backed by the file, not
by swap). A large portion of that is shared, the same memory is used
by many processes. Processes can also allocate shared memory by other
means.


Memory is not a big problem for ZFS, address space is. You may have to
give the kernel more address space on 32-bit CPUs.

eeprom kernelbase=0x8000

This will reduce the usable address space of user processes though.
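To make the trade-off concrete, here is a small sketch of how a kernelbase address would split a 32-bit address space. It assumes a flat 4 GiB virtual address space and ignores any regions the platform reserves; the `split` helper is purely illustrative. Note that a 2 GiB boundary corresponds to 0x80000000, so the shorter value printed above may have been cut short in transit (an assumption, not something confirmed in this thread).

```python
GIB = 1024 ** 3
ADDRESS_SPACE = 4 * GIB  # assumed: flat 32-bit virtual address space


def split(kernelbase):
    """Return (user_bytes, kernel_bytes) for a given kernelbase address."""
    return kernelbase, ADDRESS_SPACE - kernelbase


# Hypothetical boundaries chosen for illustration: 2 GiB and 3 GiB.
for label, base in (("2G boundary", 0x80000000), ("3G boundary", 0xC0000000)):
    user, kernel = split(base)
    print(f"{label}: kernelbase={base:#x} "
          f"user={user // GIB}G kernel={kernel // GIB}G")
```

Lowering kernelbase moves the boundary down, which is why the kernel gains address space at the expense of user processes.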

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 1:33 PM, Ray Clark [EMAIL PROTECTED] wrote:

> tcook, zpool iostat shows 1.15MB/sec.  Is this averaged since boot, or a
> recent running average?
>
> The two drives ARE on a single IDE cable, however again, with a 33MB/sec
> cable rate and 8 or 16MB cache in the disk, 3 or 4 MB/sec should be able to
> time-share the cable without a significant impact on throughput.


The *rated* speed is theoretical, and you could never achieve anything remotely
close to it.  Sticking a second drive out there makes it even worse.  I'd at
the very least dedicate a channel to each disk and just disconnect the cdrom
drive if you have one in the system, or spend $2 on ebay for a pci add-on
controller.

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, 29 Nov 2008, Ray Clark wrote:

> Regarding the cache, right now there is 150MB of free memory not
> being used by ANYBODY, so I don't think there is a shortage of
> memory for the ZFS cache... and 150MB >> 128K, or even a whole slew

To be more clear, memory which is claimed to be free is often actually 
still used for caching.  Even if the virtual memory system has not 
mapped a VM page to a process, if a minor page fault occurs (due to an 
access), the data in that seemingly unused page may still be 
immediately switched in and used because the VM system tracks where 
the current content of that page came from.  This is primarily the 
case for memory-mapped regions such as ordinary files, shared 
libraries, executable text, or even a video frame buffer.  This is 
pretty much normal operation since when new processes are started, the 
VM maps the existing pages that the new process requires into its 
address space.

It is pretty common for Unix systems to lie about free memory and use 
that free memory for the filesystem cache with the expectation that 
this free memory can be freed up for use fast enough that no one 
really notices.

If the critical working set of VM pages is larger than available 
memory, then the system will become exceedingly slow.  This is 
indicated by a substantial amount of major page fault activity. 
Since disk is 10,000 times slower than RAM, major page faults can 
really slow things down dramatically.  Imagine what happens if ZFS or 
an often-accessed part of the kernel is not able to fit in available 
RAM.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 2:37 PM, Bob Friesenhahn 
[EMAIL PROTECTED] wrote:


> To be more clear, memory which is claimed to be free is often actually
> still used for caching.  Even if the virtual memory system has not
> mapped a VM page to a process, if a minor page fault occurs (due to an
> access), the data in that seemingly unused page may still be
> immediately switched in and used because the VM system tracks where
> the current content of that page came from.  This is primarily the
> case for memory-mapped regions such as ordinary files, shared
> libraries, executable text, or even a video frame buffer.  This is
> pretty much normal operation since when new processes are started, the
> VM maps the existing pages that the new process requires into its
> address space.
>
> It is pretty common for Unix systems to lie about free memory and use
> that free memory for the filesystem cache with the expectation that
> this free memory can be freed up for use fast enough that no one
> really notices.
>
> If the critical working set of VM pages is larger than available
> memory, then the system will become exceedingly slow.  This is
> indicated by a substantial amount of major page fault activity.
> Since disk is 10,000 times slower than RAM, major page faults can
> really slow things down dramatically.  Imagine what happens if ZFS or
> an often-accessed part of the kernel is not able to fit in available
> RAM.
>
> Bob
> ==
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



So as a follow on to this, I guess my question is: Can you shut down the
linux box and throw the ram from it into this box and see what kind of
performance you are getting?  I believe you'll see far, far better results
with 1.5G in the system.

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

> If the critical working set of VM pages is larger than available
> memory, then the system will become exceedingly slow.  This is
> indicated by a substantial amount of major page fault activity.
> Since disk is 10,000 times slower than RAM, major page faults can
> really slow things down dramatically.  Imagine what happens if ZFS or
> an often-accessed part of the kernel is not able to fit in available
> RAM.

ZFS and most of the kernel are locked in physical memory. Swap is never
used for ZFS.

In this case (NFS) everything is done in the kernel. The working set cannot
be larger than available memory.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Ross

From personal experience, 3-6MB/s is about what you should expect for NFS if 
you're not using any kind of nvram write cache.  With write cache, it's easy to 
pretty much saturate 100MB/s ethernet.

And as others have said, ZFS needs RAM and plenty of it.  I'd have thought 2GB 
would be a sensible minimum.  For our ZFS server we bought 8GB and that was 
only £200 for full ECC Registered memory, and we're not even using compression 
here.

I'm no solaris expert, but your symptoms sound like a classic case of running 
low on memory.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Pantzer5:  Thanks for the top "size" explanation.

Re: eeprom kernelbase=0x8000
So this makes the kernel load at the 2G mark?  What is the default, something 
like C00... for 3G?

Are PCI and AGP space in there too, such that kernel space is 4G - (kernelbase 
+ PCI_Size + AGP_Size) ?  (Shot in the dark)?

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

I get 15MB/sec to/from (I don't remember which) Linux LVM to a USB disk.  It does 
seem to saturate there.  I assume due to interrupt service time between 
transfers.  I appreciate the contention for the IDE, but in a 3MB/Sec system, I 
don't think that it is my bottleneck, much less in a 100KByte/second system.  
Do you disagree?

I *have* a PCI add on card, which is unplugged to make the system dead-simple 
until I figure out why it does not function!  

As a side note, most such controllers report themselves as a RAID card or some 
such, and Solaris will refuse to talk to them!  The only one I could find that 
would work was an IT8212 with an out of production flashchip that ITE supported 
an alternate BIOS for.  I went through 3 or 4 different ones before finding it! 
 You seem to say it is easy to buy a PCI add-in and have it work under Solaris 
- what card are you thinking of, and where did you find it?

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Thanks for the info about Free Memory.  That also links to another sub-thread 
regarding kernel memory space.  If disk files are mapped into memory space, 
that would be a reason that the kernel could make use of address space larger 
than virtual memory (RAM+Swap).

Regarding showing stuff as Free when it is tracked and may be used, I would 
assume though that it would be abandoned if the memory is needed.  Wouldn't the 
fact that it was sitting Free indicate that nothing needed memory?

I also understand Working set as a page replacement algorithm, but that would 
make the disk light blink!

These are all good things, I just don't see how they apply to the current 
situation, at least given the apparent information from vmstat and top!

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

If I shut down the Linux box, I won't have a host to send stuff to the Solaris 
box!

Also, the Solaris box can only support 1024MB.  I did have 1024MB in it at one 
time and had essentially the same performance.  I might note that I had the 
same problem with 1024MB, albeit with TOP eating memory (opensolaris.com bug 
5482) (up to 417MB at the highest observation).  No wonder it crashed.  Anyway, 
1024MB is not "Far, Far better"; it turns out there was no noticeable 
difference when I dropped to 768.

Also note that Hikimiam had identical symptoms with a dual core 64 bit AMD and 
4G of RAM.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 22:19, Ray Clark [EMAIL PROTECTED] wrote:
> Pantzer5:  Thanks for the top "size" explanation.
>
> Re: eeprom kernelbase=0x8000
> So this makes the kernel load at the 2G mark?  What is the default, something
> like C00... for 3G?

Yes on both questions (I have not checked the hex conversions).

This might not be your problem, but it is easy to test. My symptom was
that zpool scrub made the computer go slower and slower and finally
just stop. But this was a long time ago so this might not be a problem
today.


> Are PCI and AGP space in there too, such that kernel space is 4G -
> (kernelbase + PCI_Size + AGP_Size) ?  (Shot in the dark)?

No.

This is virtual memory.

The big difference in memory usage between UFS and ZFS is that ZFS
will have all data it caches mapped in the kernel address space. UFS
leaves data unmapped.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, 29 Nov 2008, Mattias Pantzare wrote:

> The big difference in memory usage between UFS and ZFS is that ZFS
> will have all data it caches mapped in the kernel address space. UFS
> leaves data unmapped.

Another big difference I have heard about is that Solaris 10 on x86 
only uses something like 64MB of filesystem caching by default for 
UFS.  This is different than SPARC where the caching is allowed to 
grow.  I am not sure if OpenSolaris maintains this arbitrary limit for 
x86.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 3:26 PM, Ray Clark [EMAIL PROTECTED] wrote:

> I get 15MB/sec to/from (I don't remember which) Linux LVM to a USB disk.  It does
> seem to saturate there.  I assume due to interrupt service time between
> transfers.  I appreciate the contention for the IDE, but in a 3MB/Sec
> system, I don't think that it is my bottleneck, much less in a
> 100KByte/second system.  Do you disagree?
>
> I *have* a PCI add on card, which is unplugged to make the system
> dead-simple until I figure out why it does not function!
>
> As a side note, most such controllers report themselves as a RAID card or
> some such, and Solaris will refuse to talk to them!  The only one I could
> find that would work was an IT8212 with an out of production flashchip that
> ITE supported an alternate BIOS for.  I went through 3 or 4 different ones
> before finding it!  You seem to say it is easy to buy a PCI add-in and have
> it work under Solaris - what card are you thinking of, and where did you
> find it?


Every one of the Promise IDE (non-raid) cards works just fine.  Worst case
scenario you have to add the device id for the driver to load properly.

--Tim

Re: [zfs-discuss] Problem importing degraded Pool

2008-11-29 Thread Philipp Haußleiter

I got the second disk of the pool working again...

Is there any chance to restore the metadata of the pool?

I tried to figure something out from the head of the device (dd the first 100megs 
to a file), but found nothing helpful :-/.

I tried a 
zpool -D tank

What information should I give?

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 3:47 PM, Ray Clark [EMAIL PROTECTED] wrote:

> If I shut down the Linux box, I won't have a host to send stuff to the
> Solaris box!
>
> Also, the Solaris box can only support 1024MB.  I did have 1024MB in it at
> one time and had essentially the same performance.  I might note that I had
> the same problem with 1024MB, albeit with TOP eating memory (
> opensolaris.com bug 5482) (up to 417MB at the highest observation).  No
> wonder it crashed.  Anyway, 1024MB is not "Far, Far better"; it turns out
> there was no noticeable difference when I dropped to 768.
>
> Also note that Hikimiam had identical symptoms with a dual core 64 bit AMD
> and 4G of RAM.



He never said they were identical symptoms, he said he had a somewhat
similar experience.  Different kernel, different build of Solaris, different
CPUs.  You can't even attempt to say you were hitting the same issue with
as little information as has been provided.

I've got gzip9 running on a dataset right now with a nearly identical setup
to what he had without issue.  I'd say let's stop jumping to conclusions.

As for your 1024 not making a difference, did you turn off the GUI in that
instance and all unnecessary services?  Search the discussion lists and
you'll find plenty of people who had no difference increasing ram until
they cross the threshold of giving zfs what it needs vs. not for their
workload.  Claiming it's only 3MB/sec and downplaying all the bad design
decisions you've made so far isn't helping the situation at all.

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sun, Nov 30, 2008 at 00:04, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
> On Sat, 29 Nov 2008, Mattias Pantzare wrote:
>
>> The big difference in memory usage between UFS and ZFS is that ZFS
>> will have all data it caches mapped in the kernel address space. UFS
>> leaves data unmapped.
>
> Another big difference I have heard about is that Solaris 10 on x86 only
> uses something like 64MB of filesystem caching by default for UFS.  This is
> different than SPARC where the caching is allowed to grow.  I am not sure if
> OpenSolaris maintains this arbitrary limit for x86.

That is not true. I doubt that any Solaris version had that type of limit.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

> I've got gzip9 running on a dataset right now
> with a nearly identical setup to what he had without
> issue. I'd say let's stop jumping to
> conclusions.

I'm curious to know if you have ever tried dumping many gigs (100s at one 
time) to your setup. Mine seemed fine when writing some files. It never worked 
when trying to dump large amounts of data.

If you have successfully dumped many gigs, I'm very interested to know more 
about your setup and how it differs from what I was testing with.

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sun, 30 Nov 2008, Mattias Pantzare wrote:

>> Another big difference I have heard about is that Solaris 10 on x86 only
>> uses something like 64MB of filesystem caching by default for UFS.  This is
>> different than SPARC where the caching is allowed to grow.  I am not sure if
>> OpenSolaris maintains this arbitrary limit for x86.
>
> That is not true. I doubt that any Solaris version had that type of limit.

That is what I heard Jim Mauro tell us.  I recall feeling a bit 
disturbed when I heard it.  If it is true, perhaps it applies only to 
32-bit x86, which has obvious memory restrictions.  I recall that he 
showed this parameter via DTrace. However, on my Solaris 10U5 AMD64 
system I see this limit:

429293568   maximum memory allowed in buffer cache (bufhwm)

which seems much higher than 64MB.  The Solaris Tuning And Tools 
book says that by default the buffer cache is allowed to grow to 2% of 
physical memory.

Obtain the value via

   sysdef | grep bufhwm

My 32-bit Belenix system running under VirtualBox with 2GB allocated 
to the VM reports a value of 41,762,816.
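For a quick sanity check of the two figures above against the "2% of physical memory" rule, the sketch below converts them to MiB and computes the fraction for the VM. It assumes both sysdef values are byte counts and that the VM sees the full 2 GiB; neither assumption is confirmed here.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB


amd64_bufhwm = 429293568    # reported on the Solaris 10U5 AMD64 system
belenix_bufhwm = 41762816   # reported by the 32-bit Belenix VM
belenix_ram = 2 * GIB       # assumed: RAM visible to the VM

belenix_percent = 100 * belenix_bufhwm / belenix_ram

print(f"AMD64 bufhwm:   {to_mib(amd64_bufhwm):.1f} MiB")
print(f"Belenix bufhwm: {to_mib(belenix_bufhwm):.1f} MiB "
      f"({belenix_percent:.2f}% of 2 GiB)")
```

Under those assumptions the VM's value comes out just under 2% of its memory, which fits the default described in the book better than a fixed 64MB limit.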

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Tim,

I don't think we would really disagree if we were in the same room.  I think in 
the process of the threaded communication that a few things got overlooked, or 
the wrong thing attributed.

You are right that there are many differences.  Some of them are:

- Tests done a year ago, I expect the kernel has had many changes.
- He was moving data via ssh from zfs send into zfs receive as opposed to my 
file operations over NFS.
- My problem seems to occur on incompressible data.  His was all very 
compressible.
- He had 5x the CPU x2 and 5x the memory.

Yes, I jumped on what I saw as common symptoms, in hakimian's words: "becoming 
increasing unresponsive until it was indistinguishable from a complete lockup." 
This is similar to my description of "After about 12 hours, the throughput has 
slowed to a crawl.  The Solaris machine takes a minute or more to respond to 
every character typed..." and disk throughput is in the range of 100K 
bytes/second.

I was the one who judged these symptoms to be essentially identical, I did not 
say that Hakimian made that statement.  I also pointed out that he was seeing 
these identical symptoms in a very different environment, which would be your 
point.

Regarding my 768 vs. 1024, there were no changes other than the change in 
memory.  So whatever else is true, the system had 33% more memory to work with 
minimum.  Given that probably a few hundred Meg is needed for a just booted, 
idle system, the effective percentage increase in memory for zfs to work with 
is in reality higher.  I may not have given it 4GB, but I gave it substantially 
more than it had.  It should behave substantially differently if memory is the 
limiting factor.  Just because memory is thin does not make it the limiting 
factor.  I believe the indications by top and vmstat that there is free memory 
(available to be reallocated) that nothing is gobbling up also suggests that 
memory is not the limiting factor.

Regarding my design decisions, I did not make bad design decisions.  I have 
what I have.  I know it is substandard.

Also you seem to be reacting as though I was complaining about the 3MB/Sec 
throughput.  I believe I stated that I understand that there are many 
sub-optimal aspects of this system.  However I don't believe any of them 
explain it running fine for a few hours, then slowing down by a factor of 30, 
for a few hours, then going back up.  I am trying to understand and resolve the 
dysfunctional behavior, not the poor but plausible throughput.  In any system 
there are many possible bottlenecks, most of which are probably suboptimal, but 
it is not productive to focus on the 15MB/Sec links in the chain when you have 
a 100KB/Sec problem.  Increasing the 15MB/Sec to 66 or 132MB/Sec is just not 
going to have a large effect!

I think/hope I have reconciled our apparent differences.  If not, so be it.  I 
do appreciate your suggestions and insights, and they are not lost on me.

--Ray

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 6:31 PM, Ray Clark [EMAIL PROTECTED] wrote:

> Tim,
>
> I don't think we would really disagree if we were in the same room.  I
> think in the process of the threaded communication that a few things got
> overlooked, or the wrong thing attributed.
>
> You are right that there are many differences.  Some of them are:
>
> - Tests done a year ago, I expect the kernel has had many changes.
> - He was moving data via ssh from zfs send into zfs receive as opposed to my
> file operations over NFS.
> - My problem seems to occur on incompressible data.  His was all very
> compressible.
> - He had 5x the CPU x2 and 5x the memory.
>
> Yes, I jumped on what I saw as common symptoms, in hakimian's words:
> "becoming increasing unresponsive until it was indistinguishable from a
> complete lockup."  This is similar to my description of "After about 12
> hours, the throughput has slowed to a crawl.  The Solaris machine takes a
> minute or more to respond to every character typed..." and disk throughput
> is in the range of 100K bytes/second.
>
> I was the one who judged these symptoms to be essentially identical, I did
> not say that Hakimian made that statement.  I also pointed out that he was
> seeing these identical symptoms in a very different environment, which
> would be your point.
>
> Regarding my 768 vs. 1024, there were no changes other than the change in
> memory.  So whatever else is true, the system had 33% more memory to work
> with minimum.  Given that probably a few hundred Meg is needed for a just
> booted, idle system, the effective percentage increase in memory for zfs to
> work with is in reality higher.  I may not have given it 4GB, but I gave it
> substantially more than it had.  It should behave substantially differently
> if memory is the limiting factor.  Just because memory is thin does not make
> it the limiting factor.  I believe the indications by top and vmstat that
> there is free memory (available to be reallocated) that nothing is gobbling
> up also suggests that memory is not the limiting factor.
>
> Regarding my design decisions, I did not make bad design decisions.  I have
> what I have.  I know it is substandard.
>
> Also you seem to be reacting as though I was complaining about the 3MB/Sec
> throughput.  I believe I stated that I understand that there are many
> sub-optimal aspects of this system.  However I don't believe any of them
> explain it running fine for a few hours, then slowing down by a factor of
> 30, for a few hours, then going back up.  I am trying to understand and
> resolve the dysfunctional behavior, not the poor but plausible throughput.
> In any system there are many possible bottlenecks, most of which are
> probably suboptimal, but it is not productive to focus on the 15MB/Sec links
> in the chain when you have a 100KB/Sec problem.  Increasing the 15MB/Sec to
> 66 or 132MB/Sec is just not going to have a large effect!
>
> I think/hope I have reconciled our apparent differences.  If not, so be it.
> I do appreciate your suggestions and insights, and they are not lost on me.
>
> --Ray


My point is you're not looking at the bigger picture.  "Well this small
portion is working some of the time so it's ok, and this small portion is
working some of the time so it's ok, but when I throw it all together
something isn't quite right so it must be the software."

Case in point on the memory front:
http://www.opensolaris.org/jive/thread.jspa?messageID=309878

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Now that it has come out of its slump, I can watch what it is working on vs. 
response.  Whenever it is going through a folder with a lot of incompressible 
stuff, it gets worse.  .mp3 and .flac are horrible.  .iso images and .gz and 
.zip files are bad.  It is sinking again, but still works.  It depends on the 
data.

In hindsight, and with the help of this thread, I think I understand.  Yes, it 
is a hypothesis, not fact.  Bug 5483 and the reference in there to bug 6586537 
explain how the zfs compression task blocks out userland tasks (and probably 
all other kernel tasks) by running at the highest kernel priority.  This is a 
fact, I take it.  The hypothesis part would be that certain data characteristics 
(probably higher entropy) result in very tedious, laborious behavior by the 
gzip algorithm, or at least the implementation in zfs.  So NOTHING else runs 
unless the gzip algorithm has nothing to do, and it takes FOREVER to do its 
thing on certain types of data.
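
One way to probe the hypothesis outside the kernel is to time zlib at level 9 
on low-entropy and high-entropy input.  The following is only a rough sketch: 
it uses Python's user-space zlib rather than the zfs implementation, it 
assumes the default 128K record size, and the sample data and the names 
gzip9_stats and RECORD_SIZE are made up for illustration.  It measures; it 
does not prove which kind of data is slower.

```python
import os
import time
import zlib

RECORD_SIZE = 128 * 1024  # assumption: default zfs recordsize, compressed per record
LEVEL = 9                 # zlib level 9, a user-space stand-in for compress=gzip-9


def gzip9_stats(data):
    """Compress data record by record; return (compression ratio, MB/s)."""
    start = time.perf_counter()
    compressed_size = 0
    for offset in range(0, len(data), RECORD_SIZE):
        record = data[offset:offset + RECORD_SIZE]
        compressed_size += len(zlib.compress(record, LEVEL))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(data) / compressed_size, len(data) / (1024 * 1024) / elapsed


SIZE = 4 * 1024 * 1024
line = b"The quick brown fox jumps over the lazy dog.\n"
samples = {
    # Hypothetical sample inputs, not data from the actual copy.
    "repetitive text": (line * (SIZE // len(line) + 1))[:SIZE],
    "random bytes": os.urandom(SIZE),  # stand-in for .mp3/.flac/.gz content
}

results = {name: gzip9_stats(data) for name, data in samples.items()}
for name, (ratio, rate) in results.items():
    print(f"{name}: ratio {ratio:.2f}, {rate:.1f} MB/s")
```

Swapping the sample buffers for the contents of a real .flac or .iso file 
would give numbers closer to the actual workload.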

All of the free memory discussions will help me to understand the system and 
how to get more information, but I don't see any evidence suggesting 
that lack of RAM was the reason for throughput to drop to 100KB/Sec.  No doubt 
if I address all of these things I can get throughput up from the 3~4 that I 
was seeing with compression disabled.

My plan right now is to let it finish (It has someplace around 50GB to go) just 
to see it do so without crashing.  I may then do a diff -r to see if the 
decompression has the same behavior (Glutton for punishment).  Then I will 
forget compression and do the exercise without.  Not sure how I will finally be 
comfortable to commit all my bits!

This understanding gives me hope that the system will be robust, that my heavy 
load is not exposing a critical section of code.  Rather it is a problem that 
causes dysfunctional though still correct behavior.  And I know how to avoid it.

If you have more comments, or especially if you think I reached the wrong 
conclusion, please do post it.  I will post my continuing results.

Thank you ALL for giving me so much attention and help.  It is good to not be 
alone!
-- 
This message posted from opensolaris.org

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sun, Nov 30, 2008 at 01:10, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Sun, 30 Nov 2008, Mattias Pantzare wrote:

 Another big difference I have heard about is that Solaris 10 on x86 only
 uses something like 64MB of filesystem caching by default for UFS.  This is
 different than SPARC where the caching is allowed to grow.  I am not sure if
 OpenSolaris maintains this arbitrary limit for x86.

 That is not true. I doubt that any Solaris version had that type of limit.

 That is what I heard Jim Mauro tell us.  I recall feeling a bit disturbed
 when I heard it.  If it is true, perhaps it applies only to x86 32 bits,
 which has obvious memory restrictions.  I recall that he showed this
 parameter via DTrace. However on my Solaris 10U5 AMD64 system I see this
 limit:

 429293568   maximum memory allowed in buffer cache (bufhwm)

 which seems much higher than 64MB.  The Solaris Tuning And Tools book says
 that by default the buffer cache is allowed to grow to 2% of physical
 memory.

 Obtain the value via

  sysdef | grep bufhwm

 My 32-bit Belenix system running under VirtualBox with 2GB allocated to the
 VM reports a value of 41,762,816.

That is only a small part of the cache used for file system metadata.
File data caching is integrated in the normal memory management.

http://docs.sun.com/app/docs/doc/817-0404/chapter2-37?a=view
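
As a quick sanity check, the two bufhwm values quoted above can be compared
against the "2% of physical memory" default.  This is a minimal sketch that
assumes sysdef reports bufhwm in bytes; the helper name implied_physmem_gib
is made up for illustration.

```python
BUFHWM_FRACTION = 0.02  # "2% of physical memory", as quoted from the tuning book


def implied_physmem_gib(bufhwm_bytes, fraction=BUFHWM_FRACTION):
    """Physical memory (GiB) that would yield this bufhwm at the given fraction."""
    return bufhwm_bytes / fraction / 2**30


# Values quoted in the thread (assumed to be bytes).
amd64_bufhwm = 429293568    # Solaris 10U5 AMD64 system
belenix_bufhwm = 41762816   # 32-bit Belenix VM with 2GB allocated

amd64_gib = implied_physmem_gib(amd64_bufhwm)
belenix_gib = implied_physmem_gib(belenix_bufhwm)

print(f"AMD64 system: implied physical memory {amd64_gib:.2f} GiB")
print(f"Belenix VM:   implied physical memory {belenix_gib:.2f} GiB")
```

The Belenix figure comes out just under the 2GB given to the VM, so both
values look like the 2% default of this small metadata cache rather than a
64MB cap.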

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Jeff Bonwick

 If you have more comments, or especially if you think I reached the wrong
 conclusion, please do post it.  I will post my continuing results.

I think your conclusions are correct.  The main thing you're seeing is
the combination of gzip-9 being incredibly CPU-intensive with our I/O
pipeline allowing too much of it to be scheduled in parallel.  The latter
is a bug we will fix; the former is the nature of the gzip algorithm.

One other thing you may encounter from time to time is slowdowns due to
kernel VA fragmentation.  The CPU you're using is 32-bit, so you're
running a 32-bit kernel, which has very little KVA.  This tends to be
more of a problem with big-memory machines, however -- e.g. a system
with 8GB running a 32-bit kernel.  With 768MB, you'll probably be OK,
but it's something to be aware of on any 32-bit system.  You can tell
if this is affecting you by looking for kernel threads stuck waiting
to allocate a virtual address:

# echo '::walk thread | ::findstack -v' | mdb -k | grep vmem_xalloc

Jeff

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Ian Collins

Ray Clark wrote:
 Now that it has come out of its slump, I can watch what it is working on vs. 
 response.  Whenever it is going through a folder with alot of incompressible 
 stuff, it gets worse.  .mp3 and .flac are horrible.  .iso images and .gz and 
 .zip files are bad.  It is sinking again, but still works.  It depends on the 
 data.

   
What did you expect?  A 3GHz Opteron core takes about a minute to
attempt to compress a 1GB .mkv file.  So your P3 would probably take
between 5 and 10 minutes.  Now move that to the kernel and your system
will crawl.  High gzip compressions are only really feasible on fast
multi-core systems (the compression is threaded).
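
Put into throughput terms, those estimates look roughly like this.  It is a
back-of-envelope sketch only: it assumes 1GB = 1024MB, and the variable names
are made up for illustration.

```python
MB_PER_GB = 1024  # assumption: binary units


def throughput_mb_s(size_mb, seconds):
    """Average throughput in MB/s for size_mb processed in the given time."""
    return size_mb / seconds


opteron = throughput_mb_s(MB_PER_GB, 60)         # about a minute per 1GB
p3_best = throughput_mb_s(MB_PER_GB, 5 * 60)     # 5 minutes per 1GB
p3_worst = throughput_mb_s(MB_PER_GB, 10 * 60)   # 10 minutes per 1GB
observed_crawl = 100 / 1024                      # the ~100KB/sec reported, in MB/s

print(f"3GHz Opteron:   {opteron:.2f} MB/s")
print(f"P3 estimate:    {p3_worst:.2f} to {p3_best:.2f} MB/s")
print(f"Observed crawl: {observed_crawl:.3f} MB/s")
print(f"Worst estimate vs. observed: {p3_worst / observed_crawl:.1f}x")
```

The P3 range brackets the 3MB/sec that gzip -9 reached on the Linux box, while
the observed crawl sits well below even the 10-minute estimate.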

-- 
Ian.


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Tim,

I am trying to look at the whole picture.  I don't see any unwarranted 
assumptions, although I know so little about Solaris and I extrapolated all 
over the place based on general knowledge, sort of draping it around and over 
what you all said.  I see quite a few misconceptions in the thread you pointed 
me to based on lack of understanding of modern systems, both clear ones and 
questionable ones.  I suppose I probably have my share of them in here.  Please 
refute my defenses as appropriate.  

--Ray

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Jeff,

Thank you for weighing in, as well as for the additional insight.  It is good 
to have confidence that I am on the right track.  

I like your system ... a lot.  Got work to do for it to be as slick as a recent 
Linux distribution, but you are working on a solid core and just need some 
touch-up work.  Thanks.  Hang in there.  

--Ray

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

Ref relling's 12:00 post:
My system does not have arcstat or nicstat.  But it is the B2 distribution.  
Would I expect these to be in the final distribution, or where do these come 
from?
Thanks.
--Ray

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

On Sat, Nov 29, 2008 at 8:29 PM, Ray Clark [EMAIL PROTECTED] wrote:

 Ref relling's 12:00 post:
 My system does not have arcstat or nicstat.  But it is the B2 distribution.
  Would I expect these to be in the final distribution, or where do these
 come from?
 Thanks.
 --Ray
 --



I don't believe either is bundled.  Search Google for arcstat.pl and
nicstat.pl

--Tim

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Jim Mauro


For the record, what I said was that on x64, the default size of the UFS
segmap segment, which is the L1 cache for UFS reads and writes, is
64MB. Cached pages will be moved to a cache list in memory if segmap
fills up.

The message I was trying to convey is that the default size of the UFS
segmap on x64 (64MB) is generally too small if UFS file IO is a component
of your workload (and 64MB is the default for 64-bit x64).

For increasing the size, check out:
http://www.solarisinternals.com/wiki/index.php/Segmap_tuning

Please note: do NOT use the /etc/system method of increasing segmapsize
on x64 - it will panic your system.

None of this has anything to do with ZFS, which uses a completely different
mechanism for caching (the ZFS ARC).

Thanks,
/jim



 That is what I heard Jim Mauro tell us.  I recall feeling a bit 
 disturbed when I heard it.  If it is true, perhaps it applies only to 
 x86 32 bits, which has obvious memory restrictions.  I recall that he 
 showed this parameter via DTrace. However on my Solaris 10U5 AMD64 
 system I see this limit:

 429293568   maximum memory allowed in buffer cache (bufhwm)

 which seems much higher than 64MB.  The Solaris Tuning And Tools 
 book says that by default the buffer cache is allowed to grow to 2% of 
 physical memory.

 Obtain the value via

  sysdef | grep bufhwm

 My 32-bit Belenix system running under VirtualBox with 2GB allocated 
 to the VM reports a value of 41,762,816.

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression