[zfs-discuss] Severe Problems on ZFS server

2010-04-22 Thread Andreas Höschler

Hi all

we are encountering severe problems on our X4240 (64GB, 16 disks) 
running Solaris 10 and ZFS. From time to time (5-6 times a day)


• FrontBase hangs or crashes
• VBox virtual machines hang
• Other applications show a rubber-band effect (white screen) while 
windows are being moved


I have been tearing my hair out trying to work out where this comes 
from. It could be software bugs, but in all these applications from 
different vendors? It could be a Solaris bug or bad memory, but that 
seems rather unlikely. Then a thought hit me. On another machine with 
6GB RAM I fired up a second virtual machine (vbox). This drove the 
machine almost to a halt. The second vbox instance never came up. I 
finally saw a panel raised by the first vbox instance saying that there 
was not enough memory available (a non-severe vbox error), and the 
virtual machine was halted!! After killing the process of the second 
vbox I could simply press resume and the first vbox machine continued 
to work properly.


OK, now this starts to make sense. My idea is that ZFS is 
blocking/allocating all of the available system memory. When an app 
(FrontBase, VBox,...) is started and suddenly requests larger chunks of 
memory from the system, the malloc calls fail, either because ZFS has 
allocated all the memory or because the system cannot release the 
memory quickly enough to make it available for the requesting apps. 
The failed or timed-out malloc is then not caught in the apps, which 
makes them hang, crash, or stall for minutes. Does this make any 
sense? Any similar experiences?


What can I do about that?

Thanks a lot,

 Andreas

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Severe Problems on ZFS server

2010-04-22 Thread Andreas Höschler

Hi all,


Followup to my own message. On the X4240 I have

set zfs:zfs_arc_max = 0x78000

in /etc/system. Would it be a good idea to reduce that to say

set zfs:zfs_arc_max = 0x28000

?? Hints greatly appreciated!

Thanks,

 Andreas
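
Since zfs_arc_max in /etc/system is a byte count written in hex, it is easy to misjudge how large a given value really is. The following is only a sketch of the arithmetic; the helper name arc_max_line and the 8 GiB figure are made up for illustration and are not a recommendation from the thread. Read literally, 0x78000 as it appears above is 491,520 bytes.

```python
def arc_max_line(gib: int) -> str:
    """Return an /etc/system line capping the ZFS ARC at `gib` GiB.

    Hypothetical helper for illustration; the cap itself is a tuning
    decision that depends on the workload.
    """
    if gib <= 0:
        raise ValueError("cap must be a positive number of GiB")
    size_bytes = gib * 1024 ** 3  # zfs_arc_max is expressed in bytes
    return f"set zfs:zfs_arc_max = {size_bytes:#x}"


# The value as it appears in the message, read literally as bytes.
print(int("0x78000", 16))   # 491520

# An example cap of 8 GiB (arbitrary figure, chosen only for the demo).
print(arc_max_line(8))      # set zfs:zfs_arc_max = 0x200000000
```

Changes to /etc/system only take effect after a reboot, as described later in the thread.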



Re: [zfs-discuss] Severe Problems on ZFS server

2010-04-22 Thread Bob Friesenhahn

On Thu, 22 Apr 2010, Andreas Höschler wrote:

> we are encountering severe problems on our X4240 (64GB, 16 disks) running
> Solaris 10 and ZFS. From time to time (5-6 times a day)
>
> • FrontBase hangs or crashes
> • VBox virtual machine do hang
> • Other applications show rubber effect (white screen) while moving the
> windows
>
> I have been tearing my hair off where this comes from. Could be software
> bugs, but in all these applications from different vendors? Could be a
> Solaris bug or bad memory!? Rather unlikely. I just was hit by a thought. On


I see that no one has responded yet.  You are jumping to the conclusion 
that zfs and its memory usage are somehow responsible for the problem 
you are seeing.


The problem could be due to a faulty/failing disk, a poor connection 
with a disk, or some other hardware issue.  A failing disk can easily 
make the system pause temporarily like that.


As root you can run '/usr/sbin/fmdump -ef' to see all the fault events 
as they are reported.  Be sure to execute '/usr/sbin/fmadm faulty' to 
see if a fault has already been identified on your system.  Also 
execute '/usr/bin/iostat -xe' to see if there are errors reported 
against some of your disks, or if some are reported as being 
abnormally slow.
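
As a rough illustration of the iostat check, the sketch below scans 'iostat -xe' style output for devices whose total error counter is non-zero. It assumes that the last column of each device row is the "tot" error count, as in the sample; the sample rows and the function name disks_with_errors are invented for the example and are not output from the poster's machine.

```python
def disks_with_errors(iostat_text: str) -> dict[str, int]:
    """Map device name -> total error count for devices reporting errors.

    Assumes the last column of each device row is the `tot` error
    counter; adjust if your iostat prints a different layout.
    """
    flagged = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        # Skip blank lines and the column header row.
        if len(fields) < 2 or fields[0] == "device":
            continue
        try:
            total = int(fields[-1])
        except ValueError:
            continue  # banner line or anything else that is not a device row
        if total > 0:
            flagged[fields[0]] = total
    return flagged


# Invented sample data in the assumed layout.
SAMPLE = """\
                  extended device statistics         ---- errors ---
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot
sd0       0.0    1.2    0.0    9.6  0.0  0.0    4.1   0   1   0   0   0   0
sd3       2.5    0.4   80.1    3.2  0.0  0.9  310.7   0  42   0   5   2   7
"""

print(disks_with_errors(SAMPLE))  # {'sd3': 7}
```

A device that shows up here, or one with a service time far above its neighbours, would be the first candidate to inspect.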


You might also want to verify that your Solaris 10 is current.  I 
notice that you did not identify what Solaris 10 you are using.


> another machine with 6GB RAM I fired up a second virtual machine (vbox). This
> drove the machine almost to a halt. The second vbox instance never came up. I
> finally saw a panel raised by the first vbox instance that there was not
> enough memory available (non severe vbox error) and the virtual machine was
> halted!! After killing the process of the second vbox I could simply press
> resume and the first vbox machine continued to work properly.


Maybe you should read the VirtualBox documentation.  There is a note 
about Solaris 10 and about how VirtualBox may fail if it can't get 
enough contiguous memory space.


Maybe I am lucky since I have run three VirtualBox instances at a time 
(2GB allocation each) on my system with no problem at all.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Severe Problems on ZFS server

2010-04-22 Thread Andreas Höschler

Hi Bob,

> The problem could be due to a faulty/failing disk, a poor connection
> with a disk, or some other hardware issue.  A failing disk can easily
> make the system pause temporarily like that.
>
> As root you can run '/usr/sbin/fmdump -ef' to see all the fault events
> as they are reported.  Be sure to execute '/usr/sbin/fmadm faulty' to
> see if a fault has already been identified on your system.  Also
> execute '/usr/bin/iostat -xe' to see if there are errors reported
> against some of your disks, or if some are reported as being
> abnormally slow.
>
> You might also want to verify that your Solaris 10 is current.  I
> notice that you did not identify what Solaris 10 you are using.


Thanks a lot for these hints. I checked all this. On my mirror server I 
found a faulty DIMM with these commands. But on the main server 
exhibiting the described problem everything seems fine.


>> another machine with 6GB RAM I fired up a second virtual machine
>> (vbox). This drove the machine almost to a halt. The second vbox
>> instance never came up. I finally saw a panel raised by the first
>> vbox instance that there was not enough memory available (non severe
>> vbox error) and the virtual machine was halted!! After killing the
>> process of the second vbox I could simply press resume and the first
>> vbox machine continued to work properly.
>
> Maybe you should read the VirtualBox documentation.  There is a note
> about Solaris 10 and about how VirtualBox may fail if it can't get
> enough contiguous memory space.
>
> Maybe I am lucky since I have run three VirtualBox instances at a time
> (2GB allocation each) on my system with no problem at all.


I have inserted

set zfs:zfs_arc_max = 0x2

in /etc/system and rebooted the machine having 64GB of memory. Tomorrow 
will show whether this did the trick!


Thanks a lot,

 Andreas



Re: [zfs-discuss] Severe Problems on ZFS server

2010-04-22 Thread Bob Friesenhahn

On Fri, 23 Apr 2010, Andreas Höschler wrote:


>> Maybe I am lucky since I have run three VirtualBox instances at a time (2GB
>> allocation each) on my system with no problem at all.
>
> I have inserted
>
> set zfs:zfs_arc_max = 0x2
>
> in /etc/system and rebooted the machine having 64GB of memory. Tomorrow will
> show whether this did the trick!


This *could* help if your server runs a rather strange and 
intermittent program which suddenly requests a huge amount of memory, 
accesses all that memory, and then releases the memory.  ZFS actually 
gives memory back to the kernel when requested, but of course it needs 
to determine which memory should be returned.  It seems unlikely that 
this would cause other applications to freeze unless there is a common 
dependency.  I do limit the size of the ARC on my system because I do 
run programs which request a lot of memory and then quit.
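
To see how much memory the ARC is actually holding before and after a stall, its counters can be read with 'kstat -p zfs:0:arcstats'. The sketch below parses that kind of "name value" output and reports the ARC size as a fraction of its ceiling. The sample numbers are invented, and the helper names parse_kstat and arc_fill are hypothetical.

```python
def parse_kstat(text: str) -> dict[str, int]:
    """Parse `kstat -p` style lines into {statistic_name: integer_value}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a simple "name value" line
        # Keep only the last component, e.g. "zfs:0:arcstats:size" -> "size".
        name = parts[0].rsplit(":", 1)[-1]
        try:
            stats[name] = int(parts[1])
        except ValueError:
            continue  # non-integer statistics are ignored in this sketch
    return stats


def arc_fill(stats: dict[str, int]) -> float:
    """Current ARC size as a fraction of its maximum target size (c_max)."""
    return stats["size"] / stats["c_max"]


# Invented sample: a 6 GiB ARC under an 8 GiB ceiling.
SAMPLE = "zfs:0:arcstats:c_max\t8589934592\nzfs:0:arcstats:size\t6442450944\n"

stats = parse_kstat(SAMPLE)
print(f"ARC is at {arc_fill(stats):.0%} of c_max")  # ARC is at 75% of c_max
```

Sampling these values around the time of a hang would show whether the ARC is near its ceiling when applications freeze, which is the hypothesis under discussion here.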


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/