Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-22 Thread Orvar Korvar
Now this is a testament to the power of ZFS. Only ZFS is sensitive enough to
report these errors to you. Had you been running another filesystem, you would
never have been notified that your data was slowly being corrupted by some
faulty hardware.

:o)


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-20 Thread Warren Strange
Just following up...

I reran the memtest diagnostics and let them run overnight again. This time I
did see some memory errors, which would be the most likely explanation for the
errors I am seeing.

Faulty hardware strikes again.
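
For completeness, the Solaris fault manager may also diagnose a failing DIMM on
its own once it has seen enough correctable errors (whether it does depends on
the platform); a quick cross-check on the host:

        fmadm faulty
        fmdump -v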


Thanks to all for the advice.

Warren


> [snip]
> In this case, the data on each side of the mirror is the same, with a high
> degree of confidence. So the source of the corruption is likely to be the
> same -- some common component: CPU, RAM, HBA, I/O path, etc. You can rule
> out the disks as suspects.
> [snip]
>  -- richard


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Richard Elling
Comments below...

On Sep 12, 2010, at 2:56 PM, Warren Strange wrote:
>> So we are clear, you are running VirtualBox on ZFS,
>> rather than ZFS on VirtualBox?
>> 
> 
> Correct
> 
>> 
>> Bad power supply, HBA, cables, or other common cause.
>> To help you determine the sort of corruption, for
>> mirrored pools FMA will record
>> the nature of the discrepancies.
>>  fmdump -eV
>> will show a checksum error and the associated bitmap
>> comparisons.
> 
> Below are the errors reported from the two disks. Not sure if anything looks 
> suspicious (other than the obvious checksum error)
> 
> Sep 10 2010 12:49:42.315641690 ereport.fs.zfs.checksum
> nvlist version: 0
>   class = ereport.fs.zfs.checksum
>   ena = 0x95816e82e2900401
>   detector = (embedded nvlist)
>   nvlist version: 0
>   version = 0x0
>   scheme = zfs
>   pool = 0xf3cb5e110f2c88ec
>   vdev = 0x961d9b28c1440020
>   (end detector)
> 
>   pool = tank
>   pool_guid = 0xf3cb5e110f2c88ec
>   pool_context = 0
>   pool_failmode = wait
>   vdev_guid = 0x961d9b28c1440020
>   vdev_type = disk
>   vdev_path = /dev/dsk/c8t5d0s0
>   vdev_devid = id1,s...@sata_wdc_wd15eads-00p_wd-wcavu0351361/a
>   parent_guid = 0xdae51838a62627b9
>   parent_type = mirror
>   zio_err = 50
>   zio_offset = 0x1ef6813a00
>   zio_size = 0x2
>   zio_objset = 0x10
>   zio_object = 0x1402f
>   zio_level = 0
>   zio_blkid = 0x76f
>   cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 
> 0xf1041fd6f838c6eb
>   cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 
> 0xf0fbe93b4f02c6eb
>   cksum_algorithm = fletcher4
>   __ttl = 0x1
>   __tod = 0x4c8a7dc6 0x12d04f5a
> 
> Sep 10 2010 12:49:42.315641636 ereport.fs.zfs.checksum
> nvlist version: 0
>   class = ereport.fs.zfs.checksum
>   ena = 0x95816e82e2900401
>   detector = (embedded nvlist)
>   nvlist version: 0
>   version = 0x0
>   scheme = zfs
>   pool = 0xf3cb5e110f2c88ec
>   vdev = 0x969570b704d5bff1
>   (end detector)
> 
>   pool = tank
>   pool_guid = 0xf3cb5e110f2c88ec
>   pool_context = 0
>   pool_failmode = wait
>   vdev_guid = 0x969570b704d5bff1
>   vdev_type = disk
>   vdev_path = /dev/dsk/c8t4d0s0
>   vdev_devid = id1,s...@sata_st31500341as9vs3b4cp/a
>   parent_guid = 0xdae51838a62627b9
>   parent_type = mirror
>   zio_err = 50
>   zio_offset = 0x1ef6813a00
>   zio_size = 0x2
>   zio_objset = 0x10
>   zio_object = 0x1402f
>   zio_level = 0
>   zio_blkid = 0x76f
>   cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 
> 0xf1041fd6f838c6eb
>   cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 
> 0xf0fbe93b4f02c6eb
>   cksum_algorithm = fletcher4
>   __ttl = 0x1
>   __tod = 0x4c8a7dc6 0x12d04f24

In the case where one side of the mirror is corrupted and the other is correct,
you will be shown the difference between the two, in the form of an abbreviated
bitmap.

In this case, the data on each side of the mirror is the same, with a high
degree of confidence. So the source of the corruption is likely to be the same
-- some common component: CPU, RAM, HBA, I/O path, etc. You can rule out the
disks as suspects. With some additional experiments you can determine whether
the corruption occurred during the write or during the read.
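
One such experiment, roughly sketched (the pool name 'tank' is taken from your
output; the exact procedure is only a suggestion): clear the error counters,
scrub, note which files are reported, then scrub again and compare. Errors that
persist on the same blocks were most likely written corrupted; errors that move
around or disappear on re-read point at the read path.

        zpool clear tank
        zpool scrub tank
        # wait for the scrub to finish, then:
        zpool status -v tank
        # run a second scrub and compare the reported files
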
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Warren Strange
> So we are clear, you are running VirtualBox on ZFS,
> rather than ZFS on VirtualBox?
> 

Correct


> 
> Bad power supply, HBA, cables, or other common cause.
> To help you determine the sort of corruption, for
> mirrored pools FMA will record
> the nature of the discrepancies.
>   fmdump -eV
> will show a checksum error and the associated bitmap
> comparisons.

Below are the errors reported from the two disks. Not sure if anything looks
suspicious (other than the obvious checksum error)

Sep 10 2010 12:49:42.315641690 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x95816e82e2900401
        detector = (embedded nvlist)
        nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0xf3cb5e110f2c88ec
        vdev = 0x961d9b28c1440020
        (end detector)

        pool = tank
        pool_guid = 0xf3cb5e110f2c88ec
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x961d9b28c1440020
        vdev_type = disk
        vdev_path = /dev/dsk/c8t5d0s0
        vdev_devid = id1,s...@sata_wdc_wd15eads-00p_wd-wcavu0351361/a
        parent_guid = 0xdae51838a62627b9
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x1ef6813a00
        zio_size = 0x2
        zio_objset = 0x10
        zio_object = 0x1402f
        zio_level = 0
        zio_blkid = 0x76f
        cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
        cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
        cksum_algorithm = fletcher4
        __ttl = 0x1
        __tod = 0x4c8a7dc6 0x12d04f5a

Sep 10 2010 12:49:42.315641636 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x95816e82e2900401
        detector = (embedded nvlist)
        nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0xf3cb5e110f2c88ec
        vdev = 0x969570b704d5bff1
        (end detector)

        pool = tank
        pool_guid = 0xf3cb5e110f2c88ec
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x969570b704d5bff1
        vdev_type = disk
        vdev_path = /dev/dsk/c8t4d0s0
        vdev_devid = id1,s...@sata_st31500341as9vs3b4cp/a
        parent_guid = 0xdae51838a62627b9
        parent_type = mirror
        zio_err = 50
        zio_offset = 0x1ef6813a00
        zio_size = 0x2
        zio_objset = 0x10
        zio_object = 0x1402f
        zio_level = 0
        zio_blkid = 0x76f
        cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
        cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
        cksum_algorithm = fletcher4
        __ttl = 0x1
        __tod = 0x4c8a7dc6 0x12d04f24



Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Richard Elling
On Sep 12, 2010, at 11:05 AM, Warren Strange wrote:

> I posted the following to the VirtualBox forum. I would be interested in 
> finding out if anyone else has ever seen zpool corruption with VirtualBox as 
> a host on OpenSolaris:
> 
> -
> I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.
> 
> I have experienced 6-7 instances of my zpool getting corrupted.  I am 
> wondering if anyone else has ever seen this before. 
> 
> This is on a mirrored zpool - using drives from two different manufacturers 
> (i.e. it is very unlikely both drives would fail at the same time, with the 
> same blocks going bad). I initially thought I might have a memory problem - 
> which could explain the simultaneous disk failures. After running memory 
> diagnostics for 24 hours with no errors reported, I am beginning to suspect 
> it might be something else.

So we are clear, you are running VirtualBox on ZFS, rather than ZFS on 
VirtualBox?

> I am using shared folders from the guest - mounted at guest boot up time. 
> 
> Is it possible that the Solaris vboxsf shared folder kernel driver is causing
> corruption? Being in the kernel, could it bypass the normal ZFS integrity
> mechanisms? Or is it possible there is some locking issue or race condition
> that triggers the corruption?
> 
> Anecdotally, when I see the corruption the sequence of events seems to be:
> 
> - dmesg reports various vbox drivers being loaded (normal - just loading the 
> drivers)
> - Guest boots - gets just past the grub boot screen to the initial Red Hat boot
> screen.
> - The guest hangs and never boots.
> - zpool status -v  reports corrupted files. The files are on the zpool 
> containing the shared folders and the VirtualBox images
> 
> 
> Thoughts?

Bad power supply, HBA, cables, or other common cause.
To help you determine the sort of corruption, for mirrored pools FMA will record
the nature of the discrepancies.
fmdump -eV
will show a checksum error and the associated bitmap comparisons.
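
If the error log is noisy, you can narrow it to just the checksum ereports by
class (the class filter below is from memory; "fmdump -e | grep checksum" gets
you the same list):

        fmdump -eV -c ereport.fs.zfs.checksum
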
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Jeff Savit

Hi Warren,

This may not help much, except perhaps as a way to eliminate possible 
causes, but I ran b134 with VirtualBox and guests on ZFS for quite a 
long time without any such symptoms. My pool is a simple, unmirrored 
one, so the difference may be there. I used shared folders without 
incident. Guests include Linux (several distros, including RH), Windows, 
Solaris, BSD.


--Jeff

On 09/12/10 11:05 AM, Warren Strange wrote:
> I posted the following to the VirtualBox forum. I would be interested in
> finding out if anyone else has ever seen zpool corruption with VirtualBox as a
> host on OpenSolaris:
> [snip]



--


Jeff Savit | Principal Sales Consultant
Phone: 602.824.6275
Email: jeff.sa...@oracle.com | Blog: http://blogs.sun.com/jsavit
Oracle North America Commercial Hardware
Operating Environments & Infrastructure S/W Pillar
2355 E Camelback Rd | Phoenix, AZ 85016





[zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Warren Strange
I posted the following to the VirtualBox forum. I would be interested in 
finding out if anyone else has ever seen zpool corruption with VirtualBox as a 
host on OpenSolaris:

-
I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.

I have experienced 6-7 instances of my zpool getting corrupted.  I am wondering 
if anyone else has ever seen this before. 

This is on a mirrored zpool - using drives from two different manufacturers 
(i.e. it is very unlikely both drives would fail at the same time, with the 
same blocks going bad). I initially thought I might have a memory problem - 
which could explain the simultaneous disk failures. After running memory 
diagnostics for 24 hours with no errors reported, I am beginning to suspect it 
might be something else.

I am using shared folders from the guest - mounted at guest boot up time. 
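
For reference, the shares are mounted in the guest with ordinary vboxsf fstab
entries along these lines (assuming the standard vboxsf mount type; the share
and mount point names below are placeholders):

        shared_docs   /mnt/shared_docs   vboxsf   defaults   0 0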

Is it possible that the Solaris vboxsf shared folder kernel driver is causing
corruption? Being in the kernel, could it bypass the normal ZFS integrity
mechanisms? Or is it possible there is some locking issue or race condition
that triggers the corruption?

Anecdotally, when I see the corruption the sequence of events seems to be:

- dmesg reports various vbox drivers being loaded (normal - just loading the
drivers)
- Guest boots - gets just past the grub boot screen to the initial Red Hat boot
screen.
- The guest hangs and never boots.
- zpool status -v reports corrupted files. The files are on the zpool
containing the shared folders and the VirtualBox images.


Thoughts?