[zfs-discuss] Mount External USB cdrom on zfs

2009-01-26 Thread iman habibi
Dear support,
When I connect my external USB DVD-ROM to the SPARC machine, which has
Solaris 10u6 installed on a ZFS root file system, it returns this error:

bash-3.00# mount /dev/dsk/c1t0d0s0 /dvd/
Jan 27 11:08:41 global ufs: NOTICE: mount: not a UFS magic number (0x0)
mount: /dev/dsk/c1t0d0s0 is not this fstype
bash-3.00# Jan 27 11:08:41 global ufs: [ID 717476 kern.notice] NOTICE:
mount: not a UFS magic number (0x0)
I can't mount it and can't see anything.

But I think the system does detect it, judging from the messages:
Jan 27 10:52:08 global usba: [ID 349649 kern.info]  Cypress
Semiconductor USB2.0 Storage Device DEF10AF1F9AD
Jan 27 10:52:08 global genunix: [ID 936769 kern.info] scsa2usb0 is /p...@1f
,0/u...@a/stor...@1
Jan 27 10:52:08 global genunix: [ID 408114 kern.info] /p...@1f,0/u...@a
/stor...@1 (scsa2usb0) online
Jan 27 10:52:09 global scsi: [ID 193665 kern.info] sd0 at scsa2usb0: target
0 lun 0
Jan 27 10:52:09 global genunix: [ID 936769 kern.info] sd0 is /p...@1f,0/u...@a
/stor...@1/d...@0,0
Jan 27 10:52:15 global genunix: [ID 408114 kern.info] /p...@1f,0/u...@a
/stor...@1/d...@0,0 (sd0) online

How can I mount an external USB CD-ROM on Solaris 10u6 with a ZFS root file system?
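
For reference, a CD/DVD normally carries an HSFS (ISO 9660) filesystem rather
than UFS, so the mount needs the matching -F option. A hedged attempt, reusing
the device name from the error above (the slice may differ for your media):

# mkdir -p /dvd
# mount -F hsfs -o ro /dev/dsk/c1t0d0s0 /dvd

If vold(1M) is running, inserting the disc may instead auto-mount it under /cdrom.
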
Regards
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Unusual CIFS write bursts

2009-01-26 Thread Brent Jones
While doing some performance testing on a pair of X4540's running
snv_105, I noticed some odd behavior while using CIFS.
I am copying a 6TB database file (yes, a single file) over our GigE
network to the X4540, then snapshotting that data to the secondary
X4540.
Writing said 6TB file can saturate our gigabit network, with about
95-100MB/sec going over the wire (can't ask for any more, really).

However, the disk IO on the X4540 appears unusual. I would expect the
disks to be constantly writing 95-100MB/sec, but it appears it buffers
about 1GB worth of data before committing to disk. This is in contrast
to NFS write behavior, where, as I write a 1GB file to the NFS server
from an NFS client, traffic on the wire correlates closely with the
disk writes. For example, 60MB/sec on the wire via NFS will trigger
60MB/sec on disk. This is a single file in both cases.

I wouldn't have a problem with this "buffer"; it seems to be a rolling
10-second buffer. If I am copying several small files at lower speeds,
the disk buffer still seems to "purge" after roughly 10 seconds, not
when a certain size is reached. The problem is the amount of data that
goes into the buffer: writing 1GB to disk can cause the system to slow
down substantially, and all network traffic pauses or drops to mere
kilobytes a second while it writes this buffer out.

I would like to see a smoother handling of this buffer, or a tuneable
to make the buffer write more often or fill quicker.

This is a 48TB unit, 64GB ram, and the arcstat perl script reports my
ARC is 55GB in size, with near 0% miss on reads.

Has anyone seen something similar, or does anyone know of any undocumented
tuneables to reduce the effects of this?
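
For context, those roughly 10-second bursts line up with ZFS transaction group
(txg) commits. A hedged way to peek at the relevant knobs, assuming the usual
tunables are present in snv_105 (names and values here are illustrative, not a
recommendation):

# echo zfs_txg_timeout/D | mdb -k            # seconds before a txg is forced out (assumed name)
# echo zfs_write_limit_override/E | mdb -k   # per-txg write cap, 0 = automatic (assumed name)

and, persistently, in /etc/system:

set zfs:zfs_txg_timeout = 5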


Here is 'zpool iostat' output, in 1-second intervals, while this "write
storm" occurs.


# zpool iostat pdxfilu01 1
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pdxfilu01   2.09T  36.0T      1     61   143K  7.30M
pdxfilu01   2.09T  36.0T      0      0      0      0
pdxfilu01   2.09T  36.0T      0      0      0      0
pdxfilu01   2.09T  36.0T      0      0      0      0
pdxfilu01   2.09T  36.0T      0     60      0  7.55M
pdxfilu01   2.09T  36.0T      0  1.70K      0   211M
pdxfilu01   2.09T  36.0T      0  2.56K      0   323M
pdxfilu01   2.09T  36.0T      0  2.97K      0   375M
pdxfilu01   2.09T  36.0T      0  3.15K      0   399M
pdxfilu01   2.09T  36.0T      0  2.22K      0   244M
pdxfilu01   2.09T  36.0T      0      0      0      0
pdxfilu01   2.09T  36.0T      0      0      0      0
pdxfilu01   2.09T  36.0T      0      0      0      0
pdxfilu01   2.09T  36.0T      0      0      0      0


Here is my 'zpool status' output.

# zpool status
  pool: pdxfilu01
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
pdxfilu01   ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c6t0d0  ONLINE   0 0 0
c7t0d0  ONLINE   0 0 0
c8t0d0  ONLINE   0 0 0
c9t0d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
c9t1d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
c9t2d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0
c9t3d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0
c9t4d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t5d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c6t5d0  ONLINE   0 0 0
c7t5d0  ONLINE   0 0 0
c8t5d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t6d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
c6t6d0  ONLINE   0 0 0
c7t6d0  ONLINE   0 0 0
c8t6d0  ONLINE   0 0 0
c9t6d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0

[zfs-discuss] Replacing HDD in x4500

2009-01-26 Thread Jorgen Lundman

The vendor wanted to come in and replace an HDD in the 2nd X4500, as it 
was "constantly busy", and since our X4500 has always died miserably in 
the past when an HDD dies, they wanted to replace it before the HDD 
actually died.

The usual was done, HDD replaced, resilvering started and ran for about 
50 minutes. Then the system hung, same as always, all ZFS related 
commands would just hang and do nothing. System is otherwise fine and 
completely idle.

The vendor for some reason decided to fsck the root FS (not sure why, as 
it is mounted with "logging"), and also decided it would be best to do so 
from a CD-ROM boot.

Anyway, that was 12 hours ago and the X4500 is still down. I think they 
have it at the single-user prompt, resilvering again. (I also noticed they'd 
decided to break the mirror of the root disks, for some very strange 
reason.) It still shows:

    raidz1          DEGRADED     0     0     0
      c0t1d0        ONLINE       0     0     0
      replacing     UNAVAIL      0     0     0  insufficient replicas
        c1t1d0s0/o  OFFLINE      0     0     0
        c1t1d0      UNAVAIL      0     0     0  cannot open

So I am pretty sure it'll hang again sometime soon. What is interesting, 
though, is that this is on x4500-02, and all our previous troubles mailed 
to the list were regarding our first X4500. The hardware is a physically 
different unit, but an identical configuration. Solaris 10 5/08.

Anyway, I think they want to boot from CD-ROM to fsck root again for some 
reason, but since customers have been without their mail for 12 hours, 
they can go a little longer, I guess.

What I was really wondering is whether there has been any progress or patches 
regarding the system always hanging whenever an HDD dies (or, it seems, is 
replaced). It really is rather frustrating.

Lund

-- 
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11

2009-01-26 Thread Jim Dunham
Richard Elling wrote:
> Jim Dunham wrote:
>> Ahmed,
>>
>>> The setup is not there anymore, however, I will share as much  
>>> details
>>> as I have documented. Could you please post the commands you have  
>>> used
>>> and any differences you think might be important. Did you ever test
>>> with 2008.11 ? instead of sxce ?
>>>
>>
>> Specific to the following:
>>
> While we should be getting minimal performance hit (hopefully), we got
> a big performance hit, disk throughput was reduced to almost 10% of
> the normal rate.
>
>>
>> It looks like I need to test on OpenSolaris 2008.11, not Solaris
>> Express CE (b105), since this version does not have access to a
>> version of 'dd' with an oflag= setting.
>>
>> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync   
>> bs=256M count=10
>> dd: bad argument: "oflag=dsync"
>>
>
> Congratulations!  You've been bit by the gnu-compatibility feature!

Oh that's what one calls it... a feature?

> SXCE and OpenSolaris have more than one version of dd.  The difference
> is that OpenSolaris sets your default PATH to use /usr/gnu/bin/dd,  
> which
> has the oflag option, while SXCE sets your default PATH to use /usr/ 
> bin/dd.

Thank you,

Jim

>
> -- richard
>
>> Using a setting of 'oflag=dsync' will have performance implications.
>>
>> Also there is an issue with an I/O of size bs=256M. SNDR's
>> internal architecture has an I/O unit chunk size of one bit per
>> 32KB. Therefore, when doing an I/O of 256MB, this results in the
>> need to set 8192 bits, 1024 bytes, or 1KB of data to 0xFF.
>> Although testing with an I/O size of 256MB is interesting, typical
>> I/O tests are more like the following: 
>> http://www.opensolaris.org/os/community/performance/filebench/quick_start/
>>
>> - Jim
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11

2009-01-26 Thread Richard Elling
Jim Dunham wrote:
> Ahmed,
>
>   
>> The setup is not there anymore, however, I will share as much details
>> as I have documented. Could you please post the commands you have used
>> and any differences you think might be important. Did you ever test
>> with 2008.11 ? instead of sxce ?
>> 
>
> Specific to the following:
>
>   
>>>> While we should be getting minimal performance hit (hopefully), we got
>>>> a big performance hit, disk throughput was reduced to almost 10% of
>>>> the normal rate.
>
> It looks like I need to test on OpenSolaris 2008.11, not Solaris  
> Express CE (b105), since this version does not have access to a  
> version of 'dd' with an oflag= setting.
>
> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync  
> bs=256M count=10
> dd: bad argument: "oflag=dsync"
>   

Congratulations!  You've been bit by the gnu-compatibility feature!
SXCE and OpenSolaris have more than one version of dd.  The difference
is that OpenSolaris sets your default PATH to use /usr/gnu/bin/dd, which
has the oflag option, while SXCE sets your default PATH to use /usr/bin/dd.
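
In other words, a hedged workaround on SXCE is simply to call the GNU dd
by its full path (assuming the GNU userland is installed there as well):

# /usr/gnu/bin/dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx \
    oflag=dsync bs=256M count=10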
 -- richard

> Using a setting of 'oflag=dsync' will have performance implications.
>
> Also there is an issue with an I/O of size bs=256M. SNDR's internal  
> architecture has an I/O unit chunk size of one bit per 32KB. Therefore,  
> when doing an I/O of 256MB, this results in the need to set 8192 bits,  
> 1024 bytes, or 1KB of data to 0xFF.  Although testing with an I/O  
> size of 256MB is interesting, typical I/O tests are more like the  
> following: 
> http://www.opensolaris.org/os/community/performance/filebench/quick_start/
>
> - Jim
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11

2009-01-26 Thread Jim Dunham
Ahmed,

> The setup is not there anymore, however, I will share as much details
> as I have documented. Could you please post the commands you have used
> and any differences you think might be important. Did you ever test
> with 2008.11 ? instead of sxce ?

Specific to the following:

>>> While we should be getting minimal performance hit (hopefully), we  
>>> got
>>> a big performance hit, disk throughput was reduced to almost 10% of
>>> the normal rate.

It looks like I need to test on OpenSolaris 2008.11, not Solaris  
Express CE (b105), since this version does not have access to a  
version of 'dd' with an oflag= setting.

# dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync  
bs=256M count=10
dd: bad argument: "oflag=dsync"

Using a setting of 'oflag=dsync' will have performance implications.

Also there is an issue with an I/O of size bs=256M. SNDR's internal  
architecture has an I/O unit chunk size of one bit per 32KB. Therefore,  
when doing an I/O of 256MB, this results in the need to set 8192 bits,  
1024 bytes, or 1KB of data to 0xFF.  Although testing with an I/O  
size of 256MB is interesting, typical I/O tests are more like the  
following: 
http://www.opensolaris.org/os/community/performance/filebench/quick_start/

- Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Miles Nordin
> "js" == Jakov Sosic  writes:
> "tt" == Toby Thain  writes:

js> Yes but that will do the complete resilvering, and I just want
js> to fix the corrupted blocks... :)

tt> What you are asking for is impossible, since ZFS cannot know
tt> which blocks are corrupted without actually checking them

yeah of course you have to read every (occupied) block, but he's still
not asking for something completely nonsensical.  What if the good
drive has a latent sector error in one of the blocks that hasn't been
scribbled over on the bad drive?  scrub could heal the error if not
for the ``too many errors'' fault, while 'zpool replace' could not
heal it.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send -R slow

2009-01-26 Thread Ian Collins
BJ Quinn wrote:
> That sounds like a great idea if I can get it to work--
>
>   
What does?

> I get how to add a drive to a zfs mirror, but for the life of me I can't find 
> out how to safely remove a drive from a mirror.
>
>   
Have you tried "man zpool"?  See the entry for detach.

> Also, if I do remove the drive from the mirror, then pop it back up in some 
> unsuspecting (and unrelated) Solaris box, will it just see a drive with a 
> pool on it and let me mount it up?  
You should be able to import it, but I haven't tried.

> What about when I pop in the drive to be resilvered, but right before I add 
> it back to the mirror, will Solaris get upset that I have two drives both 
> with the same pool name?
>   
No, you have to do a manual import.
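
For concreteness, the detach/attach half of that cycle looks roughly like this
(pool and device names are placeholders; whether the detached disk can then be
imported on the other box is the part left untested above):

# zpool detach tank c1t1d0          # drop one side of the mirror
# zpool attach tank c1t0d0 c1t1d0   # later: re-attach it next to the surviving disk
# zpool status tank                 # watch the resilver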

-- 
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send -R slow

2009-01-26 Thread BJ Quinn
That sounds like a great idea if I can get it to work--

I get how to add a drive to a zfs mirror, but for the life of me I can't find 
out how to safely remove a drive from a mirror.

Also, if I do remove the drive from the mirror, then pop it back up in some 
unsuspecting (and unrelated) Solaris box, will it just see a drive with a pool 
on it and let me mount it up?  What about when I pop in the drive to be 
resilvered, but right before I add it back to the mirror, will Solaris get 
upset that I have two drives both with the same pool name?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Toby Thain

On 26-Jan-09, at 6:21 PM, Jakov Sosic wrote:

>>> So I wonder now, how to fix this up? Why doesn't
>> scrub overwrite bad data with good data from first
>> disk?
>>
>> ZFS doesn't know why the errors occurred, the most
>> likely scenario would be a
>> bad disk -- in which case you'd need to replace it.
>
> I know and understand that... But, what is then a limit for self- 
> healing? 2 errors per vdev? 3 errors? 10 errors? before ZFS decides  
> that vdev is irreparable...
>
>
>> You shouldn't need to attach/detach anything.
>> I think you're looking for 'zpool replace'.
>>zpool replace tank c0d1s0
>
> Yes but that will do the complete resilvering, and I just want to  
> fix the corrupted blocks... :)

What you are asking for is impossible, since ZFS cannot know which  
blocks are corrupted without actually checking them all (like a  
scrub). A resilver involves knowing that some set of blocks is out of  
date, but ZFS need not verify the rest.

--Toby

> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Richard Elling
Jakov Sosic wrote:
> Hi guys!
>
> I'm doing series of tests on ZFS before putting it into production on several 
> machines, and I've come to a dead end. I have two disks in mirror (rpool). 
> Intentionally, I corrupt data on second disk:
>
> # dd if=/dev/urandom of=/dev/rdsk/c0d1t0 bs=512 count=20480 seek=10240
>
> So, I've written 10MB's of random data after first 5MB's of hard drive. After 
> sync and reboot, ZFS got the corruption noticed, and then I run zpool scrub 
> rpool. After that, I've got this state:
>
> unknown# zpool status
>   pool: rpool
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
> attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'.
>see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub in progress for 0h0m, 5.64% done, 0h5m to go
> config:
>
> NAME        STATE     READ WRITE CKSUM
> rpool       DEGRADED     0     0     0
>   mirror    DEGRADED     0     0     0
>     c0d1s0  DEGRADED     0     0    26  too many errors
>     c0d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
>
> So I wonder now, how to fix this up? Why doesn't scrub overwrite bad data 
> with good data from first disk?
>   

The data is already fixed, which is why it says "errors: No known data 
errors".

> If I run zpool clear, it will only clear the error reports, and it won't 
> fixed them - I presume that because I don't understand the man page for that 
> section clearly.
>
> So, how can I fix this disk, without detach/attach procedure
>   

Be happy, the data is already fixed. The "DEGRADED" state is used
when too many errors were found in a short period of time, which
one would use as an indicator of a failing device.  However, since the
device has not actually failed, it is of no practical use in your test case.
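
In a test case like this, about all that is left to do is reset the counters
and confirm the pool comes back healthy, e.g.:

# zpool clear rpool c0d1s0
# zpool status rpool

If you want to see what the diagnosis engine actually counted, 'fmadm faulty'
and 'fmdump -eV' will show the fault and the underlying ZFS ereports.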
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Scott Watanabe
Looks like your scrub was not finished yet. Did you check it again later? You
should not have had to replace the disk. You might have to reinstall the boot block.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Jakov Sosic
> > So I wonder now, how to fix this up? Why doesn't
> scrub overwrite bad data with good data from first
> disk?
> 
> ZFS doesn't know why the errors occurred, the most
> likely scenario would be a 
> bad disk -- in which case you'd need to replace it.

I know and understand that... But what, then, is the limit for self-healing? 2 
errors per vdev? 3 errors? 10 errors, before ZFS decides that the vdev is 
irreparable?


> You shouldn't need to attach/detach anything.
> I think you're looking for 'zpool replace'.
>zpool replace tank c0d1s0

Yes, but that will do a complete resilver, and I just want to fix the 
corrupted blocks... :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Bryant Eadon
Jakov Sosic wrote:
> Hi guys!
> 
> I'm doing series of tests on ZFS before putting it into production on several 
> machines, and I've come to a dead end. I have two disks in mirror (rpool). 
> Intentionally, I corrupt data on second disk:
> 
> # dd if=/dev/urandom of=/dev/rdsk/c0d1t0 bs=512 count=20480 seek=10240
> 
> So, I've written 10MB's of random data after first 5MB's of hard drive. After 
> sync and reboot, ZFS got the corruption noticed, and then I run zpool scrub 
> rpool. After that, I've got this state:
> 
> unknown# zpool status
>   pool: rpool
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
> attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'.
>see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub in progress for 0h0m, 5.64% done, 0h5m to go
> config:
> 
> NAME        STATE     READ WRITE CKSUM
> rpool       DEGRADED     0     0     0
>   mirror    DEGRADED     0     0     0
>     c0d1s0  DEGRADED     0     0    26  too many errors
>     c0d0s0  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> 
> So I wonder now, how to fix this up? Why doesn't scrub overwrite bad data 
> with good data from first disk?

ZFS doesn't know why the errors occurred; the most likely scenario would be a 
bad disk -- in which case you'd need to replace it.


> If I run zpool clear, it will only clear the error reports, and it won't 
> fixed them - I presume that because I don't understand the man page for that 
> section clearly.

The admin guide is great to follow for these tests:
http://docs.sun.com/app/docs/doc/819-5461

> So, how can I fix this disk, without detach/attach procedure?

You shouldn't need to attach/detach anything.
I think you're looking for 'zpool replace'.
   zpool replace tank c0d1s0



-Bryant
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] how to fix zpool with corrupted disk?

2009-01-26 Thread Jakov Sosic
Hi guys!

I'm doing a series of tests on ZFS before putting it into production on several 
machines, and I've come to a dead end. I have two disks in a mirror (rpool). 
Intentionally, I corrupt data on the second disk:

# dd if=/dev/urandom of=/dev/rdsk/c0d1t0 bs=512 count=20480 seek=10240

So, I've written 10MB of random data after the first 5MB of the hard drive. After 
a sync and reboot, ZFS noticed the corruption, and then I ran zpool scrub 
rpool. After that, I've got this state:

unknown# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 0h0m, 5.64% done, 0h5m to go
config:

NAME        STATE     READ WRITE CKSUM
rpool       DEGRADED     0     0     0
  mirror    DEGRADED     0     0     0
    c0d1s0  DEGRADED     0     0    26  too many errors
    c0d0s0  ONLINE       0     0     0

errors: No known data errors


So I wonder now: how do I fix this up? Why doesn't scrub overwrite bad data with 
good data from the first disk?

If I run zpool clear, it will only clear the error reports, and it won't fix 
them - I presume so, at least, because I don't understand the man page for that 
section clearly.

So, how can I fix this disk, without detach/attach procedure?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] thoughts on parallel backups, rsync, and send/receive

2009-01-26 Thread Ian Collins
Richard Elling wrote:
> Ian Collins wrote:
>
>> One thing I have yet to do is find the optimum number of parallel
>> transfers when there are 100s of filesystems.  I'm looking into making
>> this dynamic, based on throughput.
>>   
>
> I'm not convinced that a throughput throttle or metric will be
> meaningful. I believe this will need to be iop-based.
>
OK, I'll check.  I was looking at adding jobs until the average send
time declined.

>> Are you working with OpenSolaris?  I still haven't managed to nail the
>> toxic streams problem in Solaris 10, which have curtailed my project.
>>   
>
> I am aware of the bug, but have not seen it.  Murphy's Law says it won't
> happen until we roll into production :-(

How many file systems do you have?  I hit the problem about 1 in 1500
send/receives.  The last time was with a 1TB filesystem with about 600GB
of snaps, so I couldn't attach it to the bug!

-- 
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11

2009-01-26 Thread Ahmed Kamal
Hi Jim,

The setup is not there anymore, however, I will share as much details
as I have documented. Could you please post the commands you have used
and any differences you think might be important. Did you ever test
with 2008.11 ? instead of sxce ?

I will probably be testing again soon. Any tips or obvious errors are welcome :)

->8-
The Setup
* A 100G zvol has been setup on each node of an AVS replicating pair
* A "ramdisk" has been setup on each node using
  ramdiskadm -a ram1 10m
* The replication relationship has been setup using
  sndradm -E pri /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 sec
/dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 ip async
* The AVS driver was configured to not log the disk bitmap to disk,
rather to keep it in kernel memory and write it to disk only upon
machine shutdown. This is configured as such
  grep bitmap_mode /usr/kernel/drv/rdc.conf
  rdc_bitmap_mode=2;
* The replication was configured to be in logging mode
  sndradm -P
  /dev/zvol/rdsk/gold/myzvol  <-  pri:/dev/zvol/rdsk/gold/myzvol
  autosync: off, max q writes: 4096, max q fbas: 16384, async threads:
2, mode: async, state: logging

Testing was done with:

 dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync bs=256M count=10

* Option 'dsync' is chosen to try to avoid ZFS's aggressive caching.
In addition, a couple of runs were usually launched initially to
fill the ZFS cache and to force real writes to disk.
* Option 'bs=256M' was used in order to avoid the overhead of copying
multiple small blocks to kernel memory before disk writes. A larger bs
size ensures max throughput; smaller values were used without much
difference, though.

The results on multiple runs

Non Replicated Vol Throughputs: 42.2, 52.8, 50.9 MB/s
Replicated Vol Throughputs:  4.9, 5.5, 4.6 MB/s

-->8-

Regards

On Mon, Jan 26, 2009 at 1:22 AM, Jim Dunham  wrote:
> Ahmed,
>
>> Thanks for your informative reply. I am involved with kristof
>> (original poster) in the setup, please allow me to reply below
>>
>>> Was the follow 'test' run during resynchronization mode or replication
>>> mode?
>>>
>>
>> Neither, testing was done while in logging mode. This was chosen to
>> simply avoid any network "issues" and to get the setup working as fast
>> as possible. The setup was created with:
>>
>> sndradm -E pri /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 sec
>> /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 ip async
>>
>> Note that the logging disks are ramdisks again trying to avoid disk
>> contention and get fastest performance (reliability is not a concern
>> in this test). Before running the tests, this was the state
>>
>> #sndradm -P
>> /dev/zvol/rdsk/gold/myzvol  <-  pri:/dev/zvol/rdsk/gold/myzvol
>> autosync: off, max q writes: 4096, max q fbas: 16384, async threads:
>> 2, mode: async, state: logging
>>
>> While we should be getting minimal performance hit (hopefully), we got
>> a big performance hit, disk throughput was reduced to almost 10% of
>> the normal rate.
>
> Is it possible to share information on your ZFS storage pool configuration,
> your testing tool, testing types and resulting data?
>
> I just downloaded Solaris Express CE (b105)
> http://opensolaris.org/os/downloads/sol_ex_dvd_1/,  configured ZFS in
> various storage pool types, SNDR with and without RAM disks, and I do not
> see that disk throughput was reduced to almost 10% of the normal rate. Yes,
> there is some performance impact, but nowhere near the amount reported.
>
> There are various factors which could come into play here, but the most
> obvious reason that someone may see a serious performance degradation as
> reported, is that prior to SNDR being configured, the existing system under
> test was already maxed out on some system limitation, such as CPU and
> memory.  I/O impact should not be a factor, given that a RAM disk is used.
> The addition of both SNDR and a RAM disk in the data path, regardless of how
> small their system cost is, will have a profound impact on disk throughput.
>
> Jim
>
>>
>> Please feel free to ask for any details, thanks for the help
>>
>> Regards
>> ___
>> storage-discuss mailing list
>> storage-disc...@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/storage-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] thoughts on parallel backups, rsync, and send/receive

2009-01-26 Thread Richard Elling
Ahmed Kamal wrote:
> Did anyone share a script to send/recv zfs filesystems tree in
> parallel, especially if a cap on concurrency can be specified?
> Richard, how fast were you taking those snapshots, how fast were the
> syncs over the network. For example, assuming a snapshot every 10mins,
> is it reasonable to expect to sync every snapshot as they're created
> every 10 mins. What would be the limit trying to lower those 10mins
> even more
>   

We were snapping every hour with send/receive times on the order
of 25 minutes.  I do not believe there will be time to experiment with
other combinations.

> Is it catastrophic if a second zfs send launches, while an older one
> is still being run
>   

I use a semaphore property to help avoid this, by design. That said,
I have not tried to see if there is a lurking bug with ZFS receive that
would need to be fixed if it cannot handle concurrent receives. My
send/receive script will incrementally copy from the latest, common
snapshot to the latest snapshot. For rsync, it will sync from the
epoch to the latest snapshot.
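
As a sketch of what such a semaphore property could look like (the property
name and dataset here are hypothetical, not necessarily what my script uses):

# zfs get -H -o value com.example:recv_lock backup/data   # "-" means no lock set
# zfs set com.example:recv_lock=busy backup/data          # take the lock
#   ... run the zfs receive into backup/data ...
# zfs inherit com.example:recv_lock backup/data           # release the lock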
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] thoughts on parallel backups, rsync, and send/receive

2009-01-26 Thread Richard Elling

Ian Collins wrote:
> Richard Elling wrote:
>> Recently, I've been working on a project which had aggressive backup
>> requirements. I believe we solved the problem with parallelism.  You
>> might consider doing the same.  If you get time to do your own experiments,
>> please share your observations with the community.
>> http://richardelling.blogspot.com/2009/01/parallel-zfs-sendreceive.html
>
> You raise some interesting points about rsync getting bogged down over
> time.  I have been working with a client with a requirement for
> replication between a number of hosts and I have found doing several
> send/receives made quite an impact.  What I haven't done is try this
> with the latest performance improvements in b105.  Have you?  My guess
> is the gain will be less.

Unfortunately, the rig was constrained to Solaris 10 10/08, so I don't
have any data on this for OpenSolaris.

> One thing I have yet to do is find the optimum number of parallel
> transfers when there are 100s of filesystems.  I'm looking into making
> this dynamic, based on throughput.

I'm not convinced that a throughput throttle or metric will be
meaningful. I believe this will need to be iop-based.

> Are you working with OpenSolaris?  I still haven't managed to nail the
> toxic streams problem in Solaris 10, which have curtailed my project.

I am aware of the bug, but have not seen it.  Murphy's Law says it won't
happen until we roll into production :-(
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] E2BIG

2009-01-26 Thread Marcelo Leal
Hello all...
 We are getting this error: "E2BIG - Arg list too long" when trying to send 
incremental backups (b89 -> b101). Do you know about any bugs related to that? 
I took a look at the archives and Google but could not find anything. 
 What I did find was something related to wrong timestamps (32-bit), and a 
ZFS check in the code (zfs_vnops.c). But that error is EOVERFLOW...
 Thanks a lot for your time!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch

Eric D. Mudama writes:
 > On Tue, Jan 20 at 21:35, Eric D. Mudama wrote:
 > > On Tue, Jan 20 at  9:04, Richard Elling wrote:
 > >>
 > >> Yes.  And I think there are many more use cases which are not
 > >> yet characterized.  What we do know is that using an SSD for
 > >> the separate ZIL log works very well for a large number of cases.
 > >> It is not clear to me that the efforts to characterize a large
 > >> number of cases is worthwhile, when we can simply throw an SSD
 > >> at the problem and solve it.
 > >>  -- richard
 > >>
 > >
 > > I think the issue is, like a previous poster discovered, there's not a
 > > lot of available data on exact performance changes of adding ZIL/L2ARC
 > > devices in a variety of workloads, so people wind up spending money
 > > and doing lots of trial and error, without clear expectations of
 > > whether their modifications are working or not.
 > 
 > Sorry for that terrible last sentence, my brain is fried right now.
 > 
 > I was trying to say that most people don't know what they're going to
 > get out of an SSD or other ZIL/L2ARC device ahead of time, since it
 > varies so much by workload, configuration, etc. and it's an expensive
 > problem to solve through trial and error, since these
 > performance-improving devices are many times more expensive than the
 > raw SAS/SATA devices in the main pool.
 > 

I agree with you on the L2ARC front, but not on the SSD for
the ZIL. We clearly expect a 10X gain for lightly threaded
workloads, and that's a big satisfier, because not everything
happens with a large amount of concurrency, and some high-value
tasks do not.

On the L2ARC side, the benefits are less direct because of the
presence of the L1 ARC. The gains, if present, will be of a similar
nature, with an 8-10X gain for workloads that are lightly
threaded and served from the L2ARC vs. disk. Note that it's
possible to configure which (higher business value)
filesystems are allowed to install in the L2ARC.

One quick-and-dirty way to evaluate whether the L2ARC will be effective
in your environment is to consider whether the last X GB of added
memory had a positive impact on your performance
metrics (does nailing down memory reduce performance?).
If so, then on the graph of performance vs. caching you are
still on a positive slope, and the L2ARC is likely to help. When
the requests you care most about are served from caches, or
when something else saturates (e.g. total CPU), then it's
time to stop.
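
Concretely, that per-filesystem control is the secondarycache property, and the
cache device itself is added at the pool level. A minimal sketch (device and
dataset names are placeholders, assuming a build recent enough for both features):

# zpool add tank cache c4t2d0               # add an SSD as an L2ARC device
# zfs set secondarycache=all tank/db        # let this filesystem install in the L2ARC
# zfs set secondarycache=none tank/scratch  # keep lower-value data out of it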

-r



 > -- 
 > Eric D. Mudama
 > edmud...@mail.bounceswoosh.org
 > 
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch

Eric D. Mudama writes:

 > On Mon, Jan 19 at 23:14, Greg Mason wrote:
 > >So, what we're looking for is a way to improve performance, without  
 > >disabling the ZIL, as it's my understanding that disabling the ZIL  
 > >isn't exactly a safe thing to do.
 > >
 > >We're looking for the best way to improve performance, without  
 > >sacrificing too much of the safety of the data.
 > >
 > >The current solution we are considering is disabling the cache  
 > >flushing (as per a previous response in this thread), and adding one  
 > >or two SSD log devices, as this is similar to the Sun storage  
 > >appliances based on the Thor. Thoughts?
 > 
 > In general principles, the evil tuning guide states that the ZIL
 > should be able to handle 10 seconds of expected synchronous write
 > workload.
 > 
 > To me, this implies that it's improving burst behavior, but
 > potentially at the expense of sustained throughput, like would be
 > measured in benchmarking type runs.
 > 
 > If you have a big JBOD array with say 8+ mirror vdevs on multiple
 > controllers, in theory, each VDEV can commit from 60-80MB/s to disk.
 > Unless you are attaching a separate ZIL device that can match the
 > aggregate throughput of that pool, wouldn't it just be better to have
 > the default behavior of the ZIL contents being inside the pool itself?
 > 
 > The best practices guide states that the max ZIL device size should be
 > roughly 50% of main system memory, because that's approximately the
 > most data that can be in-flight at any given instant.
 > 
 > "For a target throughput of X MB/sec and given that ZFS pushes
 > transaction groups every 5 seconds (and have 2 outstanding), we also
 > expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
 > 100MB/sec of synchronous writes, 1 GBytes of log device should be
 > sufficient."
 > 
 > But, no comments are made on the performance requirements of the ZIL
 > device(s) relative to the main pool devices.  Clicking around finds
 > this entry:
 > 
 > http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
 > 
 > ...which appears to indicate cases where a significant number of ZILs
 > were required to match the bandwidth of just throwing them in the pool
 > itself.
 > 
 > 


Big topic. Some write requests are synchronous and some are
not; some start out as non-synchronous and end up being synced.

For non-synchronous loads, ZFS does not commit data to the
slog. The presence of the slog is transparent and won't
hinder performance.

For synchronous loads, the performance is normally governed
by fewer threads committing more modest amounts of data;
performance here is dominated by latency effects, not disk
throughput, and this is where a slog greatly helps (10X).

Now you're right to point out that some workloads might end
up as synchronous while still managing large quantities of
data. The Storage 7000 line was tweaked to handle some of
those cases. So when committing, say, 10MB in a single
operation, the first MB will go to the SSD but the rest will
actually be sent to the main storage pool, with all these I/Os
issued concurrently. The latency response of a 1 MB write to our
SSD is expected to be similar to the response of regular
disks.


-r



 > --eric
 > 
 > 
 > -- 
 > Eric D. Mudama
 > edmud...@mail.bounceswoosh.org
 > 
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch

Nicholas Lee writes:
 > Another option to look at is:
 > set zfs:zfs_nocacheflush=1
 > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
 > 
 > Best option is to get a a fast ZIL log device.
 > 
 > 
 > Depends on your pool as well. NFS+ZFS means zfs will wait for write
 > completes before responding to a sync NFS write ops.  If you have a RAIDZ
 > array, writes will be slower than a RAID10 style pool.
 > 

Nicholas,

RAID-Z requires more complexity in software; however, the
total amount of I/O to disk is less than with RAID-10. So the
net performance effect is often in favor of RAID-10, but not
necessarily so.
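
For reference, the two layouts being compared are created roughly like this
(disk names are placeholders):

# zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0   # RAID-10 style
# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0           # single RAID-Z group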

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch
Greg Mason writes:
 > We're running into a performance problem with ZFS over NFS. When working 
 > with many small files (i.e. unpacking a tar file with source code), a 
 > Thor (over NFS) is about 4 times slower than our aging existing storage 
 > solution, which isn't exactly speedy to begin with (17 minutes versus 3 
 > minutes).
 > 
 > We took a rough stab in the dark, and started to examine whether or not 
 > it was the ZIL.
 > 
 > Performing IO tests locally on the Thor shows no real IO problems, but 
 > running IO tests over NFS, specifically, with many smaller files we see 
 > a significant performance hit.
 > 
 > Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
 > test again. It completed in just under a minute, around 3 times faster 
 > than our existing storage. This was more like it!
 > 
 > Are there any tunables for the ZIL to try to speed things up? Or would 
 > it be best to look into using a high-speed SSD for the log device?
 > 
 > And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
 > We do, however, need to provide our users with a certain level of 
 > performance, and what we've got with the ZIL on the pool is completely 
 > unacceptable.
 > 
 > Thanks for any pointers you may have...
 > 

I think you found out from the replies that this NFS issue is not
related to ZFS, nor to a ZIL malfunction in any way.

http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine 

NFS (particularly a lightly threaded load) is much sped up
with any form of SSD|NVRAM storage, and that's independent of
the backing filesystem used (provided the filesystem is safe).

For ZFS, the best way to achieve NFS performance for lightly
threaded loads is to have a separate intent log on a low
latency device, such as in the 7000 line.
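
For reference, a separate intent log is added to an existing pool with a single
command (device names are placeholders):

# zpool add tank log c4t0d0                 # single slog device
# zpool add tank log mirror c4t0d0 c5t0d0   # or a mirrored slog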

-r



 > --
 > 
 > Greg Mason
 > Systems Administrator
 > Michigan State University
 > High Performance Computing Center
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] thoughts on parallel backups, rsync, and send/receive

2009-01-26 Thread Ahmed Kamal
Did anyone share a script to send/recv zfs filesystems tree in
parallel, especially if a cap on concurrency can be specified?
Richard, how fast were you taking those snapshots, how fast were the
syncs over the network. For example, assuming a snapshot every 10mins,
is it reasonable to expect to sync every snapshot as they're created
every 10 mins. What would be the limit trying to lower those 10mins
even more
Is it catastrophic if a second zfs send launches, while an older one
is still being run
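
For what it's worth, a hedged sketch of such a throttled parallel send/receive
(pool, snapshot and host names are placeholders; it assumes bash for the
jobs-based throttle):

#!/usr/bin/bash
# Send an incremental stream for every child filesystem, at most MAXJOBS at a time.
MAXJOBS=4
for fs in $(zfs list -H -r -o name -t filesystem tank/data); do
    while [ "$(jobs -r | wc -l)" -ge "$MAXJOBS" ]; do
        sleep 1                      # crude throttle: wait for a free slot
    done
    zfs send -i "$fs@prev" "$fs@now" | \
        ssh backuphost "zfs receive -F backup/${fs#tank/}" &
done
wait                                 # wait for the last batch to finish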

Regards

On Mon, Jan 26, 2009 at 9:16 AM, Ian Collins  wrote:
> Richard Elling wrote:
> >> Recently, I've been working on a project which had aggressive backup
>> requirements. I believe we solved the problem with parallelism.  You
>> might consider doing the same.  If you get time to do your own experiments,
>> please share your observations with the community.
>> http://richardelling.blogspot.com/2009/01/parallel-zfs-sendreceive.html
>>
>
> You raise some interesting points about rsync getting bogged down over
> time.  I have been working with a client with a requirement for
> replication between a number of hosts and I have found doing several
> send/receives made quite an impact.  What I haven't done is try this
> with the latest performance improvements in b105.  Have you?  My guess
> is the gain will be less.
>
> One thing I have yet to do is find the optimum number of parallel
> transfers when there are 100s of filesystems.  I'm looking into making
> this dynamic, based on throughput.
>
> Are you working with OpenSolaris?  I still haven't managed to nail the
> toxic streams problem in Solaris 10, which have curtailed my project.
>
> --
> Ian.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss