Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Phil Harman
Ian,

It would help to have some config detail (e.g. what options are you using? 
zpool status output; property lists for specific filesystems and zvols; etc)

Some basic Solaris stats can be very helpful too (e.g. peak-load samples of 
vmstat 1, mpstat 1, iostat -xnz 1, etc.)

It would also be great to know how you are running your tests.

I'd also like to know which NFS version and mount options you are using. A network 
trace down to the NFS RPC or iSCSI operation level, with timings, would be great too.
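
For the Solaris stats, something along these lines captured while the problem is 
reproducing would do (the pool and filesystem names are just placeholders):

zpool status -v
zfs get all tank/myfs      # repeat for each filesystem/zvol involved
vmstat 1
mpstat 1
iostat -xnz 1
nfsstat -s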

I'm wondering whether your HBA has a write-through or write-back cache enabled? 
The latter might make things very fast, but could put data at risk if it isn't 
sufficiently non-volatile.

Cheers,
Phil

On 14 Oct 2010, at 22:02, Ian D rewar...@hotmail.com wrote:

 Our next test is to try with a different kind of HBA,
 we have a Dell H800 lying around.
 
 ok... we're making progress.  After swapping the LSI HBA for a Dell H800 the 
 issue disappeared.  Now, I'd rather not use those controllers because they 
 don't have a JBOD mode. We have no choice but to make individual RAID0 
 volumes for each disk, which means we need to reboot the server every time we 
 replace a failed drive.  That's not good...
 
 What can we do with the LSI HBA?  Would you call LSI's support?  Is there 
 anything we should try besides the obvious (using the latest 
 firmware/driver)?
 
 To summarize the issue: when we copy files from/to the JBODs connected to that 
 HBA using NFS/iSCSI, we get a slow transfer rate (20MB/s) and a 1-2 second pause 
 between each file.   When we do the same experiment locally, using the 
 external drives as a local volume (no NFS/iSCSI involved), it goes upwards 
 of 350MB/sec with no delay between files. 
 
 Ian
 
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Phil Harman
 
 I'm wondering whether your HBA has a write through or write back cache
 enabled? The latter might make things very fast, but could put data at
 risk if not sufficiently non-volatile.

He already said he has SSDs for a dedicated log.  This means the best
solution is to disable WriteBack and just use WriteThrough.  Not only is it
more reliable than WriteBack, it's also faster.

And I know I've said this many times before, but I don't mind repeating:  If
you have slog devices, then surprisingly, it actually hurts performance to
enable the WriteBack on the HBA.

Think of it like this:

Speed of a naked disk:  1.0
Speed of a disk with WriteBack:  2.2
Speed of a disk with slog and WB:  2.8
Speed of a disk with slog and no WB:  3.0

Of course those are really rough numbers that vary by architecture and
usage patterns.  But you get the idea.  The consistent result is that a disk
with a slog and WB disabled is the fastest.
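
For reference, adding and checking a dedicated slog is just something like this 
(the pool and device names are only examples):

zpool add tank log c4t1d0    # dedicate an SSD as the slog
zpool status tank            # the device now shows up under the logs section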

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-15 Thread Stephan Budach

Am 14.10.10 17:48, schrieb Edward Ned Harvey:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Toby Thain


I don't want to heat up the discussion about ZFS managed discs vs.
HW raids, but if RAID5/6 were that bad, no one would use it
anymore.

It is. And there's no reason not to point it out. The world has

Well, neither one of the above statements is really fair.

The truth is:  raid5/6 are generally not that bad.  Data integrity failures
are not terribly common (maybe one bit per year out of 20 large disks or
something like that.)

And in order to reach the conclusion nobody would use it, the people using
it would have to first *notice* the failure.  Which they don't.  That's kind
of the point.

Since I started using ZFS in production, about a year ago, on three servers
totaling approx 1.5TB used, I have had precisely one checksum error, which
ZFS corrected.  I have every reason to believe, if that were on a raid5/6,
the error would have gone undetected and nobody would have noticed.


Point taken!

So, what would you suggest, if I wanted to create really big pools? Say 
in the 100 TB range? That would be quite a number of single drives then, 
especially when you want to go with zpool raid-1.


Cheers,
budy

--
Stephan Budach
Jung von Matt/it-services GmbH
Glashüttenstraße 79
20357 Hamburg

Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.bud...@jvm.de
Internet: http://www.jvm.com

Geschäftsführer: Ulrich Pallas, Frank Wilhelm
AG HH HRB 98380

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS cache inconsistencies with Oracle

2010-10-15 Thread Gerry Bragg
A customer is running ZFS version 15 on Solaris SPARC 10/08 supporting Oracle 
10.2.0.3 databases in a dev and production test environment.   We have come 
across some cache inconsistencies with one of the Oracle databases where 
fetching a record displays a 'historical value' (that has been changed and 
committed many times).   This is an isolated occurrence and is not always 
consistent.  I can't replicate it to other tables.   I'll also be posting a 
note to the ZFS discussion list.



Is it possible for a read to bypass the write cache and fetch from disk before 
the flush of the cache to disk occurs?  This is a large system that is 
infrequently busy.  The Oracle SGA size is minimized to 1GB per instance and we 
rely more on the ZFS cache, allowing us to fit 'more instances' (many of which 
are cloned snapshots).  We've been running this setup for 2 years.  The 
filesystems are set with compression on, blocksize 8k for oracle datafiles, 
128k for redologs.
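
For reference, the filesystem settings were applied roughly like this (the pool 
and dataset names below are placeholders, not our real ones):

zfs set compression=on dbpool
zfs set recordsize=8k dbpool/oradata
zfs set recordsize=128k dbpool/redo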



Here are the details of the scenario:



1.   Update statement re-setting existing value. At this point the previous 
value was actually set to -643 prior to the update.  It was originally set to 3 
before today's session:



SQL> update [name deleted] set status_cd = 1 where id = 65;

1 row updated.

SQL> commit;

Commit complete.

SQL> select rowid, id, status_cd from [table name deleted]
SQL> where id = 65;

ROWID                  ID  STATUS_CD
------------------ ------ ----------
AAAq/DAAERlAAM         65          3



Note that when retrieved the status_cd reverts to the old original value of 3, 
not the previous value of -643.



2.  Oracle trace file proves that the update was issued and committed:



=

PARSING IN CURSOR #1 len=70 dep=0 uid=110 oct=6 lid=110 tim=17554807027344 
hv=3512595279 ad='fd211878'

update [table deleted] set status_cd = 1 where id = 65 END OF STMT PARSE 
#1:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,tim=17554807027340

BINDS #1:

EXEC #1:c=0,e=257,p=0,cr=3,cu=3,mis=0,r=1,dep=0,og=2,tim=17554807027737

WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 
p3=0 obj#=-1 tim=17554807027803 WAIT #1: nam='SQL*Net message from client' ela= 
2999139 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=17554810026992 STAT #1 
id=1 cnt=1 pid=0 pos=1 obj=0 op='UPDATE  [TABLE DELETED] (cr=3 pr=0 pw=0 
time=144 us)'

STAT #1 id=2 cnt=1 pid=1 pos=1 obj=177738 op='INDEX UNIQUE SCAN 
[TABLE_DELETED]_XPK (cr=3 pr=0 pw=0 time=19 us)'

PARSE #2:c=0,e=9,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=17554810027367

XCTEND rlbk=0, rd_only=0

EXEC #2:c=0,e=226,p=0,cr=0,cu=1,mis=0,r=0,dep=0,og=0,tim=17554810027630

WAIT #2: nam='log file sync' ela= 833 buffer#=9408 p2=0 p3=0 obj#=-1 
tim=17554810028507 WAIT #2: nam='SQL*Net message to client' ela= 2 driver 
id=1413697536 #bytes=1 p3=0 obj#=-1 tim=17554810028578 WAIT #2: nam='SQL*Net 
message from client' ela= 1825185 driver id=1413697536 #bytes=1 p3=0 obj#=-1 
tim=17554811853812 = PARSING IN CURSOR #1 len=67 dep=0 
uid=110 oct=3 lid=110 tim=17554811854015 hv=1593702413 ad='fd713640'

select status_cd from [table_deleted] where id = 65 END OF STMT PARSE 
#1:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,tim=17554811854010

BINDS #1:

EXEC #1:c=0,e=91,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,tim=17554811854273

WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 
p3=0 obj#=-1 tim=17554811854327 FETCH 
#1:c=0,e=64,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=2,tim=17554811854436

WAIT #1: nam='SQL*Net message from client' ela= 780 driver id=1413697536 
#bytes=1 p3=0 obj#=-1 tim=17554811855291 FETCH 
#1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=17554811855331

WAIT #1: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 
p3=0 obj#=-1 tim=17554811855366





There are no Oracle or Solaris error messages indicating any issue with this 
update.   Has anyone seen this behavior?



The features of ZFS (snapshots/clones/compression) save us a ton of time on 
this platform and we have certainly benefited from it.   Just want to 
understand how something like this could occur and determine how we can prevent 
it in the future.

==
Gerry Bragg
Sr. Developer
Altarum Institute
(734) 516-0825
gerry.br...@altarum.org
www.altarum.org
Systems Research For Better Health

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS cache inconsistencies with Oracle

2010-10-15 Thread Enda O'Connor

Hi
so, to be absolutely clear:
in the same session, you ran an update, a commit and a select, and the 
select returned an earlier value than the committed update?


Things like
ALTER SESSION set ISOLATION_LEVEL = SERIALIZABLE;

will cause a session to NOT see commits from other sessions, but in 
Oracle one always sees one's own updates within one's own transaction (assuming no 
other session makes a change, of course).


So are you sure that:
1. some other session hasn't mucked with the value between the commit and 
the select in your session?

2. some DB trigger isn't doing this, i.e. setting some default value?

In my experience with DB's, triggers are the root of all evil.
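
A quick way to check for the latter would be something like this (the connect 
string and table name are placeholders, obviously):

sqlplus -s scott/tiger <<'EOF'
select trigger_name, trigger_type, status
  from all_triggers
 where table_name = 'YOUR_TABLE_NAME';
EOF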

Enda
On 15/10/2010 14:36, Gerry Bragg wrote:

A customer is running ZFS version 15 on Solaris SPARC 10/08 supporting
Oracle 10.2.0.3 databases in a dev and production test environment. We
have come across some cache inconsistencies with one of the Oracle
databases where fetching a record displays a 'historical value' (that
has been changed and committed many times). This is an isolated
occurrence and is not always consistent. I can't replicate it to other
tables. I'll also be posting a note to the ZFS discussion list.

Is it possible for a read to bypass the write cache and fetch from disk
before the flush of the cache to disk occurs? This is a large system
that is infrequently busy. The Oracle SGA size is minimized to 1GB per
instance and we rely more on the ZFS cache, allowing us to fit ‘more
instances’ (many of which are cloned snapshots). We’ve been running this
setup for 2 years. The filesystems are set with compression on,
blocksize 8k for oracle datafiles, 128k for redologs.

Here are the details of the scenario:

1. Update statement re-setting existing value. At this point the
previous value was actually set to -643 prior to the update. It was
originally set to 3 before today’s session:

SQL> update [name deleted] set status_cd = 1 where id = 65;

1 row updated.

SQL> commit;

Commit complete.

SQL> select rowid, id, status_cd from [table name deleted]
SQL> where id = 65;

ROWID                  ID  STATUS_CD
------------------ ------ ----------
AAAq/DAAERlAAM         65          3

Note that when retrieved the status_cd reverts to the old original value
of 3, not the previous value of -643.

2. Oracle trace file proves that the update was issued and committed:

=

PARSING IN CURSOR #1 len=70 dep=0 uid=110 oct=6 lid=110
tim=17554807027344 hv=3512595279 ad='fd211878'

update [table deleted] set status_cd = 1 where id = 65 END OF STMT PARSE
#1:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,tim=17554807027340

BINDS #1:

EXEC #1:c=0,e=257,p=0,cr=3,cu=3,mis=0,r=1,dep=0,og=2,tim=17554807027737

WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=17554807027803 WAIT #1: nam='SQL*Net message
from client' ela= 2999139 driver id=1413697536 #bytes=1 p3=0 obj#=-1
tim=17554810026992 STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='UPDATE
[TABLE DELETED] (cr=3 pr=0 pw=0 time=144 us)'

STAT #1 id=2 cnt=1 pid=1 pos=1 obj=177738 op='INDEX UNIQUE SCAN
[TABLE_DELETED]_XPK (cr=3 pr=0 pw=0 time=19 us)'

PARSE #2:c=0,e=9,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=17554810027367

XCTEND rlbk=0, rd_only=0

EXEC #2:c=0,e=226,p=0,cr=0,cu=1,mis=0,r=0,dep=0,og=0,tim=17554810027630

WAIT #2: nam='log file sync' ela= 833 buffer#=9408 p2=0 p3=0 obj#=-1
tim=17554810028507 WAIT #2: nam='SQL*Net message to client' ela= 2
driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=17554810028578 WAIT #2:
nam='SQL*Net message from client' ela= 1825185 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=17554811853812 = PARSING
IN CURSOR #1 len=67 dep=0 uid=110 oct=3 lid=110 tim=17554811854015
hv=1593702413 ad='fd713640'

select status_cd from [table_deleted] where id = 65 END OF STMT PARSE
#1:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,tim=17554811854010

BINDS #1:

EXEC #1:c=0,e=91,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,tim=17554811854273

WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=17554811854327 FETCH
#1:c=0,e=64,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=2,tim=17554811854436

WAIT #1: nam='SQL*Net message from client' ela= 780 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=17554811855291 FETCH
#1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=17554811855331

WAIT #1: nam='SQL*Net message to client' ela= 0 driver id=1413697536
#bytes=1 p3=0 obj#=-1 tim=17554811855366

There are no Oracle or Solaris error messages indicating any issue with
this update. Has anyone seen this behavior?

The features of ZFS (snapshots/clones/compression) save us a ton of time
on this platform and we have certainly benefited from it. Just want to
understand how something like this could occur and determine how we can
prevent it in the future.

==

Gerry Bragg

Sr. Developer

Altarum Institute

(734) 516-0825

gerry.br...@altarum.org

www.altarum.org

Systems Research 

Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
As I have mentioned already, we have the same performance issues whether we 
READ or we WRITE to the array, shouldn't that rule out caching issues?

Also we can get great performance with the LSI HBA if we use the JBODs as a 
local file system.  The issues only arise when it is done through iSCSI and NFS.

I'm opening tickets with LSI to see if they can help.

Thanks all!
Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
 He already said he has SSD's for dedicated log.  This
 means the best
 solution is to disable WriteBack and just use
 WriteThrough.  Not only is it
 more reliable than WriteBack, it's faster.
 
 And I know I've said this many times before, but I
 don't mind repeating:  If
 you have slog devices, then surprisingly, it actually
 hurts performance to
 enable the WriteBack on the HBA.

The HBA that gives us problems is an LSI 9200-16e, which has no cache whatsoever.  
We do get great performance with a Dell H800 that has cache.  We'll use H800s 
if we have to, but I really would like to find a way to make the LSIs work.

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Available Space Discrepancy

2010-10-15 Thread David Stewart
I am using snv_111b, and yesterday both the Mac OS X Finder and Solaris File Browser 
started reporting that I had 0 space available on the SMB shares.  Earlier in 
the day I had copied some files from the Mac to the SMB shares with no problems 
reported by the Mac (Automator will report errors if the destination is full 
and it is unable to copy the remaining files).  Later I tried to move a folder 
from one share to another share and the Mac Finder crashed and restarted.  I 
tried it again and, after the Finder counted the number of files it was going to 
move, it reported that there wasn't enough space available when there should 
have been.  Now, I know I did at least one thing I had not intended: dragging 
from one share to another will not MOVE, but will instead COPY.  That was not 
my intention.

I have 5 shares on the pool (data, movies, music, photos, scans) and zfs list 
reports:
NAME      USED   AVAIL
mediaz1   4.00T      0
data       760k      0
movies    2.57T      0
music      874G      0
photos     360G      0
scans      235G      0

zpool list reports:

NAME      SIZE   USED   AVAIL
mediaz1   5.44T  5.35T  86.7G

and

zpool iostat reports:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
mediaz1     5.35T  86.7G    248      2  30.1M  10.4k

There should be about 86G free and that sounds about right, but I don't 
understand why the GUI Finder and File Browser report 0, as does zfs list.  And 
how do I correct this (or myself)?
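
If it helps rule things out, I believe something like this should show any 
snapshots or reservations that could be holding the space:

zfs list -t snapshot -r mediaz1
zfs get -r quota,reservation,refreservation mediaz1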

David

BTW, I DID search the forums and Google and did not find a solution.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Marty Scholes
 I've had a few people sending emails directly
 suggesting it might have something to do with the
 ZIL/SLOG.   I guess I should have said that the issue
 happens both ways, whether we copy TO or FROM the
 Nexenta box.

You mentioned a second Nexenta box earlier.  To rule out client-side issues, 
have you considered testing with Nexenta as the iSCSI/NFS client?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Phil Harman
 As I have mentioned already, it would be useful to know more about the 
config, how the tests are being done, and to see some basic system 
performance stats.


On 15/10/2010 15:58, Ian D wrote:

As I have mentioned already, we have the same performance issues whether we 
READ or we WRITE to the array, shouldn't that rule out caching issues?

Also we can get great performance with the LSI HBA if we use the JBODs as a 
local file system.  The issues only arise when it is done through iSCSI and NFS.

I'm opening tickets with LSI to see if they can help.

Thanks all!
Ian


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] adding new disks and setting up a raidz2

2010-10-15 Thread Cindy Swearingen

Derek,

The c0t5000C500268CFA6Bd0 disk has some kind of label problem.
You might compare the label of this disk to the other disks.

I agree with Richard that using whole disks (use the d0 device)
is best.

You could also relabel it manually by using format's fdisk option:
delete the current partition, create a new partition using the
EFI option, and save the configuration.
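
From memory, the sequence looks roughly like this (the exact menu entries can 
vary by release, so treat this as a sketch):

format -e c0t5000C500268CFA6Bd0
  format> fdisk     (delete the existing partition, create a new one of type EFI)
  format> label     (select the EFI label when prompted)
  format> quit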

Thanks,

Cindy




On 10/14/10 21:21, Derek G Nokes wrote:

Thank you both.  I did try without specifying the 's0' portion before posting 
and got the following error:

r...@dnokes.homeip.net:~# zpool create marketData raidz2 c0t5000C5001A6B9C5Ed0 
c0t5000C5001A81E100d0 c0t5000C500268C0576d0 c0t5000C500268C5414d0 
c0t5000C500268CFA6Bd0 c0t5000C500268D0821d0
cannot label 'c0t5000C500268CFA6Bd0': try using fdisk(1M) and then provide a 
specific slice

Any idea what this means?

Thanks again.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
 You mentioned a second Nexenta box earlier.  To rule
 out client-side issues, have you considered testing
 with Nexenta as the iSCSI/NFS client?

If you mean running the NFS client AND server on the same box, then yes, and it 
doesn't show the same performance issues.  It's only when a Linux box SENDS/RECEIVES 
data to the NFS/iSCSI shares that we have problems.  But if the Linux box sends/receives 
files through scp to the external disks mounted by the Nexenta box as a local 
filesystem, then there is no problem.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
 As I have mentioned already, it would be useful to
  know more about the 
 config, how the tests are being done, and to see some
 basic system 
 performance stats.

I will shortly.  Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Darren J Moffat

On 15/10/2010 19:09, Ian D wrote:

It's only when a Linux box SENDS/RECEIVES data to the NFS/iSCSI shares that we 
have problems.  But if the Linux box sends/receives files through scp to the 
external disks mounted by the Nexenta box as a local filesystem, then there is 
no problem.


Does the Linux box have the same issue with any other server?
What if the client box isn't Linux but Solaris or Windows or MacOS X ?

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
 Does the Linux box have the same issue to any other
 server ?
 What if the client box isn't Linux but Solaris or
 Windows or MacOS X ?

That would be a good test.  We'll try that.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
After contacting LSI they say that the 9200-16e HBA is not supported in 
OpenSolaris, just Solaris.  Aren't the Solaris drivers the same as the OpenSolaris ones?

Is there anyone here using 9200-16e HBAs?  What about the 9200-8e?  We have a 
couple lying around and we'll test one shortly.

Ian
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-USAS2-L8i

2010-10-15 Thread Maurice Volaski

The mpt_sas driver supports it. We've had LSI 2004 and 2008 controllers hang
for quite some time when used with SuperMicro chassis and Intel X25-E SSDs
(OSOL b134 and b147). It seems to be a firmware issue that isn't fixed with
the last update.


Do you mean to include all the PCIe cards, not just the AOC-USAS2-L8i, 
and when it's directly connected and not through the backplane? Prior 
reports here seem to implicate the card only when it was 
connected to the backplane.

--

Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
A little setback...  We found out that we also have the issue with the Dell 
H800 controllers, not just the LSI 9200-16e.  With the Dell it's initially 
faster as we benefit from the cache, but after a little while it goes sour, 
from 350MB/sec down to less than 40MB/sec.  We've also tried with an LSI 9200-8e 
with the same results.

So to recap...  No matter what HBA we use, copying through the network to/from 
the external drives is painfully slow when access is done through either NFS or 
iSCSI.  HOWEVER, it is plenty fast when we do an scp where the data is written 
to the external drives (or internal ones for that matter) when they are seen by 
the Nexenta box as local drives, i.e. when neither NFS nor iSCSI is involved.  

What now?  :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread erik.ableson
On 15 oct. 2010, at 22:19, Ian D wrote:

 A little setback  We found out that we also have the issue with the Dell 
 H800 controllers, not just the LSI 9200-16e.  With the Dell it's initially 
 faster as we benefit from the cache, but after a little while it goes sour- 
 from 350MB/sec down to less than 40MB/sec.  We've also tried with a LSI 
 9200-8e with the same results.
 
 So to recap...  No matter what HBA we use, copying through the network 
 to/from the external drives is painfully slow when access is done through 
 either NFS or iSCSI.  HOWEVER, it is plenty fast when we do a scp where the 
 data is written to the external drives (or internal ones for that matter) 
 when they are seen by the Nexenta box as local drives- ie when neither NFS or 
 iSCSI are involved.  

Sounds an awful lot like client side issues coupled possibly with networking 
problems.

Have you looked into disabling the Nagle algorithm on the client side? That's 
something that can impact both iSCSI and NFS badly, but ssh is usually not as 
affected... I vaguely remember that being a real performance killer on some 
Linux versions.

Another thing to check would be to ensure that noatime is set, so that your reads 
aren't triggering writes across the network as well.
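
For the NFS side that's just a mount option on the Linux client, e.g. something 
like this (the server and export names are placeholders):

mount -t nfs -o vers=3,proto=tcp,noatime,rsize=32768,wsize=32768 \
    nexenta:/tank/share /mnt/share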

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Saxon, Will
 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org 
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ian D
 Sent: Friday, October 15, 2010 4:19 PM
 To: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Performance issues with iSCSI under Linux
 
 A little setback  We found out that we also have the 
 issue with the Dell H800 controllers, not just the LSI 
 9200-16e.  With the Dell it's initially faster as we benefit 
 from the cache, but after a little while it goes sour- from 
 350MB/sec down to less than 40MB/sec.  We've also tried with 
 a LSI 9200-8e with the same results.
 
 So to recap...  No matter what HBA we use, copying through 
 the network to/from the external drives is painfully slow 
 when access is done through either NFS or iSCSI.  HOWEVER, it 
 is plenty fast when we do a scp where the data is written to 
 the external drives (or internal ones for that matter) when 
 they are seen by the Nexenta box as local drives- ie when 
 neither NFS or iSCSI are involved.  

Has anyone suggested either removing L2ARC/SLOG entirely or relocating them so 
that all devices are coming off the same controller? You've swapped the 
external controller but the H700 with the internal drives could be the real 
culprit. Could there be issues with cross-controller IO in this case? Does the 
H700 use the same chipset/driver as the other controllers you've tried? 
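
If you want to try it, removing them should be straightforward and non-destructive, 
something like this (the device names are whatever zpool status lists under your 
cache and logs sections):

zpool remove tank c2t0d0    # cache (L2ARC) device, removable at any time
zpool remove tank c2t1d0    # log device; needs a pool version with log device removal (19+)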

I don't have a good understanding of where the various software components here 
fit together, but it seems like the problem is not with the controller(s) but 
with whatever is queueing network IO requests to the storage subsystem (or 
controlling queues/buffers/etc for this). Do NFS and iSCSI share a code path 
for this? 

-Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-15 Thread Bob Friesenhahn

On Wed, 13 Oct 2010, Edward Ned Harvey wrote:


raidzN takes a really long time to resilver (code written inefficiently,
it's a known problem.)  If you had a huge raidz3, it would literally never
finish, because it couldn't resilver as fast as new data appears.  A week


In what way is the code written inefficiently?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ian D
 Has anyone suggested either removing L2ARC/SLOG
 entirely or relocating them so that all devices are
 coming off the same controller? You've swapped the
 external controller but the H700 with the internal
 drives could be the real culprit. Could there be
 issues with cross-controller IO in this case? Does
 the H700 use the same chipset/driver as the other
 controllers you've tried? 

We'll try that.  We have a couple of other devices we could use for the SLOG, like 
a DDRDrive X1 and an OCZ Z-Drive, which are both PCIe cards and don't use the 
local controller.

Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-15 Thread Marty Scholes
Sorry, I can't not respond...

Edward Ned Harvey wrote:
 whatever you do, *don't* configure one huge raidz3.

Peter, whatever you do, *don't* make a decision based on blanket 
generalizations.

 If you can afford mirrors, your risk is much lower.
  Because although it's
 physically possible for 2 disks to fail simultaneously
 and ruin the pool,
 the probability of that happening is smaller than the
 probability of 3
 simultaneous disk failures on the raidz3.

Edward, I normally agree with most of what you have to say, but this has gone 
off the deep end.  I can think of counter-use-cases far faster than I can type.

  Due to
 smaller resilver window.

Coupled with a smaller MTTDL, smaller cabinet space yield, smaller $/GB ratio, 
etc.

 I highly endorse mirrors for nearly all purposes.

Clearly.

Peter, go straight to the source.

http://blogs.sun.com/roch/entry/when_to_and_not_to

In short:
1. vdev_count = spindle_count / (stripe_width + parity_count)
2. IO/s is proportional to vdev_count
3. Usable capacity is proportional to stripe_width * vdev_count
4. A mirror can be approximated by a stripe of width one
5. Mean time to data loss increases exponentially with parity_count
6. Resilver time increases (super)linearly with stripe width

Balance capacity available, storage needed, performance needed and your own 
level of paranoia regarding data loss.
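
To make that concrete with illustrative numbers: 24 spindles laid out as four 
6-disk raidz2 vdevs gives vdev_count = 24 / (4 + 2) = 4, usable capacity 
proportional to 4 * 4 = 16 spindles, and random IO/s proportional to 4.  The same 
24 spindles as 12 mirror pairs gives vdev_count = 24 / (1 + 1) = 12, usable 
capacity proportional to 1 * 12 = 12 spindles, and roughly three times the 
random IO/s.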

My home server's main storage is a 22 (19 + 3) disk RAIDZ3 pool backed up 
hourly to a 14 (11+3) RAIDZ3 backup pool.

Clearly this is not a production Oracle server.  Equally clear is that my 
paranoia index is rather high.

ZFS will let you choose the combination of stripe width and parity count which 
works for you.

There is no one size fits all.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-15 Thread Freddie Cash
On Fri, Oct 15, 2010 at 3:16 PM, Marty Scholes martyscho...@yahoo.com wrote:
 My home server's main storage is a 22 (19 + 3) disk RAIDZ3 pool backed up 
 hourly to a 14 (11+3) RAIDZ3 backup pool.

How long does it take to resilver a disk in that pool?  And how long
does it take to run a scrub?

When I initially setup a 24-disk raidz2 vdev, it died trying to
resilver a single 500 GB SATA disk.  I/O under 1 MBps, all 24 drives
thrashing like crazy, could barely even login to the system and type
onscreen.  It was a nightmare.

That, and normal (no scrub, no resilver) disk I/O was abysmal.

Since then, I've avoided any vdev with more than 8 drives in it.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-15 Thread Ross Walker
On Oct 15, 2010, at 9:18 AM, Stephan Budach stephan.bud...@jvm.de wrote:

 Am 14.10.10 17:48, schrieb Edward Ned Harvey:
 
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Toby Thain
 
 I don't want to heat up the discussion about ZFS managed discs vs.
  HW raids, but if RAID5/6 were that bad, no one would use it
 anymore.
 It is. And there's no reason not to point it out. The world has
 Well, neither one of the above statements is really fair.
 
  The truth is:  raid5/6 are generally not that bad.  Data integrity failures
 are not terribly common (maybe one bit per year out of 20 large disks or
 something like that.)
 
 And in order to reach the conclusion nobody would use it, the people using
 it would have to first *notice* the failure.  Which they don't.  That's kind
 of the point.
 
 Since I started using ZFS in production, about a year ago, on three servers
 totaling approx 1.5TB used, I have had precisely one checksum error, which
 ZFS corrected.  I have every reason to believe, if that were on a raid5/6,
 the error would have gone undetected and nobody would have noticed.
 
 Point taken!
 
 So, what would you suggest, if I wanted to create really big pools? Say in 
 the 100 TB range? That would be quite a number of single drives then, 
 especially when you want to go with zpool raid-1.

A pool consisting of 4 disk raidz vdevs (25% overhead) or 6 disk raidz2 vdevs 
(33% overhead) should deliver the storage and performance for a pool that size, 
versus a pool of mirrors (50% overhead).

You need a lot of spindles to reach 100TB.
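
To put rough numbers on it (assuming 2TB drives): 4-disk raidz1 vdevs give 6TB 
usable each, so about 17 vdevs / 68 drives for 100TB; 6-disk raidz2 vdevs give 
8TB usable each, so about 13 vdevs / 78 drives; mirrors would need around 100 
drives.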

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] how to replace failed vdev on non redundant pool?

2010-10-15 Thread Cassandra Pugh
Hello,

I would like to know how to replace a failed vdev in a non-redundant pool.

I am using fiber attached disks, and cannot simply place the disk back into
the machine, since it is virtual.

I have the latest kernel from sept 2010 that includes all of the new ZFS
upgrades.

Please, can you help me?
-
Cassandra
(609) 243-2413
Unix Administrator


From a little spark may burst a mighty flame.
-Dante Alighieri
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Ross Walker
On Oct 15, 2010, at 5:34 PM, Ian D rewar...@hotmail.com wrote:

 Has anyone suggested either removing L2ARC/SLOG
 entirely or relocating them so that all devices are
 coming off the same controller? You've swapped the
 external controller but the H700 with the internal
 drives could be the real culprit. Could there be
 issues with cross-controller IO in this case? Does
 the H700 use the same chipset/driver as the other
 controllers you've tried? 
 
 We'll try that.  We have a couple other devices we could use for the SLOG 
 like a DDRDrive X1 and an OCZ Z-Drive which are both PCIe cards and don't use 
 the local controller.

What mount options are you using on the Linux client for the NFS share?

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to replace failed vdev on non redundant pool?

2010-10-15 Thread Scott Meilicke
If the pool is non-redundant and your vdev has failed, you have lost your data. 
Just rebuild the pool, but consider a redundant configuration. 
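
Rebuilding with mirrors would look something like this (the pool and device names 
are just examples):

zpool destroy tank
zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0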

On Oct 15, 2010, at 3:26 PM, Cassandra Pugh wrote:

 Hello, 
 
 I would like to know how to replace a failed vdev in a non redundant pool?
 
 I am using fiber attached disks, and cannot simply place the disk back into 
 the machine, since it is virtual.  
 
 I have the latest kernel from sept 2010 that includes all of the new ZFS 
 upgrades.
 
 Please, can you help me?
 -
 Cassandra
 (609) 243-2413
 Unix Administrator
 
 
 From a little spark may burst a mighty flame.
 -Dante Alighieri 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-15 Thread Marty Scholes
 On Fri, Oct 15, 2010 at 3:16 PM, Marty Scholes
 martyscho...@yahoo.com wrote:
  My home server's main storage is a 22 (19 + 3) disk
 RAIDZ3 pool backed up hourly to a 14 (11+3) RAIDZ3
 backup pool.
 
 How long does it take to resilver a disk in that
 pool?  And how long
 does it take to run a scrub?
 
 When I initially setup a 24-disk raidz2 vdev, it died
 trying to
 resilver a single 500 GB SATA disk.  I/O under 1
 MBps, all 24 drives
 thrashing like crazy, could barely even login to the
 system and type
 onscreen.  It was a nightmare.
 
 That, and normal (no scrub, no resilver) disk I/O was
 abysmal.
 
 Since then, I've avoided any vdev with more than 8
 drives in it.

My situation is kind of unique.  I picked up 120 15K 73GB FC disks early this 
year for $2 per.  As such, spindle count is a non-issue.  As a home server, it 
has very little need for write iops and I have 8 disks for L2ARC on the main 
pool.

Main pool is at 40% capacity and backup pool is at 65% capacity.  Both take 
about 70 minutes to scrub.  The last time I tested a resilver it took about 3 
hours.

The difference is that these are low capacity 15K FC spindles and the pool has 
very little sustained I/O; it only bursts now and again.  Resilvers would go 
mostly uncontested, and with RAIDZ3 + autoreplace=off, I can actually schedule 
a resilver.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPool creation brings down the host

2010-10-15 Thread Anand Bhakthavatsala
Thanks James for the response.

Please find attached here with the crash dump that we got from the admin.

Regards,
Anand





From: James C. McPherson j...@opensolaris.org
To: Ramesh Babu rama.b...@gmail.com
Cc: zfs-discuss@opensolaris.org; anand_...@yahoo.com
Sent: Thu, 7 October, 2010 11:56:36 AM
Subject: Re: [zfs-discuss] ZPool creation brings down the host

On  7/10/10 03:46 PM, Ramesh Babu wrote:
 I am trying to create ZPool using single veritas volume. The host is going
 down as soon as I issue zpool create command. It looks like the command is
 crashing and bringing host down. Please let me know what the issue might
 be. Below is the command used; textvol is the veritas volume and testpool
 is the name of the pool which I am trying to create.

 zpool create testpool /dev/vx/dsk/dom/textvol


That's not a configuration that I'd recommend - you're layering
one volume management system on top of another. It seems that
it's getting rather messy inside the kernel.


Do you have the panic stack trace we can look at, and/or a
crash dump?



James C. McPherson
--
Oracle
http://www.jmcp.homeunix.com/blog


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPool creation brings down the host

2010-10-15 Thread Anand Bhakthavatsala
Thank you very much, Victor, for the update.

Regards,
Anand





From: Victor Latushkin victor.latush...@oracle.com
To: j...@opensolaris.org
Cc: Anand Bhakthavatsala anand_...@yahoo.com; zfs-discuss discuss 
zfs-discuss@opensolaris.org
Sent: Fri, 8 October, 2010 1:33:57 PM
Subject: Re: [zfs-discuss] ZPool creation brings down the host


On Oct 8, 2010, at 10:25 AM, James C. McPherson wrote:

 On  8/10/10 03:28 PM, Anand Bhakthavatsala wrote:
 ...
 --
 *From:* James C. McPherson j...@opensolaris.org
 *To:* Ramesh Babu rama.b...@gmail.com
 
 On 7/10/10 03:46 PM, Ramesh Babu wrote:
  I am trying to create ZPool using single veritas volume. The host is going
  down as soon as I issue zpool create command. It looks like the command is
  crashing and bringing host down. Please let me know what the issue might
  be. Below is the command used; textvol is the veritas volume and testpool
  is the name of the pool which I am trying to create.
 
  zpool create testpool /dev/vx/dsk/dom/textvol
 
 
 That's not a configuration that I'd recommend - you're layering
 one volume management system on top of another. It seems that
 it's getting rather messy inside the kernel.
 
 
 Do you have the panic stack trace we can look at, and/or a
 crash dump?
 ...
 
 
 vxioioctl+0x4c0(1357918, 42a, 0, ff0, 10, 0)
 vdev_disk_open+0x4c4(300036fd9c0, 7c00, 2a100fe3440, 18dbc00, 
3000cf04900,ctor
 18c0268)
 vdev_open+0x9c(300036fd9c0, 1, 1274400, 0, 3000e647800, 6)
 vdev_root_open+0x48(30004036080, 2a100fe35b8, 2a100fe35b0, 0, 7c00, 138)
 vdev_open+0x9c(30004036080, 1c, 0, 0, 3000e647800, 6)
 vdev_create+4(30004036080, 4, 0, 130e3c8, 0, 130e000)
 spa_create+0x1a4(0, 30011ffb500, 0, 300124cc040, 0, 3000e647800)
 zfs_ioc_pool_create+0x18c(30008524000, 0, 0, 74, 0, 300124cc040)
 zfsdev_ioctl+0x184(0, 18dbff0, ffbfa728, 0, 0, 1000)
 fop_ioctl+0x20(60015662e40, 5a00, ffbfa728, 13, 3000a4407a0, 127aa58)
 ioctl+0x184(3, 3000cb5fd28, ffbfa728, 0, 0, 5a00)
 syscall_trap32+0xcc(3, 5a00, ffbfa728, 0, 0, ffbfa270)
 
 
 Looks like you need to ask Symantec what's going on in their
 vxioioctl function.

This is most likely

6940833 vxio`vxioioctl() panics when zfs passes it a NULL rvalp via ldi_ioctl()

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6940833

victor

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-15 Thread Stephan Budach

Am 12.10.10 14:21, schrieb Edward Ned Harvey:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Stephan Budach

   c3t211378AC0253d0  ONLINE   0 0 0

How many disks are there inside of c3t211378AC0253d0?

How are they configured?  Hardware raid 5?  A mirror of two hardware raid
5's?  The point is:  This device, as seen by ZFS, is not a pure storage
device.  It is a high level device representing some LUN or something, which
is configured and controlled by hardware raid.

If there's zero redundancy in that device, then scrub would probably find
the checksum errors consistently and repeatably.

If there's some redundancy in that device, then all bets are off.  Sometimes
scrub might read the good half of the data, and other times, the bad half.


But then again, the error might not be in the physical disks themselves.
The error might be somewhere in the raid controller(s) or the interconnect.
Or even some weird unsupported driver or something.

Both RAID boxes run RAID6 with 16 drives each. This is the reason I was 
running a non-mirrored pool in the first place.
I fully understand that ZFS's power comes into play when you're running 
with multiple independent drives, but that was what I had at hand.


I now also get what you meant by the good half, but I wouldn't dare say 
whether or not this also applies to a RAID6 setup.


Regards

--
Stephan Budach
Jung von Matt/it-services GmbH
Glashüttenstraße 79
20357 Hamburg

Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.bud...@jvm.de
Internet: http://www.jvm.com

Geschäftsführer: Ulrich Pallas, Frank Wilhelm
AG HH HRB 98380

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZPOOL_CONFIG_IS_HOLE

2010-10-15 Thread Matt Keenan

Hi,

Can someone shed some light on what this ZPOOL_CONFIG is, exactly?
At a guess, is it a bad sector of the disk, non-writable, and thus ZFS 
marks it as a hole?


cheers

Matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] New STEP pkgs built via autoTSI

2010-10-15 Thread Super-User
The following new test versions have had STEP pkgs built for them.

[You are receiving this email because you are listed as the owner of the
  testsuite in the STC.INFO file, or you are on the s...@sun.com alias]


tcp v2.7.10 STEP pkg built for Solaris Snv
zfstest v1.23 STEP pkg built for Solaris Snv
tcp v2.6.11 STEP pkg built for Solaris S10
zfstest v1.23 STEP pkg built for Solaris S10
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPOOL_CONFIG_IS_HOLE

2010-10-15 Thread Mark Musante
You should only see a HOLE in your config if you removed a slog after having 
added more stripes.  Nothing to do with bad sectors.
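
In other words, a sequence roughly like this (the pool and device names are 
hypothetical) is what leaves a hole behind:

zpool add tank log c3t0d0    # add a slog
zpool add tank c4t0d0        # later add another top-level vdev (stripe)
zpool remove tank c3t0d0     # removing the slog leaves a HOLE in its slot so the
                             # later vdev keeps its position in the config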

On 14 Oct 2010, at 06:27, Matt Keenan wrote:

 Hi,
 
 Can someone shed some light on what this ZPOOL_CONFIG is, exactly?
 At a guess, is it a bad sector of the disk, non-writable, and thus ZFS marks it 
 as a hole?
 
 cheers
 
 Matt
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-15 Thread Edward Ned Harvey
 From: Stephan Budach [mailto:stephan.bud...@jvm.de]
 
 Point taken!
 
 So, what would you suggest, if I wanted to create really big pools? Say
 in the 100 TB range? That would be quite a number of single drives
 then, especially when you want to go with zpool raid-1.

You have a lot of disks.  You either tell the hardware to manage a lot of
disks and then tell ZFS to manage a single device, taking on unnecessary
risk and performance degradation for no apparent reason ...

Or you tell ZFS to manage a lot of disks.  Either way, you have a lot of
disks that need to be managed by something.  Why would you want to make that
hardware instead of ZFS?

For 100TB ... I suppose you have 2TB disks.  I suppose you have 12 buses.  I
would make a raidz1 using 1 disk from bus0, bus1, ... bus5.  I would make
another raidz1 vdev using a disk from bus6, bus7, ... bus11.  And so forth.
Then, even if you lose a whole bus, you still haven't lost your pool.  Each
raidz1 vdev would be 6 disks with a capacity of 5, so you would have a total
of 10 vdevs, and that means 5 disks on each bus.

Or do whatever you want.  The point is yes, give all the individual disks to
ZFS.
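
As a sketch, with purely illustrative device names (one disk per bus in each 
vdev), that layout would start out something like:

zpool create bigpool \
    raidz1 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
    raidz1 c6t0d0 c7t0d0 c8t0d0 c9t0d0 c10t0d0 c11t0d0
# ...then grow it to ten vdevs with further "zpool add bigpool raidz1 ..." commands.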

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to replace failed vdev on non redundant pool?

2010-10-15 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Cassandra Pugh
 
 I would like to know how to replace a failed vdev in a non redundant
 pool?

Non redundant ... Failed ... What do you expect?  This seems like a really
simple answer...  You can't.  Unless perhaps I've misunderstood the
question, or the question wasn't asked right or something...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-15 Thread Ian Collins

On 10/16/10 12:29 PM, Marty Scholes wrote:

On Fri, Oct 15, 2010 at 3:16 PM, Marty Scholes
martyscho...@yahoo.com  wrote:
 

My home server's main storage is a 22 (19 + 3) disk
   

RAIDZ3 pool backed up hourly to a 14 (11+3) RAIDZ3
backup pool.

How long does it take to resilver a disk in that
pool?  And how long
does it take to run a scrub?

When I initially setup a 24-disk raidz2 vdev, it died
trying to
resilver a single 500 GB SATA disk.  I/O under 1
MBps, all 24 drives
thrashing like crazy, could barely even login to the
system and type
onscreen.  It was a nightmare.

That, and normal (no scrub, no resilver) disk I/O was
abysmal.

Since then, I've avoided any vdev with more than 8
drives in it.
 

MY situation is kind of unique.  I picked up 120 15K 73GB FC disks early this 
year for $2 per.  As such, spindle count is a non-issue.  As a home server, it 
has very little need for write iops and I have 8 disks for L2ARC on the main 
pool.

   

I'd hate to be paying your power bill!


Main pool is at 40% capacity and backup pool is at 65% capacity.  Both take 
about 70 minutes to scrub.  The last time I tested a resilver it took about 3 
hours.

   
So if a tiny, fast drive takes three hours, consider how long a 30x bigger, 
much slower drive will take.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss