Re: [zfs-discuss] zfs on a raid box

2007-11-26 Thread Paul Boven
Hi everyone,

I've had some time to upgrade the machine in question to nv-b77 and run
the same tests, and I'm happy to report that hot spares now work a lot
better. The only remaining question for us is how long it will take for
these changes to be integrated into a supported Solaris release.

See below for some logs.

# zpool history data
History for 'data':
2007-11-22.14:48:18 zpool create -f data raidz2 c4t0d0 c4t1d0 c4t2d0
c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t8d0 c4t9d0 c4t10d0 spare c4t11d0 c4t12d0

From /var/adm/messages:
Nov 22 15:15:52 ddd scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd16):
Error for Command: write(10)   Error Level: Fatal
Requested Block: 103870006 Error Block: 103870006
Vendor: transtec   Serial Number:
Sense Key: Not_Ready
ASC: 0x4 (LUN not ready intervention required), ASCQ: 0x3, FRU: 0x0
(and about 27 more of these, until 15:16:02)

Nov 22 15:16:12 ddd scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd16):
offline or reservation conflict
(95 of these, until 15:43:49, almost half an hour later)

And then the console showed "The device has been offlined and marked as
faulted. An attempt will be made to activate a hot spare if available".

And my current zpool status shows:
# zpool status
  pool: data
 state: DEGRADED
status: One or more devices are faulted in response to persistent
errors. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: resilver completed with 0 errors on Thu Nov 22 16:09:49 2007
config:

  NAME           STATE     READ WRITE CKSUM
  data           DEGRADED     0     0     0
    raidz2       DEGRADED     0     0     0
      c4t0d0     ONLINE       0     0     0
      c4t1d0     ONLINE       0     0     0
      spare      DEGRADED     0     0     0
        c4t2d0   FAULTED      0 23.7K     0  too many errors
        c4t11d0  ONLINE       0     0     0
      c4t3d0     ONLINE       0     0     0
      c4t4d0     ONLINE       0     0     0
      c4t5d0     ONLINE       0     0     0
      c4t6d0     ONLINE       0     0     0
      c4t8d0     ONLINE       0     0     0
      c4t9d0     ONLINE       0     0     0
      c4t10d0    ONLINE       0     0     0
  spares
    c4t11d0      INUSE     currently in use
    c4t12d0      AVAIL

One remark: I find the overview above a bit confusing ('spare' apparently
is 'DEGRADED' and consists of c4t2d0 and c4t11d0), but the hot spare was
properly activated this time and my pool is otherwise in good health.
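
For completeness, the way I intend to clean this up once a replacement disk
is in place (untested on this pool so far, so consider it a sketch):

# zpool replace data c4t2d0
  (resilver onto the replacement disk; the spare should then return to AVAIL)
# zpool detach data c4t2d0
  (alternatively: drop the faulted disk and keep c4t11d0 as a permanent
   member of the raidz2)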

Thanks everyone for the replies and suggestions,

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


Re: [zfs-discuss] zfs on a raid box

2007-11-21 Thread Paul Boven
Hi Dan,

Dan Pritts wrote:
> On Mon, Nov 19, 2007 at 11:10:32AM +0100, Paul Boven wrote:
>> Any suggestions on how to further investigate / fix this would be very
>> much welcomed. I'm trying to determine whether this is a zfs bug or one
>> with the Transtec raidbox, and whether to file a bug with either
>> Transtec (Promise) or zfs.

> The way I'd try to do this would be to use the same box under Solaris
> software RAID, or better yet Linux or Windows software RAID (to make
> sure it's not a Solaris device driver problem).
> Does pulling the disk then get noticed?  If so, it's a zfs bug.

Excellent suggestion, and today I had some time to give it a try.
I created a 4-disk SVM volume (two 2-disk stripes, mirrored, with two more
disks as hot spares):

d10 -m /dev/md/rdsk/d11 /dev/md/rdsk/d12 1
d11 1 2 /dev/rdsk/c4t0d0s0 /dev/rdsk/c4t1d0s0 -i 1024b -h hsp001
d12 1 2 /dev/rdsk/c4t2d0s0 /dev/rdsk/c4t3d0s0 -i 1024b -h hsp001
hsp001 c4t4d0s0 c4t5d0s0
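
(The lines above are in md.tab / 'metastat -p' form. For anyone wanting to
reproduce the test, a configuration like this can be built along the
following lines - a sketch, so double-check the syntax against metainit(1M):)

# metainit hsp001 c4t4d0s0 c4t5d0s0
# metainit d11 1 2 c4t0d0s0 c4t1d0s0 -i 1024b -h hsp001
# metainit d12 1 2 c4t2d0s0 c4t3d0s0 -i 1024b -h hsp001
# metainit d10 -m d11
# metattach d10 d12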

I started a write and then pulled a disk. And without any further
probing, SVM put a hotspare in place and started resyncing:

d10  m  463GB d11 d12 (resync-0%)
d11  s  463GB c4t0d0s0 c4t1d0s0
d12  s  463GB c4t2d0s0 (resyncing-c4t4d0s0) c4t3d0s0
hsp001   h  - c4t4d0s0 (in-use) c4t5d0s0

This is all on b76, so the issue does indeed seem to be with zfs. I'm
currently downloading b77, and once that is installed I'll see whether the
fault diagnostics and hot-spare handling have improved, as several people
here have suggested.

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


Re: [zfs-discuss] zfs on a raid box

2007-11-20 Thread Paul Boven
Hi MP,

MP wrote:
>> but my issue is that not only the 'time left', but also the progress
>> indicator itself varies wildly, and keeps resetting itself to 0%, not
>> giving any indication that
> 
> Are you sure you are not being hit by this bug:
> 
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667
> 
> i.e. scrub or resilver gets reset to 0% on a snapshot creation or deletion.
> Cheers.

I'm quite sure that's not it: I've never created a snapshot on this pool,
and I am the only user on the machine (it's not in production yet).
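
(For what it's worth, this is easy to verify - the following should come
back empty when there really are no snapshots:)

# zfs list -t snapshot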

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


Re: [zfs-discuss] raidz2

2007-11-20 Thread Paul Boven
Hi Eric, everyone,

Eric Schrock wrote:
> There have been many improvements in proactively detecting failure,
> culminating in build 77 of Nevada.  Earlier builds:
> 
> - Were unable to distinguish device removal from devices misbehaving,
>   depending on the driver and hardware.
> 
> - Did not diagnose a series of I/O failures as disk failure.
> 
> - Allowed several (painful) SCSI retries and continued to queue up I/O,
>   even if the disk was fatally damaged.

> Most classes of hardware would behave reasonably well on device removal,
> but certain classes caused cascading failures in ZFS, all of which should
> be resolved in build 77 or later.

I seem to be having exactly the problems you are describing (see my
postings with the subject 'zfs on a raid box'). So I would very much
like to give b77 a try. I'm currently running b76, as that's the latest
sxce that's available. Are the sources to anything beyond b76 already
available? Would I need to build it, or bfu?

I'm seeing zfs fail to bring in available hot spares when I pull a disk,
long (and indeed painful) SCSI retries, and very poor write performance on
a degraded zpool - I hope to be able to test whether b77 fares any better.
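
(As an aside, for anyone following along, the build actually running is
easy to check; on this machine it should report snv_76:)

# uname -v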

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


Re: [zfs-discuss] zfs on a raid box

2007-11-19 Thread Paul Boven
Hi Tom, everyone,

Tom Mooney wrote:
> A little extra info:
> ZFS brings in a ZFS spare device the next time the pool is accessed, not
> a raidbox hot spare. Resilvering starts automatically and increases disk
> access times by about 30%. The first hour of estimated time left ( for
> 5-6 TB pools ) is wildly inaccurate, but it starts to settle down after
> that.

Thanks for your reply. I'm talking about zfs hot spares, not the hot
spare functionality of the raid box:

# zpool create -f data raidz c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5
c4t0d6 c4t0d7 c4t0d8 c4t0d9 spare c4t0d10 c4t0d11

I did my initial tests by pulling a disk during a 100GB sequential write,
which should have kicked in a hot spare right away. But no hot spare was
activated (as shown by 'zpool status'), and write performance fell to less
than 25% of what it was before.
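
The manual route would be to attach the spare explicitly, naming both the
missing disk and the spare - a sketch, assuming for the example that the
pulled disk was c4t0d5:

# zpool replace data c4t0d5 c4t0d10
# zpool status -v data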

I have also tried to start resilvering manually, but that doesn't seem
to work either. I've heard from several people that currently, zfs has
problems with reporting the 'estimated time left' - but my issue is that
not only the 'time left', but also the progress indicator itself varies
wildly, and keeps resetting itself to 0%, not giving any indication that
the resilvering will ever finish. And with nv-b76, 'zpool status' simply
hangs when there is a drive missing, so I can't even really keep track
of the resilvering, if any.

So, at least for me, hot spare functionality in zfs seems completely broken.

Any suggestions on how to further investigate / fix this would be very
much welcomed. I'm trying to determine whether this is a zfs bug or one
with the Transtec raidbox, and whether to file a bug with either
Transtec (Promise) or zfs.

Regards, Paul Boven.


-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


Re: [zfs-discuss] zfs on a raid box

2007-11-16 Thread Paul Boven
Hi Dan,

Dan Pritts wrote:
> On Tue, Nov 13, 2007 at 12:25:24PM +0100, Paul Boven wrote:

>> We're building a storage system that should have about 2TB of storage
>> and good sequential write speed. The server side is a Sun X4200 running
>> Solaris 10u4 (plus yesterday's recommended patch cluster), the array we
>> bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and
>> it's connected to the Sun through U320-scsi.
> 
> We are doing basically the same thing with similar Western Scientific
> (wsm.com) raids, based on Infortrend controllers.  ZFS notices when we
> pull a disk and goes on and does the right thing.
> 
> I wonder if you've got a scsi card/driver problem.  We tried using
> an Adaptec card with solaris with poor results; switched to LSI,
> it "just works".

Thanks for your reply. The SCSI-card in the X4200 is a Sun Single
Channel U320 card that came with the system, but the PCB artwork does
sport a nice 'LSI LOGIC' imprint.

So, just to make sure we're talking about the same thing here - your
drives are SATA, you're exporting each drive through the Western
Scientific raidbox as a separate volume, and zfs actually brings in a
hot spare when you pull a drive?

Over here, I've still not been able to accomplish that - even after
installing Nevada b76 on the machine, removing a disk will not cause a
hot-spare to become active, nor does resilvering start. Our Transtec
raidbox seems to be based on a chipset by Promise, by the way.

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


[zfs-discuss] zfs on a raid box

2007-11-13 Thread Paul Boven
Hi everyone,

We're building a storage system that should have about 2TB of storage
and good sequential write speed. The server side is a Sun X4200 running
Solaris 10u4 (plus yesterday's recommended patch cluster), the array we
bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and
it's connected to the Sun through U320-scsi.

Now, the raidbox was sold to us as doing JBOD and various other RAID
levels, but JBOD turns out to mean 'create a single-disk stripe for every
drive'. This works, after a fashion: with a 12-drive zfs pool using raidz
and one hot spare, I get 132MB/s write performance; with raidz2 it's still
112MB/s. If I instead configure the array as RAID-50 through the hardware
raid controller, I can only manage 72MB/s. So at first glance, this seems
a good case for zfs.
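
(The figures above are for large sequential writes; the test itself is
nothing fancy - essentially the equivalent of the following, assuming the
pool is mounted at /data. Not necessarily the exact commands I ran, but it
gives the idea:)

# dd if=/dev/zero of=/data/testfile bs=1024k count=102400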

Unfortunately, if I then pull a disk from the zfs array, it will keep
trying to write to this disk, and will never activate the hot-spare. So
a zpool status will then show the pool as 'degraded', one drive marked
as unavailable - and the hot-spare still marked as available. Write
performance also drops to about 32MB/s.

If I then try to activate the hot-spare by hand (zpool replace <pool>
<device>), the resilvering starts, but never makes it past 10% - it seems
to restart all the time. As this box is not in production yet, and I'm the
only user on it, I'm 100% sure that there is nothing happening on the zfs
filesystem during the resilvering - no reads, no writes, and certainly no
snapshots.

In /var/adm/messages, I see this message repeated several times each minute:
Nov 12 17:30:52 ddd scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd47):
Nov 12 17:30:52 ddd offline or reservation conflict

Why isn't this enough for zfs to switch over to the hotspare?
I've tried disabling (setting to write-thru) the write-cache on the
array box, but that didn't make any difference to the behaviour either.
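
One thing I still need to look at (just a hunch, and I'm not sure how much
of this plumbing is in 10u4) is whether FMA ever sees or diagnoses these
errors; that should be visible in the error telemetry and the fault list:

# fmdump -e | tail
  (recent raw error reports - the write errors should show up here)
# fmadm faulty
  (anything FMA has actually diagnosed as faulted)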

I'd appreciate any insights or hints on how to proceed with this -
should I even be trying to use zfs in this situation?

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


[zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Paul Boven
Hi everyone,

Now that zfsboot is becoming available, I'm wondering how to put it to
use. Imagine a system with 4 identical disks. Of course I'd like to use
raidz, but zfsboot doesn't do raidz. What if I were to partition the
drives, so that four small partitions make up the zfsboot pool (a 4-way
mirror) and the remainder of each drive becomes part of a raidz? Would I
still have the advantages of having the whole disk 'owned' by zfs, even
though it's split into two parts?
Swap would probably have to go on a zvol - would that be best placed on
the n-way mirror, or on the raidz?
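
To make the question more concrete, the layout I have in mind would be
something like the following - a sketch only, with made-up device names,
and I haven't verified yet that the installer/boot bits accept it:

# zpool create rootpool mirror c0t0d0s0 c0t1d0s0 c0t2d0s0 c0t3d0s0
# zpool create datapool raidz c0t0d0s3 c0t1d0s3 c0t2d0s3 c0t3d0s3
# zfs create -V 4g datapool/swap
# swap -a /dev/zvol/dsk/datapool/swap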

Regards, Paul Boven.