Re: [zfs-discuss] RAIDZ one of the disk showing unavail

2008-09-26 Thread Ralf Ramge
Srinivas Chadalavada wrote:

  I see the first disk as unavailable. How do I make it online?

By replacing it with a non-broken one.
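For completeness, the mechanics are roughly as follows, using the pool and
device names from your earlier mail (export_content and c0t2d0); adjust them to
whatever your system actually shows -- this is a sketch, not a recipe:

  # after physically swapping the broken disk, resilver onto the new one
  zpool replace export_content c0t2d0

  # if the disk only dropped out temporarily and is actually healthy,
  # bringing it back online and clearing the error counters may suffice:
  zpool online export_content c0t2d0
  zpool clear export_content

  # watch the resilver progress
  zpool status export_content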

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Registered at the Local Court (Amtsgericht) of Montabaur, HRB 6484

Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Chairman of the Supervisory Board: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Which is better for root ZFS: mlc or slc SSD?

2008-09-26 Thread Adam Leventhal
For a root device it doesn't matter that much. You're not going to be
writing to the device at a high data rate, so write/erase cycles don't
factor in much (SLC can sustain about a factor of 10 more). With MLC
you'll get 2-4x the capacity for the same price, but again that
doesn't matter much for a root device. Performance is typically a bit
better with SLC -- especially on the write side -- but it's not a
huge difference.

The reason you'd use a flash SSD for a boot device is power (with  
maybe a dash of performance), and either SLC or MLC will do just fine.

Adam

On Sep 24, 2008, at 11:41 AM, Erik Trimble wrote:

 I was under the impression that MLC is the preferred type of SSD,  
 but I
 want to prevent myself from having a think-o.


 I'm looking to get (2) SSD to use as my boot drive. It looks like I  
 can
 get 32GB SSDs composed of either SLC or MLC for roughly equal pricing.
 Which would be the better technology?  (I'll worry about rated access
 times etc. of the drives; I'm just wondering about the general technology
 for OS boot-drive usage...)



 -- 
 Erik Trimble
 Java System Support
 Mailstop:  usca22-123
 Phone:  x17195
 Santa Clara, CA
 Timezone: US/Pacific (GMT-0800)

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Adam Leventhal, Fishworks            http://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs resilvering

2008-09-26 Thread Mikael Kjerrman
Hi,

I've searched without luck, so I'm asking instead.

I have a Solaris 10 box,

# cat /etc/release
   Solaris 10 11/06 s10s_u3wos_10 SPARC
   Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
   Assembled 14 November 2006

This box was rebooted this morning, and after the boot I noticed a resilver was 
in progress. The estimated time to completion seemed rather long, so is this a 
problem that can be patched or remediated in another way?

# zpool status -x
  pool: zonedata
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.04% done, 4398h43m to go
config:

NAME   STATE READ WRITE CKSUM
zonedata   ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A0d0  ONLINE   0 0 0
c6t60060E8004283300283310A0d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A1d0  ONLINE   0 0 0
c6t60060E8004283300283310A1d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A2d0  ONLINE   0 0 0
c6t60060E8004283300283310A2d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A4d0  ONLINE   0 0 0
c6t60060E8004283300283310A4d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A5d0  ONLINE   0 0 0
c6t60060E8004283300283310A5d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A6d0  ONLINE   0 0 0
c6t60060E8004283300283310A6d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B2022d0  ONLINE   0 0 0
c6t60060E800428330028332022d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B2023d0  ONLINE   0 0 0
c6t60060E800428330028332024d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B2024d0  ONLINE   0 0 0
c6t60060E800428330028332023d0  ONLINE   0 0 0


I also have a question about sharing a ZFS filesystem from the global zone to a 
local zone. Are there any issues with this? We had an unfortunate sysadmin who 
did this and our systems hung. We have no logs that show anything at all, but I 
thought I'd ask just to be sure.

cheers,

//Mike
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Scripting zfs send / receive

2008-09-26 Thread Ross
Hey folks,

Is anybody able to help a Solaris scripting newbie with this? I want to put 
together an automatic script to take snapshots on one system and send them 
across to another. I've shown the manual process works, but only have a very 
basic idea about how I'm going to automate this.

My current thinking is that I want to put together a cron job that will work 
along these lines:

- Run every 15 mins
- take a new snapshot of the pool
- send the snapshot to the remote system with zfs send / receive and ssh.
(am I right in thinking I can get ssh to work with no password if I create a 
public/private key pair? http://www.go2linux.org/ssh-login-using-no-password)
- send an e-mail alert if zfs send / receive fails for any reason (with the 
text of the failure message)
- send an e-mail alert if zfs send / receive takes longer than 15 minutes and 
clashes with the next attempt
- delete the oldest snapshot on both systems if the send / receive worked

Can anybody think of any potential problems I may have missed? 

Bearing in mind I've next to no experience in bash scripting, how does the 
following look?

**
#!/bin/bash

# Prepare variables for e-mail alerts
SUBJECT=zfs send / receive error
EMAIL=[EMAIL PROTECTED]

NEWSNAP=build filesystem + snapshot name here
RESULTS=$(/usr/sbin/zfs snapshot $NEWSNAP)
# how do I check for a snapshot failure here?  Just look for non blank $RESULTS?
if $RESULTS; then
   # send e-mail
   /bin/mail -s $SUBJECT $EMAIL $RESULTS
   exit
fi

PREVIOUSSNAP=build filesystem + snapshot name here
RESULTS=$(/usr/sbin/zfs send -i $NEWSNAP $PREVIOUSSNAP | ssh -l *user* 
*remote-system* /usr/sbin/zfs receive *filesystem*)
# again, how do I check for error messages here?  Do I just look for a blank 
$RESULTS to indicate success?
if $RESULTS ok; then
   OBSOLETESNAP=build filesystem + name here
   zfs destroy $OBSOLETESNAP
   ssh -l *user* *remote-system* /usr/sbin/zfs destroy $OBSOLETESNAP
else 
   # send e-mail with error message
   /bin/mail -s $SUBJECT $EMAIL $RESULTS
fi
**

One concern I have is what happens if the send / receive takes longer than 15 
minutes. Do I need to check for that explicitly, or will the script cope with it 
already? Can anybody confirm that it will behave as I'm hoping: the next run 
takes its snapshot, but the overlapping send / receive fails and generates an 
e-mail alert?

thanks,

Ross
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scripting zfs send / receive

2008-09-26 Thread Enda O'Connor ( Sun Micro Systems Ireland)
Hi
Clive King has a nice blog entry showing this in action
http://blogs.sun.com/clive/entry/replication_using_zfs

with associated script at:
http://blogs.sun.com/clive/resource/zfs_repl.ksh

Which I think answers most of your questions.
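
If it's useful as a starting point, here is a rough, untested sketch of the
error handling you were asking about. The dataset name, remote user/host and
mail address are made-up placeholders, and it assumes key-based ssh is already
set up and that at least one snapshot already exists -- treat it as an
illustration, not a drop-in script:

  #!/bin/bash
  # Placeholders -- substitute your own values.
  FS=tank/data                  # dataset to replicate
  RUSER=backupuser              # remote ssh user (key-based auth assumed)
  RHOST=remotehost              # remote system
  MAILTO="admin@example.com"    # alert address
  SUBJECT="zfs send/receive error"
  LOCK=/var/tmp/zfsrepl.lock

  # refuse to start if the previous run is still going
  # (handles the "takes longer than 15 minutes" case)
  if [ -f "$LOCK" ]; then
      echo "previous replication still running" | /bin/mail -s "$SUBJECT" "$MAILTO"
      exit 1
  fi
  touch "$LOCK"

  # the most recent existing snapshot becomes the incremental base
  PREV=$(/usr/sbin/zfs list -H -t snapshot -o name -s creation -r "$FS" | tail -1)
  NEW="$FS@auto-$(date +%Y%m%d%H%M)"

  # check the exit status rather than whether the output is blank
  ERR=$(/usr/sbin/zfs snapshot "$NEW" 2>&1)
  if [ $? -ne 0 ]; then
      echo "$ERR" | /bin/mail -s "$SUBJECT" "$MAILTO"
      rm -f "$LOCK"; exit 1
  fi

  ERR=$( { /usr/sbin/zfs send -i "$PREV" "$NEW" | \
           ssh -l "$RUSER" "$RHOST" /usr/sbin/zfs receive -F "$FS"; } 2>&1 )
  if [ $? -ne 0 ]; then
      echo "$ERR" | /bin/mail -s "$SUBJECT" "$MAILTO"
  else
      # replication worked, so retire the oldest snapshot on both sides
      /usr/sbin/zfs destroy "$PREV"
      ssh -l "$RUSER" "$RHOST" /usr/sbin/zfs destroy "$PREV"
  fi
  rm -f "$LOCK"

A crontab entry along the lines of '0,15,30,45 * * * * /path/to/script' would
run it every 15 minutes; the lock file is what keeps overlapping runs from
stepping on each other.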

Enda
Ross wrote:
 Hey folks,
 
 Is anybody able to help a Solaris scripting newbie with this? I want to put 
 together an automatic script to take snapshots on one system and send them 
 across to another. I've shown the manual process works, but only have a very 
 basic idea about how I'm going to automate this.
 
 My current thinking is that I want to put together a cron job that will work 
 along these lines:
 
 - Run every 15 mins
 - take a new snapshot of the pool
 - send the snapshot to the remote system with zfs send / receive and ssh.
 (am I right in thinking I can get ssh to work with no password if I create a 
 public/private key pair? http://www.go2linux.org/ssh-login-using-no-password)
 - send an e-mail alert if zfs send / receive fails for any reason (with the 
 text of the failure message)
 - send an e-mail alert if zfs send / receive takes longer than 15 minutes and 
 clashes with the next attempt
 - delete the oldest snapshot on both systems if the send / receive worked
 
 Can anybody think of any potential problems I may have missed? 
 
 Bearing in mind I've next to no experience in bash scripting, how does the 
 following look?
 
 **
 #!/bin/bash
 
 # Prepare variables for e-mail alerts
 SUBJECT=zfs send / receive error
 EMAIL=[EMAIL PROTECTED]
 
 NEWSNAP=build filesystem + snapshot name here
 RESULTS=$(/usr/sbin/zfs snapshot $NEWSNAP)
 # how do I check for a snapshot failure here?  Just look for non blank 
 $RESULTS?
 if $RESULTS; then
# send e-mail
/bin/mail -s $SUBJECT $EMAIL $RESULTS
exit
 fi
 
 PREVIOUSSNAP=build filesystem + snapshot name here
 RESULTS=$(/usr/sbin/zfs send -i $NEWSNAP $PREVIOUSSNAP | ssh -l *user* 
 *remote-system* /usr/sbin/zfs receive *filesystem*)
 # again, how do I check for error messages here?  Do I just look for a blank 
 $RESULTS to indicate success?
 if $RESULTS ok; then
OBSOLETESNAP=build filesystem + name here
zfs destroy $OBSOLETESNAP
ssh -l *user* *remote-system* /usr/sbin/zfs destroy $OBSOLETESNAP
 else 
# send e-mail with error message
/bin/mail -s $SUBJECT $EMAIL $RESULTS
 fi
 **
 
 One concern I have is what happens if the send / receive takes longer than 15 
 minutes. Do I need to check that manually, or will the script cope with this 
 already? Can anybody confirm that it will behave as I am hoping in that the 
 script will take the next snapshot, but the send / receive will fail and 
 generate an e-mail alert?
 
 thanks,
 
 Ross
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow zpool import with b98

2008-09-26 Thread Lars Timmann
Hi again... today I may have hit the same problem you described.

I had an on-disk format version of 11; after upgrading to version 13, everything works fine.
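
In case anyone else trips over the same thing: checking and bumping the
on-disk version is just a zpool upgrade away (the pool name below is
hypothetical):

  # show the version each pool is running and the highest supported version
  zpool upgrade

  # upgrade one pool, or all of them with -a
  zpool upgrade mypool
  zpool upgrade -a

Note that once upgraded, a pool can no longer be opened by older software
releases.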
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs resilvering

2008-09-26 Thread Brent Jones
On Fri, Sep 26, 2008 at 1:27 AM, Mikael Kjerrman
[EMAIL PROTECTED] wrote:
 Hi,

 I've searched without luck, so I'm asking instead.

 I have a Solaris 10 box,

 # cat /etc/release
   Solaris 10 11/06 s10s_u3wos_10 SPARC
   Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
   Assembled 14 November 2006

 this box was rebooted this morning and after the boot I noticed a resilver 
 was in progress. But the suggested time seemed a bit long, so is this a 
 problem which can be patched or remediated in another way?

 # zpool status -x
  pool: zonedata
  state: ONLINE
 status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
 action: Wait for the resilver to complete.
  scrub: resilver in progress, 0.04% done, 4398h43m to go
 config:

NAME   STATE READ WRITE CKSUM
zonedata   ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A0d0  ONLINE   0 0 0
c6t60060E8004283300283310A0d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A1d0  ONLINE   0 0 0
c6t60060E8004283300283310A1d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A2d0  ONLINE   0 0 0
c6t60060E8004283300283310A2d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A4d0  ONLINE   0 0 0
c6t60060E8004283300283310A4d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A5d0  ONLINE   0 0 0
c6t60060E8004283300283310A5d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B10A6d0  ONLINE   0 0 0
c6t60060E8004283300283310A6d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B2022d0  ONLINE   0 0 0
c6t60060E800428330028332022d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B2023d0  ONLINE   0 0 0
c6t60060E800428330028332024d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c6t60060E8004282B00282B2024d0  ONLINE   0 0 0
c6t60060E800428330028332023d0  ONLINE   0 0 0


 I also have a question about sharing a ZFS filesystem from the global zone to 
 a local zone. Are there any issues with this? We had an unfortunate sysadmin 
 who did this and our systems hung. We have no logs that show anything at all, 
 but I thought I'd ask just to be sure.

 cheers,

 //Mike
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Do you have a lot of competing I/O on the box that would slow down
the resilver?
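
A quick way to check is to watch the pool while the resilver runs, e.g.
(pool name taken from your zpool status output):

  # overall pool traffic, refreshed every 5 seconds
  zpool iostat zonedata 5

  # per-vdev/per-disk view, to see whether application I/O is competing
  # with the resilver on particular mirrors
  zpool iostat -v zonedata 5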


-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs resilvering

2008-09-26 Thread Mikael Kjerrman
Define "a lot". :-)

We're doing about 7-8 MB per second, which I don't think is a lot, but perhaps 
it is enough to throw off the estimates? Anyhow, the resilvering completed about 
4386 hours earlier than expected, so everything is OK now, but I still feel that 
the way it calculates the estimate is wrong.

Any thoughts on my other issue?

cheers,

//Mike
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs resilvering

2008-09-26 Thread Johan Hartzenberg
On Fri, Sep 26, 2008 at 4:02 PM, [EMAIL PROTECTED] wrote:


  Note the progress so far: 0.04%.  In my experience the time estimate has
  no basis in reality until it's about 1% done or so.  I think there is some
  bookkeeping or something ZFS does at the start of a scrub or resilver that
  throws off the time estimate for a while.  That's just my experience with
  it, but it's been like that pretty consistently for me.

 Jonathan Stewart


I agree here.

I've watched 'iostat -xnc 5' while starting a scrub a few times, and the
first minute or so is spent doing very little I/O.  Thereafter the transfers
shoot up to near what I think is the maximum the drives can do and stay there
until the scrub is completed.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS poor performance on Areca 1231ML

2008-09-26 Thread Ross Becker
Well, I just got in a system I intend to be a BIG fileserver. Background: I work 
for a SAN startup, and we're expecting to collect 30-60 terabytes of Fibre 
Channel traces in our first year. The purpose of this box is to be a large 
repository for those traces, with statistical analysis run against them. Looking 
at that storage figure, I decided this would be a perfect application for ZFS. 
I purchased a Super Micro chassis that's 4U and has 24 slots for SATA drives. 
I've put in one quad-core 2.66 GHz processor and 8 GB of ECC RAM. I put in two 
Areca 1231ML ( http://www.areca.com.tw/products/pcie341.htm ) controllers, which 
come with Solaris drivers. I've half-populated the chassis with 12 x 1 TB drives 
to begin with, and I'm running some experiments. I loaded OpenSolaris 2008.05 on 
the system.

I configured an 11-drive RAID6 set plus one hot spare on the Areca controller, 
put ZFS on that RAID volume, and ran bonnie++ against it (16 GB size); it 
achieved 150 MB/s write and 200 MB/s read. I then blew that away, configured the 
Areca to present JBOD, and configured ZFS with a RAIDZ2 of 11 disks plus a hot 
spare. Running bonnie++ against that, it achieved 40 MB/s read and 40 MB/s 
write. I wasn't expecting RAIDZ to outrun the controller-based RAID, but I 
wasn't expecting 1/3 to 1/4 of the performance. I've looked at the ZFS tuning 
info on the Solaris site, and mostly what it says is that tuning is evil, with a 
few notes on database tuning.

Anyone got suggestions on something I might poke at to at least get this puppy 
up closer to 100 MB/s? Otherwise, I may dump the JBOD and go back to the 
controller-based RAID.

Cheers
   Ross
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ one of the disk showing unavail

2008-09-26 Thread Miles Nordin
 sc == Srinivas Chadalavada [EMAIL PROTECTED] writes:
 rr == Ralf Ramge [EMAIL PROTECTED] writes:

sc I see the first disk as unavailable. How do I make it online?

rr By replacing it with a non-broken one.

Ralf, aren't you missing this obstinence-error:

sc the following errors must be manually repaired:
sc /dev/dsk/c0t2d0s0 is part of active ZFS pool export_content.

and he used the -f flag.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs resilvering

2008-09-26 Thread Richard Elling
Mikael Kjerrman wrote:
 define a lot :-)

 We are doing about 7-8M per second which I don't think is a lot but perhaps 
 it is enough to screw up the estimates? Anyhow the resilvering completed 
 about 4386h earlier than expected so everything is ok now, but I still feel 
 that the way it figures out the number is wrong.
   

Yes, the algorithm is conservative and very often wrong until you
get close to the end.  In part this is because resilvering works in time
order, not spatial order: in ZFS, the oldest data is resilvered first.
This is also why you will see a lot of thinking before you see a
lot of I/O; ZFS is determining the order in which to resilver the data.
Unfortunately, this makes completion-time prediction somewhat
difficult to get right.

 Any thoughts on my other issue?
   

Try the zones-discuss forum
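
For what it's worth, the usual way to hand a dataset to a local zone is to
delegate it with zonecfg; a minimal sketch, with made-up zone and dataset
names (this says nothing about why your systems hung, of course):

  # run in the global zone
  zonecfg -z myzone
  zonecfg:myzone> add dataset
  zonecfg:myzone:dataset> set name=zonedata/myzone
  zonecfg:myzone:dataset> end
  zonecfg:myzone> commit
  zonecfg:myzone> exit

  # alternatively, loop back an already-mounted filesystem into the zone:
  zonecfg:myzone> add fs
  zonecfg:myzone:fs> set type=lofs
  zonecfg:myzone:fs> set special=/zonedata/shared
  zonecfg:myzone:fs> set dir=/shared
  zonecfg:myzone:fs> end

The zone generally needs a reboot to pick up the change.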
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Miles Nordin
 t == Tim  [EMAIL PROTECTED] writes:

 t http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm

I'm not sure.  A different thing is wrong with it depending on what
driver attaches to it.  I can't tell for sure because this page:

  http://linuxmafia.com/faq/Hardware/sas.html

says the LSI SAS 3800 series uses a 1068E chip, and James says (1)
1068E is supported by mpt, (2) LSI SAS 3800 uses mega_sas.  So I
don't know which driver that card gets, which means I don't know which
one this card gets.

If it's mpt:

 * does not come with source according to:

   http://www.openbsd.org/papers/opencon06-drivers/mgp00024.html
   http://www.opensolaris.org/os/about/no_source/

If it's mega_sas:

 * does not come with source

 * driver is new and unproven.  We believed the Marvell driver was
   good for the first few months too, the same amount of experience we
   have with mega_sas.

 * not sure if it's available in stable solaris.


In either case:

 * may require expensive cables


Uncertain problems:

 * might not support hotplug

 * might not support NCQ

 * probably doesn't support port multipliers

 * probably doesn't support smartctl

 * none of these features can be fixed by the community without
   source.  All are available with cheaper cards on Linux, and on
   Linux both mptsas and megaraid_sas come with source, as far as I can
   tell maintained by Dell and LSI, though they might not support the
   above features.


HTH, HAND.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS poor performance on Areca 1231ML

2008-09-26 Thread Bob Friesenhahn
On Fri, 26 Sep 2008, Ross Becker wrote:

 I configured up an 11 drive RAID6 set + 1 hot spare on the Areca 
 controller put a ZFS on that raid volume, and ran bonnie++ against 
 it (16g size), and achieved 150 mb/s write,  200 mb/s read.  I then 
 blew that away, configured the Areca to present JBOD, and configured 
 ZFS with RAIDZ2 11 disks, and a hot spare.  Running bonnie++ against 
 that, it achieved 40 mb/sec read and 40 mb/sec write.  I wasn't 
 expecting RAIDZ to outrun the controller-based RAID, but I wasn't 
 expecting 1/3rd to 1/4 the performance.  I've looked at the ZFS

Terrible!  Have you tested the I/O performance of each drive to make 
sure that they are all performing ok?
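
A crude way to do that is raw sequential reads with dd, one disk at a time and
then several in parallel (device names here are made up; use the ones your
'format' output shows, and p0 addresses the whole disk on x86):

  # roughly 1 GB sequential read from one raw disk
  dd if=/dev/rdsk/c1t0d0p0 of=/dev/null bs=1024k count=1024

  # repeat for each member disk, then run several copies at once and
  # watch 'iostat -xn 5' to see whether the aggregate scales or whether
  # the controller/slot becomes the bottleneck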

If the individual drives are found to be performing ok with your JBOD 
setup, then I would suspect a device driver, card slot, or card 
firmware performance problem.  If RAID6 is done by the RAID card then 
backplane I/O to the card is not very high.  If raidz2 is used, then 
the I/O to the card is much higher.  With a properly behaving device 
driver and card, it is quite likely that ZFS raidz2 will outperform 
the on-card RAID6.

You might try disabling the card's NVRAM to see if that makes a 
difference.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Will Murnane
On Thu, Sep 25, 2008 at 18:51, Tim [EMAIL PROTECTED] wrote:
 So what's wrong with this card?
 http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm
If you have a UIO slot (many recent Supermicro boards do) then it's a
fine choice.  But if you have a non-Supermicro board, you may be in
for a shock when you get it: the card is mirrored left-for-right compared
to a regular PCI-E card, so it won't fit in a standard case.  AIUI it is
electrically a standard PCI Express card, just shifted over a bit so the
backwards slot cover fits into normal cases, so perhaps you could try
fastening a normal slot cover to it and using it in a normal PCI-E slot...
but that doesn't sound particularly elegant, and it would take up the slot
on the other side as well.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Will Murnane
On Thu, Sep 25, 2008 at 21:59, Miles Nordin [EMAIL PROTECTED] wrote:
 wm == Will Murnane [EMAIL PROTECTED] writes:

wm I'd rather have a working closed blob than a driver that is
wm Free Software for a device that is faulty.  Ideals are very
wm nice, but broken hardware isn't.

 except,

  1. part of the reason the closed Solaris drivers are (also) broken,
IMHO, is that they're closed, so highly-invested competent people
can't fix them if they happen to be on the wrong side of the wall.
I agree this is an issue.  But as I said, I'd rather have a working
closed driver than a broken open one.

  2. Linux has open drivers for the Marvell chip that work better than
Sun's closed driver (snip)
That's not my experience.  I bought my Marvell card around 2005, and
at that point I used Linux drivers.  Drivers for the card at that
point did not support DMA, but were fairly reliable.  In late 2006 or
so, DMA support was finally added, so I gleefully installed a new
kernel and was happy.

Until I realized that my data was corrupt.  This is for a home system,
so I didn't have checksums for the data before the corruption, but I
started to hear glitches in music playback.  At that point I switched
to Solaris, and was very glad for the drivers that didn't cause
corruption---and the filesystem that could tell me when things went
wrong.  I did have a problem with disks falling off the card, so I
posted to the storage-discuss mailing list [2].  Despite being on the
wrong side of the wall, the drivers were updated fairly soon
thereafter, and my problem was solved [1].  The system worked quite
well for me in this instance.

  3. The position is incredibly short-sighted.  Imagine the quality of
driver we'd have right now if _everyone_ refused to sign that
damned paper, not just the Linux people.  We would have a better
driver.  It would be open, too, but open or not it would be
better.
Not necessarily.  Suppose that the corporation making the hardware
released its own drivers, for Windows and Linux, say, and didn't
release specs to anyone else, even under NDA conditions.  Then nobody
gets good drivers (ones that correctly use all the features the
hardware has).

I agree that having complete hardware specs is a very helpful thing to
make drivers.  But they're not strictly necessary, as the Linux/BSD
folks have shown.

  4. there are missing features like NCQ, hotplug, port-multiplier
support, all highly relevant to ZFS, for which we will have to
wait longer because we've accepted closed drivers.
That's true.  But honestly, I don't see those features (with the
exception of hot-plug) as being all that necessary.  Port multipliers
are uncommon and don't perform as well as they could, and NCQ seems to
me to be something the OS could do better than the drive firmware.

  5. The Sil 3124 chip works fine on Linux.  I have not tried the 3114,
but at least on Linux it is part of libata, their SATA framework,
not supported in remedial PATA mode, so it's at least more of a
first-class driver in Linux than in Solaris, if not simply a
better one.
IMHO, attempting to make SilImage controllers work well is lipstick on
a pig.  Working around the bugs in the hardware is not worth the
effort.

I just want an open driver that works well for some
fairly-priced card I can actually buy.
This I can agree with.  Despite my skepticism that free drivers are
inherently better than closed ones, I do like the idea of being able
to have a completely transparent machine, where I can inspect every
piece of software.  I would be more than happy to buy such hardware
were it available, but in the interim I will continue to suggest and
buy LSI's products, which are not free but which do have good drivers.

The open driver isn't
obtainable as an add-on card
The ICH series would indeed be nice to see as an addon card of some sort.

If there _is_ an open vs. closed trade-off, the track record so
far suggests a different trade-off than what you suggest: you can
have closed drivers if you really want them, but they'll be more
broken than the open ones.
That may be the case in the larger picture, but in my experience I've
seen otherwise.

Will

[1]: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg14188.html
[2]: 
http://osdir.com/ml/os.solaris.opensolaris.storage.general/2007-08/msg00054.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool import of bootable root pool renders it unbootable

2008-09-26 Thread Stephen Quintero
I am running OpenSolaris 2008.05 as a PV guest under Xen.  If you import the 
bootable root pool of a VM into another Solaris VM, the root pool is no longer 
bootable.

It is related to the device associated with the pool, which is originally c4d0s0, 
but on import (-f) becomes c0d2s0 in this case.  Afterwards, booting the 
original image results in a kernel panic because, I think, zfs_mountroot() 
cannot mount the root path (which is evidently now wrong).

Is this fixable?  How does one mount (import) a bootable zpool without 
wrecking it?  This is something that is commonly done under virtualization 
platforms, e.g., to manage the contents of a VM from another VM, or to perform 
a file-system-level copy of the contents of a VM to another device.
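
For concreteness, the sequence I have in mind is roughly this (the pool name is
made up); the alternate root keeps the guest's mountpoints from colliding with
the management VM's own filesystems:

  # in the management VM: import the guest's root pool under /mnt
  zpool import -f -R /mnt guest_rpool

  # ... inspect or copy files under /mnt ...

  # detach it again before booting the guest
  zpool export guest_rpool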

Any insight would be appreciated.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Tim
On Fri, Sep 26, 2008 at 1:02 PM, Will Murnane [EMAIL PROTECTED] wrote:

 On Thu, Sep 25, 2008 at 18:51, Tim [EMAIL PROTECTED] wrote:
  So what's wrong with this card?
  http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm
 If you have a UIO slot (many recent Supermicro boards do) then it's a
 fine choice.  But if you have a non-Supermicro board, you may be in
 for a shock when you get it---it's swapped left for right, compare it
 to a regular pci-e card.  It won't fit in a standard case.  AIUI it is
 a standard pci express slot, just shifted over a bit so the backwards
 slot cover fits into normal cases, so perhaps you could try fastening
 a normal slot cover to it and using it in a normal pci-e slot... but
 that doesn't sound particularly elegant, and would take up the slot on
 the other side of it as well.

 Will



This is not a UIO card.  It's a standard PCI-E card.  What the description
is telling you is that you can combine it with a UIO card to add RAID
functionality, as there is none built in.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Tim
On Fri, Sep 26, 2008 at 12:29 PM, Miles Nordin [EMAIL PROTECTED] wrote:

  t == Tim  [EMAIL PROTECTED] writes:

 t
 http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm

 I'm not sure.  A different thing is wrong with it depending on what
 driver attaches to it.  I can't tell for sure because this page:

  http://linuxmafia.com/faq/Hardware/sas.html

 says the LSI SAS 3800 series uses a 1068E chip, and James says (1)
 1068E is supported by mpt, (2) LSI SAS 3800 uses mega_sas.  so, I
 don't know which for that card, which means I don't know which for
 this card.

 If it's mpt:

  * does not come with source according to:

   http://www.openbsd.org/papers/opencon06-drivers/mgp00024.html
   http://www.opensolaris.org/os/about/no_source/

 If it's mega_sas:

  * does not come with source

  * driver is new and unproven.  We believed the Marvell driver was
   good for the first few months too, the same amount of experience we
   have with mega_sas.

  * not sure if it's available in stable solaris.


Someone's already gotten it working; if they're watching, I'm sure they'll
pipe up on what driver it uses.




 In either case:

  * may require expensive cables


Nope, cables are standardized.  I'm not sure what your definition of
expensive is, but I believe they were roughly $15 for a SAS-to-4-SATA-port
breakout cable.



 Uncertain problems:

  * might not support hotplug

  * might not support NCQ

  * probably doesn't support port multipliers

  * probably doesn't support smartctl

  * none of these features can be fixed by the community without
   source.  all are available with cheaper cards on Linux, and on
   Linux both mptsas and megaraid_sas come with source as far as I can
   tell maintained by dell and lsi, though might not support the above
   features.


 HTH, HAND.


I know it supports hotplug and NCQ.  Can't say smartctl was ever on my list
of important features so I haven't bothered to research if it does.  I'm
also not sure what good port multipliers are going to do you in this
instance... the cables it uses already support 4 SATA drives per physical
card port.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Will Murnane
On Fri, Sep 26, 2008 at 21:51, Tim [EMAIL PROTECTED] wrote:
 This is not a UIO card.  It's a standard PCI-E card.  What the description
 is telling you is that you can combine it with a UIO card to add raid
 functionality as there is none built-in.
Not so.  The description [1] mentions that this is UIO, and says only
that it negotiates pci-e link speeds, not that it fits in a pci
express slot.  UIO is pci express, but the slots are positioned
differently from pci-e ones.

Compare this to the picture of an equivalent LSI card [2].  The
pictures are similar, but compare the position of the bracket.  The
components are mounted on the wrong sides.  Take a look at a UIO board
[3]: the PCI-X slot is shared with the blue UIO slot on the left side,
like PCI and ISA slots used to be shared.  This is why the components
are backwards.

Will

[1]: http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm
[2]: 
http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8208elp/index.html
[3]: http://www.supermicro.com/products/motherboard/Xeon1333/5400/X7DWE.cfm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread Tim
On Fri, Sep 26, 2008 at 5:07 PM, Will Murnane [EMAIL PROTECTED] wrote:

 On Fri, Sep 26, 2008 at 21:51, Tim [EMAIL PROTECTED] wrote:
  This is not a UIO card.  It's a standard PCI-E card.  What the
 description
  is telling you is that you can combine it with a UIO card to add raid
  functionality as there is none built-in.
 Not so.  The description [1] mentions that this is UIO, and says only
 that it negotiates pci-e link speeds, not that it fits in a pci
 express slot.  UIO is pci express, but the slots are positioned
 differently from pci-e ones.

 Compare this to the picture of an equivalent LSI card [2].  The
 pictures are similar, but compare the position of the bracket.  The
 components are mounted on the wrong sides.  Take a look at a UIO board
 [3]: the PCI-X slot is shared with the blue UIO slot on the left side,
 like PCI and ISA slots used to be shared.  This is why the components
 are backwards.

 Will

 [1]:
 http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm
 [2]:
 http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8208elp/index.html
 [3]:
 http://www.supermicro.com/products/motherboard/Xeon1333/5400/X7DWE.cfm



Well, there are people who have it working in a PCI-E slot, so I don't know
what to tell you.

http://www.opensolaris.org/jive/thread.jspa?messageID=272283#272283
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS poor performance on Areca 1231ML

2008-09-26 Thread Ross Becker
Okay, after doing some testing, it appears that the issue is on the ZFS side.  
I fiddled around for a while with options on the Areca card, and never got any 
better performance results than my first test. So my best out of the raidz2 is 
42 MB/s write and 43 MB/s read.  I also tried turning off checksums (not how I'd 
run production, but for testing), and got no performance gain.

After fiddling with options, I destroyed my ZFS filesystem and zpool and tried 
some single-drive tests.  I simply used newfs to create filesystems on single 
drives, mounted them, and ran some single-drive bonnie++ tests.  On a single 
drive, I got 50 MB/s write and 70 MB/s read.  I also ran two benchmarks on two 
drives simultaneously, and in each of those tests the result dropped by about 
2 MB/s, so I got a combined 96 MB/s write and 136 MB/s read with two separate 
UFS filesystems on two separate disks.

So next steps? 

--ross
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS poor performance on Areca 1231ML

2008-09-26 Thread Tim
On Fri, Sep 26, 2008 at 5:46 PM, Ross Becker [EMAIL PROTECTED] wrote:

 Okay, after doing some testing, it appears that the issue is on the ZFS
 side.  I fiddled around a while with options on the areca card, and never
 got any better performance results than my first test. So, my best out of
 the raidz2 is 42 mb/s write and 43 mb/s read.  I also tried turning off
 crc's (not how I'd run production, but for testing), and got no performance
 gain.

 After fiddling with options, I destroyed my zfs  zpool, and tried some
 single-drive bits.   I simply used newfs to create filesystems on single
 drives, mounted them, and ran some single-drive bonnie++ tests.  On a single
 drive, I got 50 mb/sec write  70 mb/sec read.   I also tested two
 benchmarks on two drives simultaneously, and on each of the tests, the
 result dropped by about 2mb/sec, so I got a combined 96 mb/sec write  136
 mb/sec read with two separate UFS filesystems on two separate disks.

 So next steps?

 --ross
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Did you try disabling the card cache as others advised?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] working closed blob driver

2008-09-26 Thread James C. McPherson
Tim wrote:
 On Fri, Sep 26, 2008 at 12:29 PM, Miles Nordin [EMAIL PROTECTED] wrote:
   t == Tim  [EMAIL PROTECTED] writes:
 t
 http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm
 I'm not sure.  A different thing is wrong with it depending on what
 driver attaches to it.  I can't tell for sure because this page:
 
  http://linuxmafia.com/faq/Hardware/sas.html
 
 says the LSI SAS 3800 series uses a 1068E chip, and James says (1)
 1068E is supported by mpt, (2) LSI SAS 3800 uses mega_sas.  so, I
 don't know which for that card, which means I don't know which for
 this card.

There are several LSI cards which use the 1068 and 1068E chip.
Some of these use mpt(7d), some use mega_sas(7d). It all depends
on the firmware of the card, basically. You could also have a
look at the PCI IDs database at http://pciids.sourceforge.net
to see what the name to pci vid/did mapping is. That provides a
fairly good indicator of whether you'll need mpt(7d) or mega_sas(7d).
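
If the card is already in a box, one quick way to read the IDs and see which
driver will claim it is something like the following (the grep pattern is just
illustrative; 0x1000 is LSI's PCI vendor ID):

  # dump the device tree with properties and look for the LSI vendor ID
  prtconf -pv | grep pci1000

  # then see which driver binds to that vendor/device pair
  grep pci1000 /etc/driver_aliases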

 If it's mpt:
 
  * does not come with source according to:
 
   http://www.openbsd.org/papers/opencon06-drivers/mgp00024.html
   http://www.opensolaris.org/os/about/no_source/
 
 If it's mega_sas:
 
  * does not come with source
 
  * driver is new and unproven.  We believed the Marvell driver was
   good for the first few months too, the same amount of experience we
   have with mega_sas.
 
  * not sure if it's available in stable solaris.
 Someone's already gotten it working, if they're watching I'm sure 
 they'll pipe up on what driver it uses.


http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/mega_sas

We've got this driver into Solaris 10 Update 6. I'm still keen to
find out from Miles why mega_sas is new and unproven given that
it's been in NV since build 88. Miles - if you're seeing problems
with it, please let us know so that we can fix them. If you don't
tell us, how will we ever know?


 In either case:
 
  * may require expensive cables
 Nope, cables are standardized.  I'm not sure what your definition of 
 expensive is, but I believe they were roughly $15 for a SAS-to-4-SATA-port breakout cable.

If you want to get an external SAS cable (particularly if it's
got the InfiniBand-style SFF-8088 connector), then that might cost
you a bit. If you just want to connect devices internally, then
I would expect the cables to be somewhat cheaper. Either way,
with more and more volume of cards and devices on the market, the
pricing for cables should decrease too.
[snip]


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS poor performance on Areca 1231ML

2008-09-26 Thread James C. McPherson
Ross Becker wrote:
 Well,   I just got in a system I am intending to be a BIG fileserver;
 background-  I work for a SAN startup, and we're expecting in our first
 year to collect 30-60 terabytes of Fibre Channel traces.  The purpose of
 this is to be a large repository for those traces w/ statistical analysis
 run against them. Looking at that storage figure,  I decided this would
 be a perfect application for ZFS.  I purchased a Super Micro chassis
 that's 4u and has 24 slots for SATA drives.  I've put one quad-core 2.66
 ghz processor in  8gig of ECC ram.   I put in two Areca 1231ML (
 http://www.areca.com.tw/products/pcie341.htm ) controllers which come
 with Solaris drivers.  I've half-populated the chassis with 12  1Tb
 drives to begin with, and I'm running some experiments.  I loaded
 OpenSolaris 05-2008 on the system.
 
 I configured up an 11 drive RAID6 set + 1 hot spare on the Areca
 controller put a ZFS on that raid volume, and ran bonnie++ against it
 (16g size), and achieved 150 mb/s  write,  200 mb/s read.  I then blew
 that away, configured the Areca to present JBOD, and configured ZFS with
 RAIDZ2  11 disks, and a hot spare.  Running bonnie++ against that, it
 achieved 40 mb/sec read and 40 mb/sec write.  I wasn't expecting RAIDZ to
 outrun the controller-based RAID, but I wasn't expecting 1/3rd to 1/4 the
 performance.  I've looked at the ZFS tuning info on the solaris site, and
 mostly what they said is tuning is evil, with a few things for Database
 tuning.
 
 Anyone got suggestions on whether there's something I might poke at to at
 least get this puppy up closer to 100 mb/sec?  Otherwise,  I may dump the
 JBOD and go back to the controller-based RAID.

While running pre-integration testing of arcmsr(7d), I noticed
that random IO was pretty terrible. My results matched what I
saw in benchmark PDFs from http://www.areca.com.tw/support/main.htm
(bottom of page), but I'd still like to improve the results.

Were you doing more random or more sequential IO?

The source is here:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/io/scsi/adapters/arcmsr


... and I'm keen to talk with you in detail about the issues
you're seeing with arcmsr too.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS poor performance on Areca 1231ML

2008-09-26 Thread Jonathan Loran


Ross Becker wrote:
 Okay, after doing some testing, it appears that the issue is on the ZFS side. 
  I fiddled around a while with options on the areca card, and never got any 
 better performance results than my first test. So, my best out of the raidz2 
 is 42 mb/s write and 43 mb/s read.  I also tried turning off crc's (not how 
 I'd run production, but for testing), and got no performance gain.

 After fiddling with options, I destroyed my zfs  zpool, and tried some 
 single-drive bits.   I simply used newfs to create filesystems on single 
 drives, mounted them, and ran some single-drive bonnie++ tests.  On a single 
 drive, I got 50 mb/sec write  70 mb/sec read.   I also tested two benchmarks 
 on two drives simultaneously, and on each of the tests, the result dropped by 
 about 2mb/sec, so I got a combined 96 mb/sec write  136 mb/sec read with two 
 separate UFS filesystems on two separate disks.

 So next steps? 

 --ross
   

Raidz(2) vdevs can only sustain the max IOPS of a single drive in the vdev.  
I'm curious what 'zpool iostat' would say while bonnie++ is running its 
"writing intelligently" test.  The throughput sounds very low to me, but the 
clue here is that the single-drive speed is in line with the raidz2 vdev; so 
if a single drive is being limited by IOPS rather than raw throughput, then 
this I/O result makes sense.  For fun, you should split the disks into two 
raidz2 vdevs and see if you get twice the throughput, more or less.  I'll 
bet the answer is yes. 
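
Something along these lines, with placeholder device names, would split the
same twelve disks into two 6-disk raidz2 vdevs (giving up the hot spare for the
duration of the experiment):

  zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0

  # per-vdev throughput while bonnie++ runs
  zpool iostat -v tank 5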

Jon

-- 


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss