Re: [zfs-discuss] Trying to determine if this box will be compatible with Opensolaris or Solaris

2009-03-12 Thread mike
Yeah, I really wish the HCL was easier to work with, and allowed comments.

For instance, that HCL entry was last updated sometime in 2007. Since then,
like you've said, support could have gotten better or the hardware dropped
altogether. Some sort of more community-oriented aspect might help beef it
up. Also, the tools could be simpler - no UI at all, for instance. Does it
really need one just to dump things out? :)

On Wed, Mar 11, 2009 at 7:15 PM, David Magda dma...@ee.ryerson.ca wrote:

 On Mar 11, 2009, at 21:59, mike wrote:

 On Wed, Mar 11, 2009 at 6:53 PM, David Magda dma...@ee.ryerson.ca wrote:


 If you know someone who already has the hardware, you can ask them to run
 the Sun Device Detection Tool:

 http://www.sun.com/bigadmin/hcl/hcts/device_detect.jsp

 It runs under other operating systems (Windows, Linux, BSD) AFAIK, so a
 re-install or reboot isn't necessary to see what it comes up with.


 doesn't it require Java and X11?

 Yes, it requires Java 1.5+; a GUI is needed, but I don't think X11 is
 specifically required (X is the GUI on Unix-y systems, of course). Java
 doesn't specifically need X; it simply uses whatever GUI the OS provides.

 Looking at the page a bit more, you can run commands on the system and save
 the output  to a file that can be processed by the tool on another system:

 Apart from testing the current system on which Sun Device Detection Tool
 is invoked, you can also test the device data files that are generated from
 the external systems. To test the external device data files, print the PCI
 configuration of the external systems to a text file by using the following
 commands:

        • prtconf -pv on Solaris OS.
        • lspci -vv -n on Linux OS.
        • reg query hklm\system\currentcontrolset\enum\pci /s on Windows OS.
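
 For example, the data file for an external box can be generated like this and
 then copied to the machine running the tool (output file names here are
 arbitrary):

     # on the external Solaris system
     prtconf -pv > myhost-prtconf.txt

     # on the external Linux system
     lspci -vv -n > myhost-lspci.txt

 The tool then reads the saved text file instead of probing local hardware.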





Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Blake
I start the cp, and then, with prstat -a, watch the cpu load for the
cp process climb to 25% on a 4-core machine.

Load, measured for example with 'uptime', climbs steadily until the reboot.

Note that the machine does not dump properly, panic or hang - rather,
it reboots.

I attached a screenshot earlier in this thread of the little bit of
error message I could see on the console.  The machine is trying to
dump to the dump zvol, but fails to do so.  Only sometimes do I see an
error on the machine's local console - most times, it simply reboots.



On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert
nathan.kroen...@sun.com wrote:
 Hm -

 Crashes, or hangs? Moreover - how do you know a CPU is pegged?

 Seems like we could do a little more discovery on what the actual problem
 here is, as I can read it about 4 different ways.

 By this last piece of information, I'm guessing the system does not crash,
 but goes really really slow??

 Crash == panic == we see stack dump on console and try to take a dump
 hang == nothing works == no response - might be worth looking at mdb -K
        or booting with a -k on the boot line.

 So - are we crashing, hanging, or something different?

 It might simply be that you are eating up all your memory, and your physical
 backing storage is taking a while to catch up?

 Nathan.

 Blake wrote:

 My dump device is already on a different controller - the motherboard's
 built-in nVidia SATA controller.

 The raidz2 vdev is the one I'm having trouble with (copying the same
 files to the mirrored rpool on the nVidia controller works nicely).  I
 do notice that, when using cp to copy the files to the raidz2 pool,
 load on the machine climbs steadily until the crash, and one proc core
 pegs at 100%.

 Frustrating, yes.

 On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J
 maidakalexand...@johndeere.com wrote:

 If you're having issues with a disk controller or disk I/O driver, it's
 highly likely that a savecore to disk after the panic will fail.  I'm not
 sure how to work around this - maybe a dedicated dump device on a
 controller that uses a different driver than the one you're having
 issues with?

 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake
 Sent: Wednesday, March 11, 2009 4:45 PM
 To: Richard Elling
 Cc: Marc Bevand; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] reboot when copying large amounts of data

 I guess I didn't make it clear that I had already tried using savecore to
 retrieve the core from the dump device.

 I added a larger zvol for dump, to make sure that I wasn't running out of
 space on the dump device:

 r...@host:~# dumpadm
       Dump content: kernel pages
        Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated)
  Savecore directory: /var/crash/host
    Savecore enabled: yes
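
  For anyone wanting to reproduce this, a dedicated dump zvol like the one above
  can be set up with something along these lines (the size is only an example):

      zfs create -V 16G rpool/bigdump
      dumpadm -d /dev/zvol/dsk/rpool/bigdump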

 I was using the -L option only to try to get some idea of why the system
 load was climbing to 1 during a simple file copy.



 On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling
 richard.ell...@gmail.com wrote:

 Blake wrote:

 I'm attaching a screenshot of the console just before reboot.  The
 dump doesn't seem to be working, or savecore isn't working.

 On Wed, Mar 11, 2009 at 11:33 AM, Blake blake.ir...@gmail.com wrote:

 I'm working on testing this some more by doing a savecore -L right
 after I start the copy.


 savecore -L is not what you want.

 By default, for OpenSolaris, savecore on boot is disabled.  But the
 core will have been dumped into the dump slice, which is not used for
 swap.
 So you should be able to run savecore at a later time to collect the
 core from the last dump.
 -- richard
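
 For example, after the next boot the last dump can usually be pulled off the
 dedicated dump device with something like the following (using the savecore
 directory from the dumpadm output above):

     savecore -v /var/crash/host    # writes unix.N and vmcore.N there
     dumpadm -y                     # optionally turn automatic savecore on boot back on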



 --
 //
 // Nathan Kroenert              nathan.kroen...@sun.com         //
 // Systems Engineer             Phone:  +61 3 9869-6255         //
 // Sun Microsystems             Fax:    +61 3 9869-6288         //
 // Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
 // Melbourne 3004   Victoria    Australia                       //
 //



Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Nathan Kroenert
definitely time to bust out some mdb -K or boot -k and see what it's 
moaning about.


I did not see the screenshot earlier... sorry about that.

Nathan.

Blake wrote:

I start the cp, and then, with prstat -a, watch the cpu load for the
cp process climb to 25% on a 4-core machine.

Load, measured for example with 'uptime', climbs steadily until the reboot.

Note that the machine does not dump properly, panic or hang - rather,
it reboots.

I attached a screenshot earlier in this thread of the little bit of
error message I could see on the console.  The machine is trying to
dump to the dump zvol, but fails to do so.  Only sometimes do I see an
error on the machine's local console - most times, it simply reboots.



On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert
nathan.kroen...@sun.com wrote:

Hm -

Crashes, or hangs? Moreover - how do you know a CPU is pegged?

Seems like we could do a little more discovery on what the actual problem
here is, as I can read it about 4 different ways.

By this last piece of information, I'm guessing the system does not crash,
but goes really really slow??

Crash == panic == we see stack dump on console and try to take a dump
hang == nothing works == no response - might be worth looking at mdb -K
   or booting with a -k on the boot line.

So - are we crashing, hanging, or something different?

It might simply be that you are eating up all your memory, and your physical
backing storage is taking a while to catch up?

Nathan.

Blake wrote:

My dump device is already on a different controller - the motherboard's
built-in nVidia SATA controller.

The raidz2 vdev is the one I'm having trouble with (copying the same
files to the mirrored rpool on the nVidia controller works nicely).  I
do notice that, when using cp to copy the files to the raidz2 pool,
load on the machine climbs steadily until the crash, and one proc core
pegs at 100%.

Frustrating, yes.

On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J
maidakalexand...@johndeere.com wrote:

If you're having issues with a disk controller or disk I/O driver, it's
highly likely that a savecore to disk after the panic will fail.  I'm not
sure how to work around this - maybe a dedicated dump device on a
controller that uses a different driver than the one you're having
issues with?

-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake
Sent: Wednesday, March 11, 2009 4:45 PM
To: Richard Elling
Cc: Marc Bevand; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] reboot when copying large amounts of data

I guess I didn't make it clear that I had already tried using savecore to
retrieve the core from the dump device.

I added a larger zvol for dump, to make sure that I wasn't running out of
space on the dump device:

r...@host:~# dumpadm
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated)
 Savecore directory: /var/crash/host
   Savecore enabled: yes

I was using the -L option only to try to get some idea of why the system
load was climbing to 1 during a simple file copy.



On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling
richard.ell...@gmail.com wrote:

Blake wrote:

I'm attaching a screenshot of the console just before reboot.  The
dump doesn't seem to be working, or savecore isn't working.

On Wed, Mar 11, 2009 at 11:33 AM, Blake blake.ir...@gmail.com wrote:


I'm working on testing this some more by doing a savecore -L right
after I start the copy.



savecore -L is not what you want.

By default, for OpenSolaris, savecore on boot is disabled.  But the
core will have been dumped into the dump slice, which is not used for
swap.
So you should be able to run savecore at a later time to collect the
core from the last dump.
-- richard




--
//
// Nathan Kroenert              nathan.kroen...@sun.com         //
// Systems Engineer             Phone:  +61 3 9869-6255         //
// Sun Microsystems             Fax:    +61 3 9869-6288         //
// Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
// Melbourne 3004   Victoria    Australia                       //
//



--
//
// Nathan Kroenert              nathan.kroen...@sun.com         //
// Systems Engineer             Phone:  +61 3 9869-6255         //
// Sun Microsystems             Fax:    +61 3 9869-6288         //
// Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
// Melbourne 3004   Victoria    Australia                       //
//

Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Blake
So, if I boot with the -k boot flag (to load the kernel debugger?),
what do I need to look for?  I'm no expert at kernel debugging.

I think this is a PCI error judging by the console output, or at least
it's I/O-related...

thanks for your feedback,
Blake
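
For what it's worth, a rough sketch of that workflow - these are standard
mdb/kmdb commands rather than anything specific to this problem:

    # booted with -k, the box should drop into kmdb on panic instead of
    # rebooting; useful starting points at that prompt are:
    ::msgbuf        # recent kernel messages, including the panic string
    ::panicinfo     # panic register state and panicking thread
    $C              # stack trace of the current (panicking) thread

    # if a dump does get saved, the same commands work post-mortem:
    cd /var/crash/host && mdb unix.0 vmcore.0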

On Thu, Mar 12, 2009 at 2:18 AM, Nathan Kroenert
nathan.kroen...@sun.com wrote:
 definitely time to bust out some mdb -K or boot -k and see what it's moaning
 about.

 I did not see the screenshot earlier... sorry about that.

 Nathan.

 Blake wrote:

 I start the cp, and then, with prstat -a, watch the cpu load for the
 cp process climb to 25% on a 4-core machine.

 Load, measured for example with 'uptime', climbs steadily until the
 reboot.

 Note that the machine does not dump properly, panic or hang - rather,
 it reboots.

 I attached a screenshot earlier in this thread of the little bit of
 error message I could see on the console.  The machine is trying to
 dump to the dump zvol, but fails to do so.  Only sometimes do I see an
 error on the machine's local console - most times, it simply reboots.



 On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert
 nathan.kroen...@sun.com wrote:

 Hm -

 Crashes, or hangs? Moreover - how do you know a CPU is pegged?

 Seems like we could do a little more discovery on what the actual problem
 here is, as I can read it about 4 different ways.

 By this last piece of information, I'm guessing the system does not
 crash,
 but goes really really slow??

 Crash == panic == we see stack dump on console and try to take a dump
 hang == nothing works == no response - might be worth looking at mdb -K
       or booting with a -k on the boot line.

 So - are we crashing, hanging, or something different?

 It might simply be that you are eating up all your memory, and your
 physical
 backing storage is taking a while to catch up?

 Nathan.

 Blake wrote:

 My dump device is already on a different controller - the motherboard's
 built-in nVidia SATA controller.

 The raidz2 vdev is the one I'm having trouble with (copying the same
 files to the mirrored rpool on the nVidia controller works nicely).  I
 do notice that, when using cp to copy the files to the raidz2 pool,
 load on the machine climbs steadily until the crash, and one proc core
 pegs at 100%.

 Frustrating, yes.

 On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J
 maidakalexand...@johndeere.com wrote:

 If you're having issues with a disk controller or disk I/O driver, it's
 highly likely that a savecore to disk after the panic will fail.  I'm
 not sure how to work around this - maybe a dedicated dump device on a
 controller that uses a different driver than the one you're having
 issues with?

 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake
 Sent: Wednesday, March 11, 2009 4:45 PM
 To: Richard Elling
 Cc: Marc Bevand; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] reboot when copying large amounts of data

 I guess I didn't make it clear that I had already tried using savecore
 to
 retrieve the core from the dump device.

 I added a larger zvol for dump, to make sure that I wasn't running out
 of
 space on the dump device:

 r...@host:~# dumpadm
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated)
 Savecore directory: /var/crash/host
   Savecore enabled: yes

 I was using the -L option only to try to get some idea of why the
 system
 load was climbing to 1 during a simple file copy.



 On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling
 richard.ell...@gmail.com wrote:

 Blake wrote:

 I'm attaching a screenshot of the console just before reboot.  The
 dump doesn't seem to be working, or savecore isn't working.

 On Wed, Mar 11, 2009 at 11:33 AM, Blake blake.ir...@gmail.com
 wrote:

 I'm working on testing this some more by doing a savecore -L right
 after I start the copy.


 savecore -L is not what you want.

 By default, for OpenSolaris, savecore on boot is disabled.  But the
 core will have been dumped into the dump slice, which is not used for
 swap.
 So you should be able to run savecore at a later time to collect the
 core from the last dump.
 -- richard



 --
 //
 // Nathan Kroenert              nathan.kroen...@sun.com         //
 // Systems Engineer             Phone:  +61 3 9869-6255         //
 // Sun Microsystems             Fax:    +61 3 9869-6288         //
 // Level 7, 476 St. Kilda Road  Mobile: 0419 305 456            //
 // Melbourne 3004   Victoria    Australia                       //
 

[zfs-discuss] Encryption through compression?

2009-03-12 Thread Monish Shah

Hello everyone,

My understanding is that the ZFS crypto framework will not release until 
2010.  In light of that, I'm wondering if the following approach to 
encryption could make sense for some subset of users:


The idea is to use the compression framework to do both compression and 
encryption in one pass.  This would be done by defining a new compression 
type, which might be called compress-encrypt or something like that. 
There could be two levels, one that does both compress and encrypt and 
another that does encrypt only.
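
As a rough userland analogy of the single-pass idea (not ZFS code, just an
illustration of the compress-then-encrypt ordering; the key file path is made
up):

    # compress first, then encrypt - the reverse order would defeat compression
    gzip -c datafile | openssl enc -aes-256-cbc -salt \
        -pass file:/path/to/keyfile -out datafile.gz.enc

    # and the way back
    openssl enc -d -aes-256-cbc -pass file:/path/to/keyfile \
        -in datafile.gz.enc | gunzip -c > datafile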


I see the following issues with this approach:

1.  The ZFS compression framework presently accepts compressed data only if there
was at least a 12.5% reduction.  For data that didn't compress, you would wind
up storing it unencrypted, even if encryption was on.


2.  Meta-data would not be encrypted.  I.e., even if you don't have the key, 
you will be able to do directory listings and see file names, etc.


3.  There is no key management framework.

I would deal with these as follows:

Issue #1 can be solved by changing ZFS code such that it always accepts the 
compressed data.  I guess this is an easy change.


Issue #2 may be a limitation to some and feature to others.  May be OK.

Issue #3 can be solved using encryption hardware (which my company happens 
to make).  The keys are stored in hardware and can be used directly from 
that.  Of course, this means that the solution will be specific to our 
hardware, but that's fine by me.


The idea is that we would do this project on our own and supply this 
modified ZFS with our compression/encryption hardware to our customers.  We 
may submit the patch for inclusion in some future version of OS, if the 
developers are amenable to that.


Does anyone see any problems with this?  There are probably various gotchas 
here that I haven't thought of.  If you can think of any, please let me 
know.


Thanks,

Monish

Monish Shah
CEO, Indra Networks, Inc.
www.indranetworks.com



Re: [zfs-discuss] Encryption through compression?

2009-03-12 Thread Darren J Moffat

Monish Shah wrote:

Hello everyone,

My understanding is that the ZFS crypto framework will not release until 
2010. 


That is incorrect information; where did you get that from?

 In light of that, I'm wondering if the following approach to

encryption could make sense for some subset of users:

The idea is to use the compression framework to do both compression and 
encryption in one pass.  This would be done by defining a new 
compression type, which might be called compress-encrypt or something 
like that. There could be two levels, one that does both compress and 
encrypt and another that does encrypt only.


I see the following issues with this approach:

1.  ZFS compression framework presently takes compressed data only if 
there was at least 12.5% reduction.  For data that didn't compress, you 
would wind up storing it unencrypted, even if encryption was on.


2.  Meta-data would not be encrypted.  I.e., even if you don't have the 
key, you will be able to do directory listings and see file names, etc.


3.  There is no key management framework.


That is impossible; there has to be key management somewhere.


I would deal with these as follows:

Issue #1 can be solved by changing ZFS code such that it always accepts 
the compressed data.  I guess this is an easy change.


Issue #2 may be a limitation to some and feature to others.  May be OK.

Issue #3 can be solved using encryption hardware (which my company 
happens to make).  The keys are stored in hardware and can be used 
directly from that.  Of course, this means that the solution will be 
specific to our hardware, but that's fine by me.


The idea is that we would do this project on our own and supply this 
modified ZFS with our compression/encryption hardware to our customers.  
We may submit the patch for inclusion in some future version of OS, if 
the developers are amenable to that.


If it is specific to your company's hardware I doubt it would ever get
integrated into OpenSolaris, particularly given that the existing zfs-crypto
project has no hardware dependencies at all.


The better way to use your encryption hardware is to get it plugged into 
the OpenSolaris cryptographic framework (see the crypto project on 
OpenSolaris.org)


Does anyone see any problems with this?  There are probably various 
gotchas here that I haven't thought of.  If you can think of any, please 
let me know.


The various gotchas are the things that have been taking me and the rest 
of the ZFS team a large part of the zfs-crypto project to resolve.  It 
really isn't as simple as you think it is - if it were then the 
zfs-crypto project would be done by now!


If you really want to help get encryption for ZFS then please come and 
join the already existing project rather than starting another one from 
scratch.


--
Darren J Moffat


Re: [zfs-discuss] Encryption through compression?

2009-03-12 Thread Darren J Moffat

Monish Shah wrote:

Hello Darren,



Monish Shah wrote:

Hello everyone,

My understanding is that the ZFS crypto framework will not release 
until 2010.


That is incorrect information, where did you get that from ?


It was in Mike Shapiro's presentation at the Open Solaris Storage Summit 
that took place a couple of weeks ago.  Perhaps I mis-read the slide, 
but I'm pretty sure it listed encryption as a feature for 2010.


That is for its availability in the S7000 appliance.  It will be in 
OpenSolaris before that (it has to be because the S7000 is based on an 
OpenSolaris build).


If the schedule is much sooner than 2010, I would definitely do so.  
What is your present schedule estimate?


I can't commit to this yet but I expect somewhere around August 2009.

Note that the code in hg.opensolaris.org/hg/zfs-crypto/gate actually 
works today and encrypts more than what your proposal would.  It is just 
that we are making some design changes to simplify the model and ensure 
that encryption integrates with other ZFS features coming along.


There will be a design update posted to zfs-crypto-discuss@ later this 
month.


--
Darren J Moffat


Re: [zfs-discuss] Encryption through compression?

2009-03-12 Thread Monish Shah

Hello Darren,



Monish Shah wrote:

Hello everyone,

My understanding is that the ZFS crypto framework will not release until 
2010.


That is incorrect information, where did you get that from ?


It was in Mike Shapiro's presentation at the Open Solaris Storage Summit 
that took place a couple of weeks ago.  Perhaps I mis-read the slide, but 
I'm pretty sure it listed encryption as a feature for 2010.


...


3.  There is no key management framework.


That is impossible there has to be key management somewhere.


What I meant was, the compression framework does not have a key management 
framework.  Using our hardware (which I mentioned later in my mail), the key 
management would come with the hardware, since we store keys in the 
hardware.  We provide a utility to manage the keys stored in the hardware.


...

If it is specific to your companies hardware I doubt it would ever get 
integrated into OpenSolaris particularly given the existing zfs-crypto 
project has no hardware dependencies at all.


The better way to use your encryption hardware is to get it plugged into 
the OpenSolaris cryptographic framework (see the crypto project on 
OpenSolaris.org)


That was precisely what I was thinking originally.  However, if it is not out
until 2010, there is a temptation to do our own project, which I thought could
be done in a couple of months.  (In light of your comment below, my estimate
may have been wildly optimistic, but the foregoing is merely an explanation
of what I was thinking.)


Does anyone see any problems with this?  There are probably various 
gotchas here that I haven't thought of.  If you can think of any, please 
let me know.


The various gotchas are the things that have been taking me and the rest 
of the ZFS team a large part of the zfs-crypto project to resolve.  It 
really isn't as simple as you think it is - if it were then the zfs-crypto 
project would be done by now!


If you really want to help get encryption for ZFS then please come and 
join the already existing project rather than starting another one from 
scratch.


If the schedule is much sooner than 2010, I would definitely do so.  What is 
your present schedule estimate?



--
Darren J Moffat



Monish 




Re: [zfs-discuss] Export ZFS via ISCSI to Linux - Is it stable for production use now?

2009-03-12 Thread howard chen
Hi,

On Thu, Mar 12, 2009 at 1:19 AM, Darren J Moffat
darr...@opensolaris.org wrote:

 That is all that has to be done on the OpenSolaris side to make a 10g lun
 available over iSCSI.  The rest of it is all how Linux sets up its iSCSI
 client side which I don't know but I know on Solaris it is very easy using
 iscsiadm(1M).
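
 For reference, the Solaris initiator side mentioned here boils down to
 something like this (the target address is only an example):

     iscsiadm modify discovery --sendtargets enable
     iscsiadm add discovery-address 192.168.1.10:3260
     devfsadm -i iscsi    # create device nodes for the discovered LUNs
     format               # the iSCSI LUN now shows up alongside local disks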


Thanks for your detailed steps.

But I think using this setup, only one client can mount the shared
block device at a time? So there must be a need for a clustered file system
(e.g. GFS).

Just out of curiosity, what is the clustered file system used in the Sun
Unified Storage 7000 series for data sharing among clients?

Thanks.


Re: [zfs-discuss] ZFS on a SAN

2009-03-12 Thread Sriram Narayanan
On Thu, Mar 12, 2009 at 2:12 AM, Erik Trimble erik.trim...@sun.com wrote:
<snip/>

 On the SAN, create (2) LUNs - one for your primary data, and one for
 your snapshots/backups.

 On hostA, create a zpool on the primary data LUN (call it zpool A), and
 another zpool on the backup LUN (zpool B).  Take snapshots on A, then
 use 'zfs send' and 'zfs receive' to copy the clone/snapshot over to
 zpool B. then 'zpool export B'

Shouldn't this be 'zpool export A' ?

-- Sriram


[zfs-discuss] CLI grinds to a halt during backups

2009-03-12 Thread Marius van Vuuren
Hi,

I have an X4150 with a J4200 connected, populated with 12 x 1 TB disks (SATA).

I run backup_pc as my software for backing up.

Is there anything I can do to make the command line more responsive during
backup windows? At the moment it grinds to a complete standstill.

Thanks


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Bob Friesenhahn

On Thu, 12 Mar 2009, Jorgen Lundman wrote:

User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process will open 
'/dev/quota' and empty the transaction log entries constantly. Take the 
uid,gid entries and update the byte-count in its database. How we store this 
database is up to us, but since it is in user-land it should have more 
flexibility, and is not as critical to be fast as it would have to be in 
kernel.


In order for this to work, ZFS data blocks need to somehow be 
associated with a POSIX user ID.  To start with, the ZFS POSIX layer 
is implemented on top of a non-POSIX Layer which does not need to know 
about POSIX user IDs.  ZFS also supports snapshots and clones.


The support for snapshots, clones, and potentially non-POSIX data 
storage, results in ZFS data blocks which are owned by multiple users 
at the same time, or multiple users over a period of time spanned by 
multiple snapshots.  If ZFS clones are modified, then files may have 
their ownership changed, while the unmodified data continues to be 
shared with other users.  If a cloned file has its ownership changed, 
then it would be quite tedious to figure out which blocks are now 
wholly owned by the new user, and which blocks are shared with other 
users.  By the time the analysis is complete, it will be wrong.


Before ZFS can apply per-user quota management, it is necessary to 
figure out how individual blocks can be charged to a user.  This seems 
to be a very complex issue and common usage won't work with your 
proposal.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS on a SAN

2009-03-12 Thread Grant Lowe

Hi Erik,

A couple of questions about what you said in your email.  In scenario (2), if 
hostA has gone belly up and is no longer accessible, then a step that is 
implied (or maybe I'm just inferring it) is to go to the SAN and reassign the 
LUN from hostA to hostB.  Correct?



- Original Message 
From: Erik Trimble erik.trim...@sun.com
To: Grant Lowe gl...@sbcglobal.net
Cc: zfs-discuss@opensolaris.org
Sent: Wednesday, March 11, 2009 1:42:06 PM
Subject: Re: [zfs-discuss] ZFS on a SAN

I'm not 100% sure what your question here is, but let me give you a
(hopefully) complete answer:

(1) ZFS is NOT a clustered file system, in the sense that it is NOT
possible for two hosts to have the same LUN mounted at the same time,
even if both are hooked to a SAN and can normally see that LUN.

(2) ZFS can do failover, however.  If you have a LUN from a SAN on
hostA, create a ZFS pool on it, and use as normal.  Should you wish to
fail over the LUN to hostB, you need to do a 'zpool export zpool' on
hostA, then 'zpool import zpool' on hostB.  If hostA has been lost
completely (hung/died/etc) and you are unable to do an 'export' on it,
you can force the import on hostB via 'zpool import -f zpool'


ZFS requires that you import/export entire POOLS, not just filesystems.
So, given what you seem to want, I'd recommend this:

On the SAN, create (2) LUNs - one for your primary data, and one for
your snapshots/backups.

On hostA, create a zpool on the primary data LUN (call it zpool A), and
another zpool on the backup LUN (zpool B).  Take snapshots on A, then
use 'zfs send' and 'zfs receive' to copy the clone/snapshot over to
zpool B. then 'zpool export B'

On hostB, import the snapshot pool:  'zpool import B'



It might just be as easy to have two independent zpools on each host,
and just do a 'zfs send' on hostA, and 'zfs receive' on hostB to copy
the snapshot/clone over the wire.
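
Roughly, with placeholder pool and host names:

    # on hostA
    zfs snapshot poolA/data@backup1
    zfs send poolA/data@backup1 | ssh hostB zfs receive poolB/data

    # later, send only what changed since the previous snapshot
    zfs snapshot poolA/data@backup2
    zfs send -i @backup1 poolA/data@backup2 | ssh hostB zfs receive poolB/data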

-Erik



On Wed, 2009-03-11 at 13:18 -0700, Grant Lowe wrote:
 Hi All,
 
 I'm new on ZFS, so I hope this isn't too basic a question.  I have a host 
 where I setup ZFS.  The Oracle DBAs did their thing and I know have a number 
 of ZFS datasets with their respective clones and snapshots on serverA.  I 
 want to export some of the clones to serverB.  Do I need to zone serverB to 
 see the same LUNs as serverA?  Or does it have to have preexisting, empty 
 LUNs to import the clones?  Please help.  Thanks.
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Eric Schrock
Note that:

6501037 want user/group quotas on ZFS 

Is already committed to be fixed in build 113 (i.e. in the next month).

- Eric

On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:
 
 In the style of a discussion over a beverage, and talking about 
 user-quotas on ZFS, I recently pondered a design for implementing user 
 quotas on ZFS after having far too little sleep.
 
 It is probably nothing new, but I would be curious what you experts 
 think of the feasibility of implementing such a system and/or whether or 
 not it would even realistically work.
 
 I'm not suggesting that someone should do the work, or even that I will, 
 but rather in the interest of chatting about it.
 
 Feel free to ridicule me as required! :)
 
 Thoughts:
 
 Here at work we would like to have user quotas based on uid (and 
 presumably gid) to be able to fully replace the NetApps we run. Current 
 ZFS are not good enough for our situation. We simply can not mount 
 500,000 file-systems on all the NFS clients. Nor do all servers we run 
 support mirror-mounts. Nor do auto-mount see newly created directories 
 without a full remount.
 
 Current UFS-style-user-quotas are very exact. To the byte even. We do 
 not need this precision. If a user has 50MB of quota, and they are able 
 to reach 51MB usage, then that is acceptable to us. Especially since 
 they have to go under 50MB to be able to write new data, anyway.
 
 Instead of having complicated code in the kernel layer, slowing down the 
 file-system with locking and semaphores (and perhaps avoiding learning 
 indepth ZFS code?), I was wondering if a more simplistic setup could be 
 designed, that would still be acceptable. I will use the word 
 'acceptable' a lot. Sorry.
 
 My thoughts are that the ZFS file-system will simply write a 
 'transaction log' on a pipe. By transaction log I mean uid, gid and 
 'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
 it could be a fifo, pipe or socket. But currently I'm thinking 
 '/dev/quota' style.
 
 User-land will then have a daemon, whether or not it is one daemon per 
 file-system or really just one daemon does not matter. This process will 
 open '/dev/quota' and empty the transaction log entries constantly. Take 
 the uid,gid entries and update the byte-count in its database. How we 
 store this database is up to us, but since it is in user-land it should 
 have more flexibility, and is not as critical to be fast as it would 
 have to be in kernel.
 
 The daemon process can also grow in number of threads as demand increases.
 
 Once a user's quota reaches the limit (note here that /the/ call to 
 write() that goes over the limit will succeed, and probably a couple 
 more after. This is acceptable) the process will blacklist the uid in 
 kernel. Future calls to creat/open(CREAT)/write/(insert list of calls) 
 will be denied. Naturally calls to unlink/read etc should still succeed. 
 If the uid goes under the limit, the uid black-listing will be removed.
 
 If the user-land process crashes or dies, for whatever reason, the 
 buffer of the pipe will grow in the kernel. If the daemon is restarted 
 sufficiently quickly, all is well, it merely needs to catch up. If the 
 pipe does ever get full and items have to be discarded, a full-scan will 
 be required of the file-system. Since even with UFS quotas we need to 
 occasionally run 'quotacheck', it would seem this too, is acceptable (if 
 undesirable).
 
 If you have no daemon process running at all, you have no quotas at all. 
 But the same can be said about quite a few daemons. The administrators 
 need to adjust their usage.
 
 I can see a complication with doing a rescan. How could this be done 
 efficiently? I don't know if there is a neat way to make this happen 
 internally to ZFS, but from a user-land only point of view, perhaps a 
 snapshot could be created (synchronised with the /dev/quota pipe 
 reading?) and start a scan on the snapshot, while still processing 
 kernel log. Once the scan is complete, merge the two sets.
 
 Advantages are that only small hooks are required in ZFS. The byte 
 updates, and the blacklist with checks for being blacklisted.
 
 Disadvantages are that it is loss of precision, and possibly slower 
 rescans? Sanity?
 
 But I do not really know the internals of ZFS, so I might be completely 
 wrong, and everyone is laughing already.
 
 Discuss?
 
 Lund
 
 -- 
 Jorgen Lundman   | lund...@lundman.net
 Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
 Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
 Japan| +81 (0)3 -3375-1767  (home)

--
Eric Schrock, Fishworkshttp://blogs.sun.com/eschrock

Re: [zfs-discuss] Export ZFS via ISCSI to Linux - Is it stable for production use now?

2009-03-12 Thread Darren J Moffat

howard chen wrote:

Hi,

On Thu, Mar 12, 2009 at 1:19 AM, Darren J Moffat
darr...@opensolaris.org wrote:


That is all that has to be done on the OpenSolaris side to make a 10g lun
available over iSCSI.  The rest of it is all how Linux sets up its iSCSI
client side which I don't know but I know on Solaris it is very easy using
iscsiadm(1M).



Thanks for your detail steps.

Bbut I think using this setup, only one client can mount the share
blocks at a time? So there must be a need of clustered file system.
(e.g. gfs)


iSCSI doesn't enforce that but the filesystem you run on top of the LUNs 
might.  All the Linux side sees is a block device - that is the whole 
point of using iSCSI.  If you don't want a block device then iSCSI (and 
FCoE) are the wrong protocols to be using.



Just out of curious, what is the clustered file system used in Sun
Unified Storage 7000 series for data sharing ammong clients?


The S7000 doesn't use a cluster filesystem; it exports ZFS datasets using
one or more of iSCSI, NFS, CIFS, WebDAV, FTP, i.e.
network filesystems, file transfer protocols, or a block protocol.


When there is an S7000 cluster configuration the cluster is 
Active/Active with each head controlling one data pool and the services 
for it.  When a cluster head fails the other head takes over the pool 
and the network addresses and starts to provide the services from a 
single head.


This doesn't require a cluster filesystem

--
Darren J Moffat


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Blake
That is pretty freaking cool.

On Thu, Mar 12, 2009 at 11:38 AM, Eric Schrock eric.schr...@sun.com wrote:
 Note that:

 6501037 want user/group quotas on ZFS

 Is already committed to be fixed in build 113 (i.e. in the next month).

 - Eric

 On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:

 In the style of a discussion over a beverage, and talking about
 user-quotas on ZFS, I recently pondered a design for implementing user
 quotas on ZFS after having far too little sleep.

 It is probably nothing new, but I would be curious what you experts
 think of the feasibility of implementing such a system and/or whether or
 not it would even realistically work.

 I'm not suggesting that someone should do the work, or even that I will,
 but rather in the interest of chatting about it.

 Feel free to ridicule me as required! :)

 Thoughts:

 Here at work we would like to have user quotas based on uid (and
 presumably gid) to be able to fully replace the NetApps we run. Current
 ZFS are not good enough for our situation. We simply can not mount
 500,000 file-systems on all the NFS clients. Nor do all servers we run
 support mirror-mounts. Nor do auto-mount see newly created directories
 without a full remount.

 Current UFS-style-user-quotas are very exact. To the byte even. We do
 not need this precision. If a user has 50MB of quota, and they are able
 to reach 51MB usage, then that is acceptable to us. Especially since
 they have to go under 50MB to be able to write new data, anyway.

 Instead of having complicated code in the kernel layer, slowing down the
 file-system with locking and semaphores (and perhaps avoiding learning
 indepth ZFS code?), I was wondering if a more simplistic setup could be
 designed, that would still be acceptable. I will use the word
 'acceptable' a lot. Sorry.

 My thoughts are that the ZFS file-system will simply write a
 'transaction log' on a pipe. By transaction log I mean uid, gid and
 'byte count changed'. And by pipe I don't necessarily mean pipe(2), but
 it could be a fifo, pipe or socket. But currently I'm thinking
 '/dev/quota' style.

 User-land will then have a daemon, whether or not it is one daemon per
 file-system or really just one daemon does not matter. This process will
 open '/dev/quota' and empty the transaction log entries constantly. Take
 the uid,gid entries and update the byte-count in its database. How we
 store this database is up to us, but since it is in user-land it should
 have more flexibility, and is not as critical to be fast as it would
 have to be in kernel.

 The daemon process can also grow in number of threads as demand increases.

 Once a user's quota reaches the limit (note here that /the/ call to
 write() that goes over the limit will succeed, and probably a couple
 more after. This is acceptable) the process will blacklist the uid in
 kernel. Future calls to creat/open(CREAT)/write/(insert list of calls)
 will be denied. Naturally calls to unlink/read etc should still succeed.
 If the uid goes under the limit, the uid black-listing will be removed.

 If the user-land process crashes or dies, for whatever reason, the
 buffer of the pipe will grow in the kernel. If the daemon is restarted
 sufficiently quickly, all is well, it merely needs to catch up. If the
 pipe does ever get full and items have to be discarded, a full-scan will
 be required of the file-system. Since even with UFS quotas we need to
 occasionally run 'quotacheck', it would seem this too, is acceptable (if
 undesirable).

 If you have no daemon process running at all, you have no quotas at all.
 But the same can be said about quite a few daemons. The administrators
 need to adjust their usage.

 I can see a complication with doing a rescan. How could this be done
 efficiently? I don't know if there is a neat way to make this happen
 internally to ZFS, but from a user-land only point of view, perhaps a
 snapshot could be created (synchronised with the /dev/quota pipe
 reading?) and start a scan on the snapshot, while still processing
 kernel log. Once the scan is complete, merge the two sets.

 Advantages are that only small hooks are required in ZFS. The byte
 updates, and the blacklist with checks for being blacklisted.

 Disadvantages are that it is loss of precision, and possibly slower
 rescans? Sanity?

 But I do not really know the internals of ZFS, so I might be completely
 wrong, and everyone is laughing already.

 Discuss?

 Lund

 --
 Jorgen Lundman       | lund...@lundman.net
 Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
 Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
 Japan                | +81 (0)3 -3375-1767          (home)

 --
 Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock
 

Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Matthew Ahrens

Jorgen Lundman wrote:


In the style of a discussion over a beverage, and talking about 
user-quotas on ZFS, I recently pondered a design for implementing user 
quotas on ZFS after having far too little sleep.


It is probably nothing new, but I would be curious what you experts 
think of the feasibility of implementing such a system and/or whether or 
not it would even realistically work.


I'm not suggesting that someone should do the work, or even that I will, 
but rather in the interest of chatting about it.


As it turns out, I'm working on zfs user quotas presently, and expect to 
integrate in about a month.  My implementation is in-kernel, integrated with 
the rest of ZFS, and does not have the drawbacks you mention below.



Feel free to ridicule me as required! :)

Thoughts:

Here at work we would like to have user quotas based on uid (and 
presumably gid) to be able to fully replace the NetApps we run. Current 
ZFS are not good enough for our situation. We simply can not mount 
500,000 file-systems on all the NFS clients. Nor do all servers we run 
support mirror-mounts. Nor do auto-mount see newly created directories 
without a full remount.


Current UFS-style-user-quotas are very exact. To the byte even. We do 
not need this precision. If a user has 50MB of quota, and they are able 
to reach 51MB usage, then that is acceptable to us. Especially since 
they have to go under 50MB to be able to write new data, anyway.


Good, that's the behavior that user quotas will have -- delayed enforcement.
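
To give a flavour of the kind of per-user interface this enables - property,
user, and pool names here are illustrative, not final syntax:

    zfs set userquota@lundman=50m tank/home   # per-uid quota on one filesystem
    zfs get userused@lundman tank/home        # space currently charged to that uid
    zfs userspace tank/home                   # usage/quota table for all users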

Instead of having complicated code in the kernel layer, slowing down the 
file-system with locking and semaphores (and perhaps avoiding learning 
indepth ZFS code?), I was wondering if a more simplistic setup could be 
designed, that would still be acceptable. I will use the word 
'acceptable' a lot. Sorry.


My thoughts are that the ZFS file-system will simply write a 
'transaction log' on a pipe. By transaction log I mean uid, gid and 
'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
it could be a fifo, pipe or socket. But currently I'm thinking 
'/dev/quota' style.


User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process will 
open '/dev/quota' and empty the transaction log entries constantly. Take 
the uid,gid entries and update the byte-count in its database. How we 
store this database is up to us, but since it is in user-land it should 
have more flexibility, and is not as critical to be fast as it would 
have to be in kernel.


The daemon process can also grow in number of threads as demand increases.

Once a user's quota reaches the limit (note here that /the/ call to 
write() that goes over the limit will succeed, and probably a couple 
more after. This is acceptable) the process will blacklist the uid in 
kernel. Future calls to creat/open(CREAT)/write/(insert list of calls) 
will be denied. Naturally calls to unlink/read etc should still succeed. 
If the uid goes under the limit, the uid black-listing will be removed.


If the user-land process crashes or dies, for whatever reason, the 
buffer of the pipe will grow in the kernel. If the daemon is restarted 
sufficiently quickly, all is well, it merely needs to catch up. If the 
pipe does ever get full and items have to be discarded, a full-scan will 
be required of the file-system. Since even with UFS quotas we need to 
occasionally run 'quotacheck', it would seem this too, is acceptable (if 
undesirable).


My implementation does not have this drawback.  Note that you would need to 
use the recovery mechanism in the case of a system crash / power loss as 
well.  Adding potentially hours to the crash recovery time is not acceptable.


If you have no daemon process running at all, you have no quotas at all. 
But the same can be said about quite a few daemons. The administrators 
need to adjust their usage.


I can see a complication with doing a rescan. How could this be done 
efficiently? I don't know if there is a neat way to make this happen 
internally to ZFS, but from a user-land only point of view, perhaps a 
snapshot could be created (synchronised with the /dev/quota pipe 
reading?) and start a scan on the snapshot, while still processing 
kernel log. Once the scan is complete, merge the two sets.


Advantages are that only small hooks are required in ZFS. The byte 
updates, and the blacklist with checks for being blacklisted.


Disadvantages are that it is loss of precision, and possibly slower 
rescans? Sanity?


Not to mention that this information needs to get stored somewhere, and dealt 
with when you zfs send the fs to another system.


But I do not really know the internals of ZFS, so I might be completely 
wrong, and everyone is laughing already.


Discuss?


--matt

Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Tomas Ögren
On 12 March, 2009 - Matthew Ahrens sent me these 5,0K bytes:

 Jorgen Lundman wrote:

 In the style of a discussion over a beverage, and talking about  
 user-quotas on ZFS, I recently pondered a design for implementing user  
 quotas on ZFS after having far too little sleep.

 It is probably nothing new, but I would be curious what you experts  
 think of the feasibility of implementing such a system and/or whether 
 or not it would even realistically work.

 I'm not suggesting that someone should do the work, or even that I 
 will, but rather in the interest of chatting about it.

 As it turns out, I'm working on zfs user quotas presently, and expect to  
 integrate in about a month.  My implementation is in-kernel, integrated 
 with the rest of ZFS, and does not have the drawbacks you mention below.

Is there any chance of this getting into S10?

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Matthew Ahrens

Bob Friesenhahn wrote:

On Thu, 12 Mar 2009, Jorgen Lundman wrote:

User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process 
will open '/dev/quota' and empty the transaction log entries 
constantly. Take the uid,gid entries and update the byte-count in its 
database. How we store this database is up to us, but since it is in 
user-land it should have more flexibility, and is not as critical to 
be fast as it would have to be in kernel.


In order for this to work, ZFS data blocks need to somehow be associated 
with a POSIX user ID.  To start with, the ZFS POSIX layer is implemented 
on top of a non-POSIX Layer which does not need to know about POSIX user 
IDs.  ZFS also supports snapshots and clones.


Yes, the DMU needs to communicate with the ZPL to determine the uid and gid to 
charge each file to.  This is done using a callback.


The support for snapshots, clones, and potentially non-POSIX data 
storage, results in ZFS data blocks which are owned by multiple users at 
the same time, or multiple users over a period of time spanned by 
multiple snapshots.  If ZFS clones are modified, then files may have 
their ownership changed, while the unmodified data continues to be 
shared with other users.  If a cloned file has its ownership changed, 
then it would be quite tedious to figure out which blocks are now 
wholly owned by the new user, and which blocks are shared with other 
users.  By the time the analysis is complete, it will be wrong.


Before ZFS can apply per-user quota management, it is necessary to 
figure out how individual blocks can be charged to a user.  This seems 
to be a very complex issue and common usage won't work with your proposal.


Indeed.  We have decided to charge for referenced space.  This is the same 
concept used by the referenced, refquota, and refreservation 
properties, and reported by stat(2) in st_blocks, and du(1) on files today.


This makes the issue much simpler.  We don't need to worry about blocks being 
shared between clones or snapshots, because we charge for every time a block 
is referenced.  When a clone is created, it starts with the same user 
accounting information as its origin snapshot.


--matt


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Miles Nordin
 maj == Maidak Alexander J maidakalexand...@johndeere.com writes:

   maj If you're having issues with a disk contoller or disk IO
   maj driver its highly likely that a savecore to disk after the
   maj panic will fail.  I'm not sure how to work around this

not in Solaris, but as a concept for solving the problem:

 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/kdump/kdump.txt;h=3f4bc840da8b7c068076dd057216e846e098db9f;hb=4a6908a3a050aacc9c3a2f36b276b46c0629ad91

They load a second kernel into a reserved spot of RAM, like 64MB or
so, and forget about it.  After a crash, they boot the second kernel.
The second kernel runs using the reserved area of RAM as its working
space, not touching any other memory, as if you were running on a very
old machine with tiny RAM.  It reprobes all the hardware, and then
performs the dump.  I don't know if it actually works, but the
approach is appropriate if you are trying to debug the storage stack.
You could even have a main kernel which crashes while taking an
ordinary coredump, and then use the backup dumping-kernel to coredump
the main kernel in mid-coredump---a dump of a dumping kernel.
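
(For the curious, a rough sketch of the Linux-side setup, assuming kexec-tools
is installed and the running kernel was booted with a crashkernel= memory
reservation; the kernel/initrd paths are illustrative:)

  # load the capture kernel into the reserved region
  kexec -p /boot/vmlinuz-kdump \
        --initrd=/boot/initrd-kdump.img \
        --append="root=/dev/sda1 irqpoll maxcpus=1"
  # after a panic the capture kernel boots; the old kernel's memory shows up
  # as /proc/vmcore and can be copied off like any other file:
  cp /proc/vmcore /var/crash/vmcore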

I think some Solaris developers were discussing putting coredump
features into Xen, so the host could take the dump (or, maybe even
something better than a dump---for example, if you built host/target
debugging features into Xen for debugging running kernels, then you
could just force a breakpoint in the guest instead of panic.  Since
Xen can hibernate domU's onto disk (it can, right?), you can treat the
hibernated Xen-specific representation of the domU as the-dump,
groveling through the ``dump'' with the same host/target tools you
could use on a running kernel without any special dump support in the
debugger itself).  IIRC NetBSD developers discussed the same idea
years ago but neither implementation exists.
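
(The save/restore piece does exist in the xm toolstack today; roughly, with a
made-up guest name:)

  xm save mydomu /var/xen/mydomu.chkpt    # write the domU's memory image to disk
  xm restore /var/xen/mydomu.chkpt        # resume it later from that image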


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] usedby* properties for datasets created before v13

2009-03-12 Thread Matthew Ahrens

Gavin Maltby wrote:

Hi,

The manpage says

 Specifically,  used  =  usedbychildren + usedbydataset +
 usedbyrefreservation + usedbysnapshots.  These  proper-
 ties  are  only  available for datasets created on zpool
 version 13 pools.

.. and I now realize that created at v13 is the important bit,
rather than created pre-v13 and then upgraded, and I
see that datasets created on a pool version prior to 13
show - for these properties (might be nice to note that
in the manpage - I took - to mean zero for a while).
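
(For example, something like this makes the contrast easy to see - the
dataset names are made up:)

  zpool upgrade -v    # lists pool versions and the features each one adds
  zfs get used,usedbydataset,usedbysnapshots,usedbychildren tank/old   # created pre-v13: shows -
  zfs get used,usedbydataset,usedbysnapshots,usedbychildren tank/new   # created at v13+: real values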

Anyway, is there any way to retrospectively populate these
statistics (avoiding dataset reconstruction, that is)?
No chance a scrub would/could do it?


In theory one could add code to calculate these after the fact.  A tricky 
part is differentiating between usedbydataset and usedbysnapshots for clones. 
 In that case you would need to examine all block pointers in the clone. 
Those born after the origin are usedbydataset, and usedbysnapshots is 
whatever's left over.  Doing this while things are changing may be nontrivial.


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on a SAN

2009-03-12 Thread Scott Lawson

Grant,

Yes, this is correct. If host A goes belly up, you can deassign the LUN 
from host A and assign it to host B. Because host A has not gracefully 
exported its zpool, you will need to 'zpool import -f poolname' to force 
the pool to be imported on host B, since the pool was never exported 
before host A became unexpectedly inaccessible.

It is possible to have the LUN visible to both machines at the same 
time, just not in use by both machines. This is in general how clusters 
work. Be aware that if you do this and access the disk from both systems 
at once, you run a very real risk of corrupting the volume.

I use the first approach here quite regularly in what I call 'poor man's 
clustering'. ;) I tend to install all my software and data environments 
on SAN-based LUNs, which makes them easy to move: just export the zpool, 
reassign the LUN, then import it on the new system. Works well as long 
as the target system is at the same OS revision or greater.

/Scott.


Grant Lowe wrote:

Hi Erik,

A couple of questions about what you said in your email.  In synopsis 2, if 
hostA has gone belly up and is no longer accessible, then a step that is 
implied (or maybe I'm just inferring it) is to go to the SAN and reassign the 
LUN from hostA to hostB.  Correct?



- Original Message 
From: Erik Trimble erik.trim...@sun.com
To: Grant Lowe gl...@sbcglobal.net
Cc: zfs-discuss@opensolaris.org
Sent: Wednesday, March 11, 2009 1:42:06 PM
Subject: Re: [zfs-discuss] ZFS on a SAN

I'm not 100% sure what your question here is, but let me give you a
(hopefully) complete answer:

(1) ZFS is NOT a clustered file system, in the sense that it is NOT
possible for two hosts to have the same LUN mounted at the same time,
even if both are hooked to a SAN and can normally see that LUN.

(2) ZFS can do failover, however.  If you have a LUN from a SAN on
hostA, create a ZFS pool on it, and use it as normal.  Should you wish to
fail over the LUN to hostB, you need to do a 'zpool export zpool' on
hostA, then 'zpool import zpool' on hostB.  If hostA has been lost
completely (hung/died/etc) and you are unable to do an 'export' on it,
you can force the import on hostB via 'zpool import -f zpool'
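
(In command form, with a made-up pool name, the failover boils down to:)

  # on hostA, while it is still healthy:
  zpool export mypool
  # on hostB:
  zpool import mypool
  # if hostA died without exporting, force it on hostB:
  zpool import -f mypool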


ZFS requires that you import/export entire POOLS, not just filesystems.
So, given what you seem to want, I'd recommend this:

On the SAN, create (2) LUNs - one for your primary data, and one for
your snapshots/backups.

On hostA, create a zpool on the primary data LUN (call it zpool A), and
another zpool on the backup LUN (zpool B).  Take snapshots on A, then
use 'zfs send' and 'zfs receive' to copy the clone/snapshot over to
zpool B. then 'zpool export B'

On hostB, import the snapshot pool:  'zpool import B'



It might just be as easy to have two independent zpools on each host,
and just do a 'zfs send' on hostA, and 'zfs receive' on hostB to copy
the snapshot/clone over the wire.
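
(A minimal sketch of that send/receive flow, assuming ssh between the hosts;
the pool and snapshot names are made up:)

  # on hostA:
  zfs snapshot poolA/data@backup1
  zfs send poolA/data@backup1 | ssh hostB zfs receive poolB/data
  # later sends can be incremental, which is much cheaper
  # (poolB/data must stay unmodified, or use receive -F):
  zfs snapshot poolA/data@backup2
  zfs send -i poolA/data@backup1 poolA/data@backup2 | ssh hostB zfs receive poolB/data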

-Erik



On Wed, 2009-03-11 at 13:18 -0700, Grant Lowe wrote:
  

Hi All,

I'm new on ZFS, so I hope this isn't too basic a question.  I have a host where 
I set up ZFS.  The Oracle DBAs did their thing and I now have a number of ZFS 
datasets with their respective clones and snapshots on serverA.  I want to 
export some of the clones to serverB.  Do I need to zone serverB to see the 
same LUNs as serverA?  Or does it have to have preexisting, empty LUNs to 
import the clones?  Please help.  Thanks.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
___


Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services Private Bag 94006 Manukau
City Auckland New Zealand

Phone  : +64 09 968 7611
Fax: +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz

http://www.manukau.ac.nz




perl -e 'print
$i=pack("c5",(41*2),sqrt(7056),(unpack("c","H")-2),oct(115),10);'

 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Blake
I've managed to get the data transfer to work by rearranging my disks
so that all of them sit on the integrated SATA controller.

So, I feel pretty certain that this is either an issue with the
Supermicro aoc-sat2-mv8 card, or with PCI-X on the motherboard (though
I would think that the integrated SATA would also be using the PCI
bus?).

The motherboard, for those interested, is an H8DME-2 (not, I now find
after buying this box from Silicon Mechanics, a board that's on the
Solaris HCL...)

http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm

So now I'm considering one of LSI's HBAs - what do list members think
about this device:

http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm



On Thu, Mar 12, 2009 at 2:18 AM, Nathan Kroenert
nathan.kroen...@sun.com wrote:
 definitely time to bust out some mdb -K or boot -k and see what it's moaning
 about.

 I did not see the screenshot earlier... sorry about that.

 Nathan.

 Blake wrote:

 I start the cp, and then, with prstat -a, watch the cpu load for the
 cp process climb to 25% on a 4-core machine.

 Load, measured for example with 'uptime', climbs steadily until the
 reboot.

 Note that the machine does not dump properly, panic or hang - rather,
 it reboots.

 I attached a screenshot earlier in this thread of the little bit of
 error message I could see on the console.  The machine is trying to
 dump to the dump zvol, but fails to do so.  Only sometimes do I see an
 error on the machine's local console - most times, it simply reboots.



 On Thu, Mar 12, 2009 at 1:55 AM, Nathan Kroenert
 nathan.kroen...@sun.com wrote:

 Hm -

 Crashes, or hangs? Moreover - how do you know a CPU is pegged?

 Seems like we could do a little more discovery on what the actual problem
 here is, as I can read it about 4 different ways.

 By this last piece of information, I'm guessing the system does not
 crash,
 but goes really really slow??

 Crash == panic == we see stack dump on console and try to take a dump
 hang == nothing works == no response - might be worth looking at mdb -K
       or booting with a -k on the boot line.

 So - are we crashing, hanging, or something different?

 It might simply be that you are eating up all your memory, and your
 physical
 backing storage is taking a while to catch up?

 Nathan.

 Blake wrote:

 My dump device is already on a different controller - the motherboards
 built-in nVidia SATA controller.

 The raidz2 vdev is the one I'm having trouble with (copying the same
 files to the mirrored rpool on the nVidia controller work nicely).  I
 do notice that, when using cp to copy the files to the raidz2 pool,
 load on the machine climbs steadily until the crash, and one proc core
 pegs at 100%.

 Frustrating, yes.

 On Thu, Mar 12, 2009 at 12:31 AM, Maidak Alexander J
 maidakalexand...@johndeere.com wrote:

 If you're having issues with a disk contoller or disk IO driver its
 highly likely that a savecore to disk after the panic will fail.  I'm
 not
 sure how to work around this, maybe a dedicated dump device not on a
 controller that uses a different driver then the one that you're having
 issues with?

 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Blake
 Sent: Wednesday, March 11, 2009 4:45 PM
 To: Richard Elling
 Cc: Marc Bevand; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] reboot when copying large amounts of data

 I guess I didn't make it clear that I had already tried using savecore
 to
 retrieve the core from the dump device.

 I added a larger zvol for dump, to make sure that I wasn't running out
 of
 space on the dump device:

 r...@host:~# dumpadm
       Dump content: kernel pages
        Dump device: /dev/zvol/dsk/rpool/bigdump (dedicated)
 Savecore directory: /var/crash/host
   Savecore enabled: yes

 I was using the -L option only to try to get some idea of why the
 system
 load was climbing to 1 during a simple file copy.



 On Wed, Mar 11, 2009 at 4:58 PM, Richard Elling
 richard.ell...@gmail.com wrote:

 Blake wrote:

 I'm attaching a screenshot of the console just before reboot.  The
 dump doesn't seem to be working, or savecore isn't working.

 On Wed, Mar 11, 2009 at 11:33 AM, Blake blake.ir...@gmail.com
 wrote:

 I'm working on testing this some more by doing a savecore -L right
 after I start the copy.


 savecore -L is not what you want.

 By default, for OpenSolaris, savecore on boot is disabled.  But the
 core will have been dumped into the dump slice, which is not used for
 swap.
 So you should be able to run savecore at a later time to collect the
 core from the last dump.
 -- richard
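
 (i.e., roughly the following; the savecore directory is whatever dumpadm
 reports, and the .0 suffix assumes this is the first saved dump:)

   savecore -v /var/crash/host    # read the dump device, write unix.N / vmcore.N
   mdb unix.0 vmcore.0            # then inspect the panic with ::status, ::stack, etc.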


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 

Re: [zfs-discuss] CLI grinds to a halt during backups

2009-03-12 Thread Jeff Williams

Maybe you're also seeing this one?

6586537 async zio taskqs can block out userland commands

-Jeff


Blake wrote:

I think we need some data to look at to find out what's being slow.
Try some commands like this to get data:

prstat -a

iostat -x 5

zpool iostat 5 (if you are using ZFS)

and then report sample output to this list.


You might also consider enabling sar (svcadm enable sar), then reading
the sar manpage.
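
(As a sketch - the sampling interval and count are up to you:)

  svcadm enable sar    # turn on system activity data collection
  sar -u 5 12          # CPU utilisation, 12 samples at 5-second intervals
  sar -d 5 12          # per-device I/O activity over the same window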




On Thu, Mar 12, 2009 at 10:36 AM, Marius van Vuuren
mar...@breakpoint.co.za wrote:

Hi,

I have an X4150 with a J4200 connected, populated with 12 x 1 TB disks (SATA).

I run backup_pc as my software for backing up.

Is there anything I can do to make the command line more responsive during
backup windows? At the moment it grinds to a complete standstill.

Thanks



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Tim
On Thu, Mar 12, 2009 at 2:22 PM, Blake blake.ir...@gmail.com wrote:

 I've managed to get the data transfer to work by rearranging my disks
 so that all of them sit on the integrated SATA controller.

 So, I feel pretty certain that this is either an issue with the
 Supermicro aoc-sat2-mv8 card, or with PCI-X on the motherboard (though
 I would think that the integrated SATA would also be using the PCI
 bus?).

 The motherboard, for those interested, is an HD8ME-2 (not, I now find
 after buying this box from Silicon Mechanics, a board that's on the
 Solaris HCL...)

 http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm
 

 So I'm not considering one of LSI's HBA's - what do list members think
 about this device:

 http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm
 



I believe the MCP55's SATA controllers are actually PCI-E based.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Dave



Tim wrote:



On Thu, Mar 12, 2009 at 2:22 PM, Blake blake.ir...@gmail.com wrote:


I've managed to get the data transfer to work by rearranging my disks
so that all of them sit on the integrated SATA controller.

So, I feel pretty certain that this is either an issue with the
Supermicro aoc-sat2-mv8 card, or with PCI-X on the motherboard (though
I would think that the integrated SATA would also be using the PCI
bus?).

The motherboard, for those interested, is an HD8ME-2 (not, I now find
after buying this box from Silicon Mechanics, a board that's on the
Solaris HCL...)

http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm

So I'm not considering one of LSI's HBA's - what do list members think
about this device:

http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm



I believe the MCP55's SATA controllers are actually PCI-E based.


I use Tyan 2927 motherboards. They have on-board nVidia MCP55 chipsets, 
which is the same chipset as the X4500 (IIRC). I wouldn't trust the 
MCP55 chipset in OpenSolaris. I had random disk hangs even while the 
machine was mostly idle.


In Feb 2008 I bought AOC-SAT2-MV8 cards and moved all my drives to these 
add-in cards. I haven't had any issues with drive hanging since. There 
does not seem to be any problems with the SAT2-MV8 under heavy load in 
my servers from what I've seen.


When the SuperMicro AOC-USAS-L8i came out later last year, I started 
using them instead. They work better than the SAT2-MV8s.


This card needs a 3U or bigger case:
http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm

This is the low profile card that will fit in a 2U:
http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm

They both work in normal PCI-E slots on my Tyan 2927 mobos.

Finding good non-Sun hardware that works very well under OpenSolaris is 
frustrating to say the least. Good luck.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Miles Nordin
 b == Blake  blake.ir...@gmail.com writes:

 b http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm

I'm having trouble matching up chips, cards, drivers, platforms, and
modes with the LSI stuff.  The more I look at it the mroe confused I
get.

Platforms:
 x86
 SPARC

Drivers:
 mpt
 mega_sas
 mfi

Chips:
 1068   (SAS, PCI-X)
 1068E  (SAS, PCIe)
 1078   ???  
   -- from supermicro, seems to be SAS, PCIe, with support for 256 -
  512MB RAM instead of the 16 - 32MB RAM on the others
 1030   (parallel scsi)

Cards:
  LSI cards 
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/index.html
  I love the way they use the numbers 3800 and 3080, so you are
  constantly transposing them thus leaving google littered with all
  this confusingly wrong information.

LSISAS3800X(PCI-X, external ports)
LSISAS3080X-R  (PCI-X, internal ports)

LSISAS3801X(PCI-X, external ports)

LSISAS3801E(PCIe, external ports)
LSISAS3081E-R  (PCIe, internal ports)

  I would have thought -R meant ``supports RAID'' but all I can really
  glean through the foggy marketing-glass behind which all the
  information is hidden, is -R means ``all the ports are internal''.

  Supermicro cards http://www.supermicro.com/products/accessories/index.cfm
wow, this is even more of a mess.
 These are all UIO cards so I assume they have the PCIe bracket on backwards
AOC-USAS-L4i(PCIe, 4 internal 4 external)
AOC-USAS-L8i, AOC-USASLP-L8i(PCIe, internal ports)
   based on 1068E
   sounds similar to LSISAS3081E.  Is that also 1068E?
   supports RAID0, RAID1, RAID10
   AOC-USAS-L4iR
   identical to the above, but ``includes iButton''
   which is an old type of smartcard-like device with 
   sometimes crypto and javacard support.
   apparently some kind of license key to 
   unlock RAID5?  no L8iR exists though, only L4iR.
   I have the L8i, and it does have an iButton socket
   with no button in it.
AOC-USAS-H4iR
AOC-USAS-H8iR, AOC-USASLP-H8iR  (PCIe, internal ports)
   based on 1078
   low-profile version has more memory than fullsize version?!

but here is the most fun thing about the supermicro cards.  All
cards have one driver *EXCEPT* the L8i, which has three drivers
for three modes: IT, IR, and SR.  When I google for this I find
notes on some of their integrated motherboards like:

 * The onboard LSI 1068E supported SR and IT mode but not IR mode.

I also found this:

 * SR = Software RAID IT = Integrate. Target mode. IR mode is not supported.

but no idea what the three modes are.  searching for SAS SR IT IR
doesn't work either, so it's not some SAS thing.  What *is* it?

also there seem to be two different kinds of quad-SATA connector on
these SAS cards so there are two different kinds of octopus cable.

Questions:

 * which chips are used by each of the LSI boards?  I can guess, but
   in particular LSISAS3800X and LSISAS3801X seem to be different
   chips, while from the list of chips I'd have no choice but to guess
   they are both 1068.

 * which drivers work on x86 and which SPARC?  I know some LSI cards
   work in SPARC but maybe not all---do the drivers support the same
   set of cards on both platforms?  Or will normal cards not work in
   SPARC for lack of Forth firmware to perform some LSI-proprietary
   ``initialization'' ritual?

 * which chips go with which drivers?  Is it even that simple---will
   adding an iButton RAID5 license to a SuperMicro board make the same
   card change from mega_sas to mpt attachment, or something similar?

   For example there is a bug here about a 1068E card which doesn't
   work, even though most 1068E cards do work:

http://bugs.opensolaris.org/view_bug.do?bug_id=6736187

   Maybe the Solaris driver needs IR mode and won't work with the
   onboard supermicro chip which supports only ``software raid''
   whatever that means, which is maybe denoted by SR?  What does the
   iButton unlock, then, features of IR mode which are abstracted from
   the OS driver?

 * What are SR, IT, and IR mode?  Which modes do the Solaris drivers
   use, or does it matter?

 * Has someone found the tool mentioned here by some above-the-table
   means, or only by request from LSI?:

http://www.opensolaris.org/jive/message.jspa?messageID=184811#184811

   The mention that a SPARC version of the tool exists is encouraging.
   The procedure to clear persistent mappings through the BIOS
   obviously won't work on SPARC.

Here are the notes I have so far:

-8-
 The driver for LSI's MegaRAID SAS card is mega_sas which
 was integrated into snv_88. It's planned for backporting to
 a Solaris 10 update.
There is also a BSD-licensed driver for that hardware, called
mfi. It's available from
http://www.itee.uq.edu.au/~dlg/mfi

 a scsi_vhci
 sort of driver for the LSI card in the Ultra {20,25}
Well yes, that's mpt(7d) as delivered 

Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Nathan Kroenert
For what it's worth, I have been running Nevada (so, same kernel as 
opensolaris) for ages (at least 18 months) on a Gigabyte board with the 
MCP55 chipset and it's been flawless.


I liked it so much, I bought it's newer brother, based on the nvidia 
750SLI chipset...   M750SLI-DS4


Cheers!

Nathan.


On 13/03/09 09:21 AM, Dave wrote:



Tim wrote:



On Thu, Mar 12, 2009 at 2:22 PM, Blake blake.ir...@gmail.com wrote:


I've managed to get the data transfer to work by rearranging my disks
so that all of them sit on the integrated SATA controller.

So, I feel pretty certain that this is either an issue with the
Supermicro aoc-sat2-mv8 card, or with PCI-X on the motherboard 
(though

I would think that the integrated SATA would also be using the PCI
bus?).

The motherboard, for those interested, is an HD8ME-2 (not, I now find
after buying this box from Silicon Mechanics, a board that's on the
Solaris HCL...)


http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm 



So I'm not considering one of LSI's HBA's - what do list members 
think

about this device:

http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm



I believe the MCP55's SATA controllers are actually PCI-E based.


I use Tyan 2927 motherboards. They have on-board nVidia MCP55 chipsets, 
which is the same chipset at the X4500 (IIRC). I wouldn't trust the 
MCP55 chipset in OpenSolaris. I had random disk hangs even while the 
machine was mostly idle.


In Feb 2008 I bought AOC-SAT2-MV8 cards and moved all my drives to these 
add-in cards. I haven't had any issues with drive hanging since. There 
does not seem to be any problems with the SAT2-MV8 under heavy load in 
my servers from what I've seen.


When the SuperMicro AOC-USAS-L8i came out later last year, I started 
using them instead. They work better than the SAT2-MV8s.


This card needs a 3U or bigger case:
http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm

This is the low profile card that will fit in a 2U:
http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm

They both work in normal PCI-E slots on my Tyan 2927 mobos.

Finding good non-Sun hardware that works very well under OpenSolaris is 
frustrating to say the least. Good luck.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--


//
// Nathan Kroenert  nathan.kroen...@sun.com //
// Senior Systems Engineer  Phone:  +61 3 9869 6255 //
// Global Systems Engineering   Fax:+61 3 9869 6288 //
// Level 7, 476 St. Kilda Road  //
// Melbourne 3004   VictoriaAustralia   //
//
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Will Murnane
On Thu, Mar 12, 2009 at 18:30, Miles Nordin car...@ivy.net wrote:
  I love the way they use the numbers 3800 and 3080, so you are
  constantly transposing them thus leaving google littered with all
  this confusingly wrong information.
Think of the middle two digits as (number of external ports, number of
internal ports).  For example, I have a 3442E-R which has 4 internal
and 4 external ports, the 3800 has 8 external ports and 0 internal,
and so forth.  One place this breaks down is with cards like the ;
it has a total of 8 ports, any group of 4 of which can be mapped to
internal or external ports.

   AOC-USAS-L4iR
       identical to the above, but ``includes iButton''
       which is an old type of smartcard-like device with
       sometimes crypto and javacard support.
       apparently some kind of license key to
       unlock RAID5?  no L8iR exists though, only L4iR.
       I have the L8i, and it does have an iButton socket
       with no button in it.
I think the iButton is just used as an unlock code for the builtin
RAID 5 functionality.  Nothing the end user cares about, unless they
want RAID and have to spend the extra money.

     * SR = Software RAID IT = Integrate. Target mode. IR mode is not 
 supported.
Integrated target mode lets you export some storage attached to the
host system (through another adapter, presumably) as a storage device.
 IR mode is almost certainly Internal RAID, which that card doesn't
have support for.

 also there seem to be two different kinds of quad-SATA connector on
 these SAS cards so there are two different kinds of octopus cable.
Yes---SFF-8484 and SFF-8087 are the key words.

 SATA disks will always show up when attached to a SAS HBA,
 because that's one of the requirements of the SAS specification.
I'm not sure what you mean by this.  SAS controllers can control SATA
disks, and interact with them.  They don't just show up; they're
first-class citizens.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman



Bob Friesenhahn wrote:
In order for this to work, ZFS data blocks need to somehow be associated 
with a POSIX user ID.  To start with, the ZFS POSIX layer is implemented 
on top of a non-POSIX Layer which does not need to know about POSIX user 
IDs.  ZFS also supports snapshots and clones.


This I did not know, but now that you point it out, this would be the 
right way to design it. So the advantage of requiring less ZFS 
integration is no longer the case.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman



Eric Schrock wrote:

Note that:

6501037 want user/group quotas on ZFS 


Is already committed to be fixed in build 113 (i.e. in the next month).

- Eric


Wow, that would be fantastic. We have the Sun vendors camped out at the 
data center trying to apply fresh patches. I believe 6798540 fixed the 
largest issue but it would be desirable to be able to use just ZFS.


Is this a project needing donations? I see your address is at Sun.com, 
and we already have 9 x4500s, but maybe you need some pocky, asse, 
collon or pocari sweat...



Lundy


[1]
BugID:6798540
 3-way deadlock happens in ufs filesystem on zvol when writng ufs log

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman




As it turns out, I'm working on zfs user quotas presently, and expect to 
integrate in about a month.  My implementation is in-kernel, integrated 
with the rest of ZFS, and does not have the drawbacks you mention below.


I merely suggested my design as it may have been something I _could_ 
have implemented, as it required little ZFS knowledge. (Adding hooks is 
usually easier). But naturally that has already been shown not to be 
the case.


A proper implementation is always going to be much more desirable :)





Good, that's the behavior that user quotas will have -- delayed 
enforcement.


There probably are situations where precision is required, or perhaps 
historical reasons, but for us a delayed enforcement may even be better.


Perhaps it would be better for the delivery of an email message that 
goes over the quota to be allowed to complete writing the entire 
message, rather than aborting a write() call somewhere in the middle and 
returning failures all the way back up to generate a bounce message. 
Maybe... can't say I have thought about it.




My implementation does not have this drawback.  Note that you would need 
to use the recovery mechanism in the case of a system crash / power loss 
as well.  Adding potentially hours to the crash recovery time is not 
acceptable.


Great! Will there be any particular limits on how many UIDs, or on the 
size of a UID, in your implementation? UFS generally does not, but I did 
note that if a UID goes over 1000 it flips out and grows the quotas file 
to 128GB in size.



Not to mention that this information needs to get stored somewhere, and 
dealt with when you zfs send the fs to another system.


That is a good point, I had not even planned to support quotas for ZFS 
send, but consider a rescan to be the answer.  We don't ZFS send very 
often as it is far too slow.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Matthew Ahrens

Jorgen Lundman wrote:
Great! Will there be any particular limits on how many uids, or size of 
uids in your implementation? UFS generally does not, but I did note that 
if uid go over 1000 it flips out and changes the quotas file to 
128GB in size.


All UIDs, as well as SIDs (from the SMB server), are permitted.  Any number 
of users and quotas are permitted, and handled efficiently.  Note, UID on 
Solaris is a 31-bit number.
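
(A sketch of how the interface looks from the zfs command, assuming the
userquota@/userused@ property names from the eventual integration; the user
and dataset names are made up:)

  zfs set userquota@alice=10G tank/home    # cap alice at 10 GB of referenced space
  zfs get userused@alice tank/home         # how much she is currently charged
  zfs userspace tank/home                  # per-user usage/quota table for the dataset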


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Miles Nordin
 wm == Will Murnane will.murn...@gmail.com writes:

     * SR = Software RAID IT = Integrate. Target mode. IR mode
 is not supported.
wm Integrated target mode lets you export some storage attached
wm to the host system (through another adapter, presumably) as a
wm storage device.  IR mode is almost certainly Internal RAID,
wm which that card doesn't have support for.

no, the supermicro page for AOC-USAS-L8i does claim support for all
three, and supermicro has an ``IR driver'' available for download for
Linux and Windows, or at least a link to one.

I'm trying to figure out what's involved in determining and switching
modes, why you'd want to switch them, what cards support which modes,
which solaris drivers support which modes, u.s.w.

The answer may be very simple, like ``the driver supports only IR.
Most cards support IR, and cards that don't support IR won't work.  IR
can run in single-LUN mode.  Some IR cards support RAID5, others
support only RAID 0, 1, 10.''  Or it could be ``the driver supports
only SR.  The driver is what determines the mode, and it does this by
loading firmware into the card, and the first step in initializing the
card is always for the driver to load in a firmware blob.  All
currently-produced cards support SR.''  so...actually, now that I say
it, I guess the answer cannot be very simple.  It's going to have to
be a little complicated.

Anyway, I can guess, too.  I was hoping someone would know for sure
off-hand.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread James C. McPherson
On Thu, 12 Mar 2009 22:24:12 -0400
Miles Nordin car...@ivy.net wrote:

  wm == Will Murnane will.murn...@gmail.com writes:
 
      * SR = Software RAID IT = Integrate. Target mode. IR mode
  is not supported.
 wm Integrated target mode lets you export some storage attached
 wm to the host system (through another adapter, presumably) as a
 wm storage device.  IR mode is almost certainly Internal RAID,
 wm which that card doesn't have support for.
 
 no, the supermicro page for AOC-USAS-L8i does claim support for all
 three, and supermicro has an ``IR driver'' available for download for
 Linux and Windows, or at least a link to one.
 
 I'm trying to figure out what's involved in determining and switching
 modes, why you'd want to switch them, what cards support which modes,
 which solaris drivers support which modes, u.s.w.
 
 The answer may be very simple, like ``the driver supports only IR.
 Most cards support IR, and cards that don't support IR won't work.  IR
 can run in single-LUN mode.  Some IR cards support RAID5, others
 support only RAID 0, 1, 10.''  Or it could be ``the driver supports
 only SR.  The driver is what determines the mode, and it does this by
 loading firmware into the card, and the first step in initializing the
 card is always for the driver to load in a firmware blob.  All
 currently-produced cards support SR.''  so...actually, now that I say
 it, I guess the answer cannot be very simple.  It's going to have to
 be a little complicated.
 Anyway, I can guess, too.  I was hoping someone would know for sure
 off-hand.


Hi Miles,
the mpt(7D) driver supports that card. mpt(7D) supports both
IT and IR firmware variants. You can find out the specifics
for what RAID volume levels are supported by reading the 
raidctl(1M) manpage. I don't think you can switch between IT
and IR firmware, but not having needed to know this before,
I haven't tried it.
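
(e.g., something along these lines to see what the attached controller and
its firmware will do - syntax varies a bit between releases, and the
controller number here is illustrative:)

  raidctl -l       # list RAID controllers the driver has attached
  raidctl -l 1     # volumes/disks and capabilities for controller 1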


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss