RE: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Paul Fisher
> From: Eric Schrock [mailto:[EMAIL PROTECTED] 
> Sent: Monday, February 26, 2007 12:05 PM
> 
> The slow part of zpool import is actually discovering the 
> pool configuration.  This involves examining every device on 
> the system (or every device within an 'import -d' directory) 
> and seeing if it has any labels.  Internally, the import 
> action itself should be quite fast...

Thanks for the answer.  Let me ask a follow-up question related to zpool import 
and the Sun Cluster + ZFS integration -- is the slow part done "early" on the 
backup node, so that at the time of failover the actual import is "fast", as you 
describe above?

--

paul


Re: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Nicolas Williams
On Mon, Feb 26, 2007 at 10:32:22AM -0800, Eric Schrock wrote:
> On Mon, Feb 26, 2007 at 12:27:48PM -0600, Nicolas Williams wrote:
> > 
> > What is slow, BTW?  The open(2)s of the devices?  Or the label reading?
> > And is there a way to do async open(2)s w/o a thread per-open?  The
> > open(2) man page isn't very detailed about O_NONBLOCK/O_NDELAY behaviour
> > on devices ("[s]ubsequent behaviour of the device is device-specific")...
> 
> Simply opening and reading 512K from the beginning and end of every
> device under /dev/dsk.  Async I/O would probably help.  If you use
> libaio, then it will do the right thing depending on the device (spawn a
> thread to do synchronous I/O, or use the driver entry points if
> provided).

So, are you saying that O_NONBLOCK/O_NDELAY opens of devices in /dev/dsk
are supported (that's what I was asking, albeit obliquely)?  And if so,
can the application issue an aioread(3AIO) immediately?  If not, how can
the application poll for the open to complete (not with poll(2)!)?

My guess (and if I have time I'll test it) is that there's no way to do
async opens of disk devices, and that one would have to create multiple
threads, one per device up to some maximum, to taste the devices in
parallel.
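
Roughly the shape I have in mind (an untested sketch, not what libzfs
actually does; the thread cap, path-array size, and 512K taste size are
just placeholders, and the label check itself is elided):

    /*
     * Untested sketch: taste every device under /dev/dsk with a bounded
     * pool of worker threads.  NTHREADS, MAXPATHS and the 512K taste
     * size are made-up numbers, and the actual label check is elided.
     */
    #include <pthread.h>
    #include <dirent.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NTHREADS    16
    #define MAXPATHS    4096
    #define TASTE_SIZE  (512 * 1024)

    static char *paths[MAXPATHS];
    static int npaths;
    static int nextpath;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *
    taste_worker(void *arg)
    {
        char *buf = malloc(TASTE_SIZE);

        for (;;) {
            pthread_mutex_lock(&lock);
            if (nextpath >= npaths) {
                pthread_mutex_unlock(&lock);
                break;
            }
            char *path = paths[nextpath++];
            pthread_mutex_unlock(&lock);

            int fd = open(path, O_RDONLY);
            if (fd == -1)
                continue;
            /* Front label region; the tail region would be read the same way. */
            if (pread(fd, buf, TASTE_SIZE, 0) > 0) {
                /* ... look for ZFS labels in buf ... */
            }
            (void) close(fd);
        }
        free(buf);
        return (NULL);
    }

    int
    main(void)
    {
        DIR *dir = opendir("/dev/dsk");
        struct dirent *de;
        pthread_t tids[NTHREADS];

        /* Collect the candidate device paths. */
        while (dir != NULL && (de = readdir(dir)) != NULL && npaths < MAXPATHS) {
            if (de->d_name[0] == '.')
                continue;
            char path[PATH_MAX];
            (void) snprintf(path, sizeof (path), "/dev/dsk/%s", de->d_name);
            paths[npaths++] = strdup(path);
        }
        if (dir != NULL)
            (void) closedir(dir);

        /* Taste them NTHREADS at a time. */
        for (int i = 0; i < NTHREADS; i++)
            (void) pthread_create(&tids[i], NULL, taste_worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            (void) pthread_join(tids[i], NULL);
        return (0);
    }

The opens are still synchronous, but with a bounded number of them in
flight at once the per-device latency should mostly overlap.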

Nico
-- 


Re: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Eric Schrock
On Mon, Feb 26, 2007 at 12:27:48PM -0600, Nicolas Williams wrote:
> 
> What is slow, BTW?  The open(2)s of the devices?  Or the label reading?
> And is there a way to do async open(2)s w/o a thread per-open?  The
> open(2) man page isn't very detailed about O_NONBLOCK/O_NDELAY behaviour
> on devices ("[s]ubsequent behaviour of the device is device-specific")...

Simply opening and reading 512K from the beginning and end of every
device under /dev/dsk.  Async I/O would probably help.  If you use
libaio, then it will do the right thing depending on the device (spawn a
thread to do synchronous I/O, or use the driver entry points if
provided).
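
For illustration, a queue-them-all-then-reap pattern with POSIX aio might
look roughly like this (untested sketch, not the libzfs code; error
handling and the matching tail-of-device reads are omitted, and the
open(2)s themselves are still synchronous):

    /*
     * Untested sketch: queue one asynchronous 512K front-of-device read
     * per path on the command line with POSIX aio, then reap them.
     */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define TASTE_SIZE  (512 * 1024)

    int
    main(int argc, char **argv)
    {
        int n = argc - 1;
        struct aiocb *cbs = calloc(n, sizeof (struct aiocb));

        /* Queue the reads. */
        for (int i = 0; i < n; i++) {
            int fd = open(argv[i + 1], O_RDONLY);
            if (fd == -1) {
                cbs[i].aio_fildes = -1;
                continue;
            }
            cbs[i].aio_fildes = fd;
            cbs[i].aio_buf = malloc(TASTE_SIZE);
            cbs[i].aio_nbytes = TASTE_SIZE;
            cbs[i].aio_offset = 0;
            cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
            if (aio_read(&cbs[i]) != 0) {
                (void) close(fd);
                cbs[i].aio_fildes = -1;
            }
        }

        /* Reap completions; the label check would go where the printf is. */
        for (int i = 0; i < n; i++) {
            if (cbs[i].aio_fildes == -1)
                continue;
            const struct aiocb *list[1] = { &cbs[i] };
            while (aio_error(&cbs[i]) == EINPROGRESS)
                (void) aio_suspend(list, 1, NULL);
            ssize_t got = aio_return(&cbs[i]);
            (void) printf("%s: read %ld bytes\n", argv[i + 1], (long)got);
            (void) close(cbs[i].aio_fildes);
        }
        return (0);
    }

Depending on the platform this may need -lrt at link time.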

> Also, I see this happens in user-land.  Is there any benefit to trying
> this in kernel-land?

No.  It's simpler and less brittle to keep it in userland.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Nicolas Williams
On Mon, Feb 26, 2007 at 10:10:15AM -0800, Eric Schrock wrote:
> On Mon, Feb 26, 2007 at 12:06:14PM -0600, Nicolas Williams wrote:
> > Couldn't all that tasting be done in parallel?
> 
> Yep, that's certainly possible.  Sounds like a perfect feature for
> someone in the community to work on :-)  Simply take
> zpool_find_import(), add some worker thread/pool model, and there you
> go.

What is slow, BTW?  The open(2)s of the devices?  Or the label reading?
And is there a way to do async open(2)s w/o a thread per-open?  The
open(2) man page isn't very detailed about O_NONBLOCK/O_NDELAY behaviour
on devices ("[s]ubsequent behaviour of the device is device-specific")...

Also, I see this happens in user-land.  Is there any benefit to trying
this in kernel-land?

OT: I've been trying to get ZFS boot on a USB flash device going, and
currently it's failing to find the pool named in /etc/system.  Next time
I try it I will check whether this is something to do with the USB modules
not loading or not being in the miniroot (I don't have that USB stick with
me at the moment), or whether zpool.cache refers to the wrong device or
doesn't match /devices.  If ZFS boot could live without a zpool.cache to
find the volume with the boot root FS, that would rock.  Incidentally, it'd
be nice if I could more easily observe what is going wrong here with kmdb --
there must be a way to get more info from the ZFS module that I'm just
missing.

Nico
-- 


Re: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Eric Schrock
On Mon, Feb 26, 2007 at 12:06:14PM -0600, Nicolas Williams wrote:
> On Mon, Feb 26, 2007 at 10:05:08AM -0800, Eric Schrock wrote:
> > The slow part of zpool import is actually discovering the pool
> > configuration.  This involves examining every device on the system (or
> > every device within an 'import -d' directory) and seeing if it has any
> > labels.  Internally, the import action itself should be quite fast, and
> > is essentially the same speed as opening a pool normally.  So the
> > scalability really depends on the number of devices in the system,
> > not the number of devices within a pool.
> 
> Couldn't all that tasting be done in parallel?

Yep, that's certainly possible.  Sounds like a perfect feature for
someone in the community to work on :-)  Simply take
zpool_find_import(), add some worker thread/pool model, and there you
go.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Nicolas Williams
On Mon, Feb 26, 2007 at 10:05:08AM -0800, Eric Schrock wrote:
> The slow part of zpool import is actually discovering the pool
> configuration.  This involves examining every device on the system (or
> every device within an 'import -d' directory) and seeing if it has any
> labels.  Internally, the import action itself should be quite fast, and
> is essentially the same speed as opening a pool normally.  So the
> scalability really depends on the number of devices in the system,
> not the number of devices within a pool.

Couldn't all that tasting be done in parallel?

Nico
-- 


Re: [zfs-discuss] Performance of "zpool import"?

2007-02-26 Thread Eric Schrock
The slow part of zpool import is actually discovering the pool
configuration.  This involves examining every device on the system (or
every device within an 'import -d' directory) and seeing if it has any
labels.  Internally, the import action itself should be quite fast, and
is essentially the same speed as opening a pool normally.  So the
scalability really depends on the number of devices in the system,
not the number of devices within a pool.
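
To make that concrete, the per-device taste amounts to roughly the
following, repeated once for every device the system can see (an untested
sketch, not the actual libzfs code; the 512K regions are just enough to
cover the label copies at each end of the device, and the nvlist parsing
is elided):

    /*
     * Untested sketch of a single taste: read 512K from the front and
     * 512K from the tail of one device and look for ZFS labels there.
     * The full discovery scan is this, repeated per device.
     */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define TASTE_SIZE  (512 * 1024)

    static int
    taste_device(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd == -1)
            return (-1);

        char *buf = malloc(TASTE_SIZE);
        off_t size = lseek(fd, 0, SEEK_END);

        /* Front label region. */
        (void) pread(fd, buf, TASTE_SIZE, 0);
        /* Tail label region, if the device is big enough. */
        if (size > TASTE_SIZE)
            (void) pread(fd, buf, TASTE_SIZE, size - TASTE_SIZE);

        /* ... unpack and inspect the label nvlists here ... */

        free(buf);
        (void) close(fd);
        return (0);
    }

    int
    main(int argc, char **argv)
    {
        /* One taste per path given; the import scan does this for every
         * entry under /dev/dsk (or under the -d directory). */
        for (int i = 1; i < argc; i++)
            (void) taste_device(argv[i]);
        return (0);
    }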

- Eric

On Mon, Feb 26, 2007 at 08:14:14AM -0600, Paul Fisher wrote:
> Has anyone done benchmarking on the scalability and performance of zpool 
> import in terms of the number of devices in the pool on recent OpenSolaris 
> builds?
> 
> In other words, what would the relative performance of "zpool import" be for 
> the following three pool configurations on multi-pathed 4G FC-connected JBODs:
> 1) 1 x 12-disk raidz2 vdev in the pool
> 2) 10 x 12-disk raidz2 vdevs in the pool
> 3) 100 x 12-disk raidz2 vdevs in the pool
> 
> Any feedback on your experiences would be greatly appreciated.
> 
> 
> --
> 
> paul

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock