[zfs-discuss] Thumper wedged somewhere in ZFS

2008-07-09 Thread Ceri Davies
Forwarding here, as suggested by chaps on storage-discuss.

Just to clarify, I was running filebench directly on the x4500, not from
an initiator, so this is probably not a COMSTAR thing.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere
--- Begin Message ---
We've got an x4500 running SXCE build 91 with stmf configured to share
out a (currently) small number (9) of LUs to a (currently) small number
of hosts (4).

The x4500 is configured with a ZFS root mirror, six RAIDZ sets across all
six controllers, some hot spares in the gaps, and a RAID10 set to use up
everything else.

Since this is an investigative setup, I have been running filebench
locally on the x4500 to get some stats before moving on to do the same
on the initiators against the x4500 and our current storage.
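
(For reference, the runs are driven through filebench's interactive shell
along roughly these lines -- the target directory and run length here are
illustrative rather than the exact values used:)

  go_filebench                          (path omitted; the interactive front end)
  filebench> load oltp                  (the stock OLTP workload)
  filebench> set $dir=/pool1/fbtest     (a directory on the pool under test)
  filebench> set $filesize=5g
  filebench> run 600                    (run length in seconds)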

While running the filebench OLTP workload with $filesize=5g on one of
the RAIDZ pools, the x4500 seemed to hang while creating the fileset.
On further investigation, a lot of things actually still worked: logging in
via SSH was fine and /usr/bin/ps worked OK, but /usr/ucb/ps and any of the
/usr/proc ptools just hung, man hung, and so on.  "savecore -L" managed
to write a dump but couldn't seem to exit.

So I did a hard reset; the system came up fine and I do have the dump
from "savecore -L".  I'm kind of out of my depth with mdb, but
it looks pretty clear to me that all of the "hung" processes were
somewhere in ZFS:

# mdb -k unix.0 vmcore.0 
mdb: failed to read panicbuf and panic_reg -- current register set will
be unavailable
Loading modules: [ unix genunix specfs dtrace cpu.generic
cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs sd ip hook neti sctp
arp usba fctl nca lofs md cpc random crypto nfs fcip logindmux ptm nsctl
ufs sppp ipc ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    3085149             12051   74%
Anon                        20123                78    0%
Exec and libs                3565                13    0%
Page cache                 200779               784    5%
Free (cachelist)           193955               757    5%
Free (freelist)            663990              2593   16%

Total                     4167561             16279
Physical                  4167560             16279
> ::pgrep ptree
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   1825   1820   1825   1803      0 0x4a004000 ff04f5096c80 ptree
R   1798   1607   1798   1607  15000 0x4a004900 ff04f7b72930 ptree
R   1795   1302   1795   1294      0 0x4a004900 ff05179f7de0 ptree
> ::pgrep ptree | ::walk thread | ::findstack
stack pointer for thread ff04ea2ca440: ff00201777d0
[ ff00201777d0 _resume_from_idle+0xf1() ]
  ff0020177810 swtch+0x17f()
  ff00201778b0 turnstile_block+0x752()
  ff0020177920 rw_enter_sleep+0x1b0()
  ff00201779f0 zfs_getpage+0x10e()
  ff0020177aa0 fop_getpage+0x9f()
  ff0020177c60 segvn_fault+0x9ef()
  ff0020177d70 as_fault+0x5ae()
  ff0020177df0 pagefault+0x95()
  ff0020177f00 trap+0xbd3()
  ff0020177f10 0xfb8001d9()
stack pointer for thread ff04e8752400: ff001f9307d0
[ ff001f9307d0 _resume_from_idle+0xf1() ]
  ff001f930810 swtch+0x17f()
  ff001f9308b0 turnstile_block+0x752()
  ff001f930920 rw_enter_sleep+0x1b0()
  ff001f9309f0 zfs_getpage+0x10e()
  ff001f930aa0 fop_getpage+0x9f()
  ff001f930c60 segvn_fault+0x9ef()
  ff001f930d70 as_fault+0x5ae()
  ff001f930df0 pagefault+0x95()
  ff001f930f00 trap+0xbd3()
  ff001f930f10 0xfb8001d9()
stack pointer for thread ff066fbc6a80: ff001f27de90
[ ff001f27de90 _resume_from_idle+0xf1() ]
  ff001f27ded0 swtch+0x17f()
  ff001f27df00 cv_wait+0x61()
  ff001f27e040 vmem_xalloc+0x602()
  ff001f27e0b0 vmem_alloc+0x159()
  ff001f27e140 segkmem_xalloc+0x8c()
  ff001f27e1a0 segkmem_alloc_vn+0xcd()
  ff001f27e1d0 segkmem_zio_alloc+0x20()
  ff001f27e310 vmem_xalloc+0x4fc()
  ff001f27e380 vmem_alloc+0x159()
  ff001f27e410 kmem_slab_create+0x7d()
  ff001f27e450 kmem_slab_alloc+0x57()
  ff001f27e4b0 kmem_cache_alloc+0x136()
  ff001f27e4d0 zio_data_buf_alloc+0x28()
  ff001f27e510 arc_get_data_buf+0x175()
  ff001f27e560 arc_buf_alloc+0x9a()
  ff001f27e610 arc_read+0x122()
  ff001f27e6b0 dbuf_read_impl+0x129()
  ff001f27e710 dbuf_read+0xc5()
  ff001f27e7c0 dmu_buf_hold_array_by_dnode+0x1c4()
  ff001f27e860 dmu_read+0xd4()
  ff001f27e910 zfs_fillpage+0x15e()
  ff001f27e9f0 zfs_getpage+0x187()
  ff001f27eaa0 fop_getpage+0x9f()
  ff001f27ec60 segvn_fault+0x9ef()
  ff001f27ed70 as_fault+0x5ae()
  ff001f27edf0 pagefault+0x95()
  ff001f27ef00 trap+0xbd3()
  ff001f27ef10 0xfb8001d9()
> ::pgrep go_filebench | ::walk thread | ::findstack
stack pointer for thread 

Re: [zfs-discuss] ZFS committed to the FreeBSD base.

2007-04-06 Thread Ceri Davies
On Thu, Apr 05, 2007 at 09:58:47PM -0700, Rich Teer wrote:
> > I'm happy to inform that the ZFS file system is now part of the FreeBSD
> > operating system. ZFS is available in the HEAD branch and will be
> > available in FreeBSD 7.0-RELEASE as an experimental feature.
> 
> This is fantastic news!  At the risk of raking over ye olde arguments,
> as the old saying goes: "Dual licensing?  We don't need no stinkeen
> dual licensing!".  :-)

Actually, you might want to run that statement by a certain John Birrell
([EMAIL PROTECTED]) regarding the DTrace port and see what answer you get.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] zpool dumps core with did device

2007-01-23 Thread Ceri Davies
Hi Robert,

On Tue, Jan 23, 2007 at 02:42:33PM +0100, Robert Milkowski wrote:
> Tuesday, January 23, 2007, 1:48:50 PM, you wrote:
> CD> On Tue, Jan 23, 2007 at 12:07:34PM +0100, Robert Milkowski wrote:
>
> >> Of course the question is why use ZFS over DID?
> 
> CD> Actually the question is probably: why shouldn't I?  I can fall back
> CD> to the real device name, but d8 is a lot easier to remember than
> CD> c1t010CF1F459EE2A0045AF6B6Ed0 and has the advantage of being
> CD> guaranteed to be the same across all nodes.
> 
> CD> What's the disadvantage?
> 
> Another layer? Less performance? Ok, I'm only guessing.

OK, as long as I didn't miss anything, thanks!

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] zpool dumps core with did device

2007-01-23 Thread Ceri Davies
On Tue, Jan 23, 2007 at 12:07:34PM +0100, Robert Milkowski wrote:
> Hello Zoram,
> 
> Tuesday, January 23, 2007, 11:27:48 AM, you wrote:
> 
> ZT> Hi Ceri,
> 
> ZT> I just saw your mail today. I'm replying in case you haven't found a 
> ZT> solution.
> 
> ZT> This is
> 
> ZT> 6475304 zfs core dumps when trying to create new spool using "did" device
> 
> ZT> The workaround suggests:
> 
> ZT> Set environmental variable
> 
> ZT> NOINUSE_CHECK=1
> 
> ZT> And the problem does not exist.
> 
> Of course the question is why use ZFS over DID?

Actually the question is probably: why shouldn't I?  I can fall back
to the real device name, but d8 is a lot easier to remember than
c1t010CF1F459EE2A0045AF6B6Ed0 and has the advantage of being
guaranteed to be the same across all nodes.

What's the disadvantage?

> However it should not have core dumped.

Yep, that's why I posted :)

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] zpool dumps core with did device

2007-01-23 Thread Ceri Davies
On Tue, Jan 23, 2007 at 03:57:48PM +0530, Zoram Thanga wrote:
> Hi Ceri,
> 
> I just saw your mail today. I'm replying in case you haven't found a 
> solution.
> 
> This is
> 
> 6475304 zfs core dumps when trying to create new spool using "did" device
> 
> The workaround suggests:
> 
> Set environmental variable
> 
> NOINUSE_CHECK=1
> 
> And the problem does not exist.

Hi Zoram, that's great, thanks.
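
(For the archive, applying the workaround would look roughly like this --
not an actual transcript; the pool and device names are the ones from my
original report:)

  NOINUSE_CHECK=1 zpool create wibble /dev/did/dsk/d12

or export NOINUSE_CHECK=1 in the shell first and run zpool as usual.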

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] How much do we really want zpool remove?

2007-01-19 Thread Ceri Davies
On Thu, Jan 18, 2007 at 06:55:39PM +0800, Jeremy Teo wrote:
> On the issue of the ability to remove a device from a zpool, how
> useful/pressing is this feature? Or is this more along the line of
> "nice to have"?

We definitely need it.  As a use case, on occasion we have had to move
SAN sites, and the easiest way to do that by far is to snap on the new site
and remove the old one once it's synced.
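
(Per device that's just attach/detach -- the device names below are purely
illustrative -- but doing the equivalent for a whole top-level vdev is where
"zpool remove" comes in:)

  zpool attach tank c2t0d0 c3t0d0    (mirror the old-site LUN onto the new one)
  zpool status tank                  (wait for the resilver to finish)
  zpool detach tank c2t0d0           (then drop the old-site LUN)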

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




[zfs-discuss] zpool dumps core with did device

2007-01-19 Thread Ceri Davies
On an up-to-date Solaris 10 11/06 with Sun Cluster 3.2 and iSCSI-backed
did devices, zpool dumps core on creation if I try to use a did device.

Using the underlying device works, and this might not be supported
(though I don't know), but I thought you would probably prefer to see
the error than not (this is just a test setup and therefore we don't
have support for it).

  bash-3.00# scdidadm -l
  1peon:/dev/rdsk/c0t1d0  /dev/did/rdsk/d1 
  2peon:/dev/rdsk/c0t0d0  /dev/did/rdsk/d2 
  3peon:/dev/rdsk/c0t2d0  /dev/did/rdsk/d3 
  6peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B69d0 
/dev/did/rdsk/d6 
  7peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B6Ed0 
/dev/did/rdsk/d7 
  8peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B88d0 
/dev/did/rdsk/d8 
  9peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B85d0 
/dev/did/rdsk/d9 
  10   peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B83d0 
/dev/did/rdsk/d10
  11   peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B86d0 
/dev/did/rdsk/d11
  12   peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B87d0 
/dev/did/rdsk/d12
  13   peon:/dev/rdsk/c1t010CF1F459EE2A0045AF6B84d0 
/dev/did/rdsk/d13
  bash-3.00# zpool create wibble /dev/did/dsk/d12
  free(fe726420): invalid or corrupted buffer
  stack trace:
  libumem.so.1'?? (0xff24b460)
  libCrun.so.1'__1c2k6Fpv_v_+0x4
  
libCstd_isa.so.1'__1cDstdMbasic_string4Ccn0ALchar_traits4Cc__n0AJallocator4Cc___2G6Mrk1_r1_+0xb8
  
libCstd.so.1'__1cH__rwstdNlocale_vector4nDstdMbasic_string4Ccn0BLchar_traits4Cc__n0BJallocator4Cc_Gresize6MIn0E__p3_+0xc4
  libCstd.so.1'__1cH__rwstdKlocale_imp2t5B6MII_v_+0xc4
  libCstd.so.1'__1cDstdGlocaleEinit6F_v_+0x44
  
libCstd.so.1'__1cDstdNbasic_istream4Cwn0ALchar_traits4Cw___2t6Mn0AIios_baseJEmptyCtor__v_+0x84
  libCstd.so.1'?? (0xfe57b2b8)
  libCstd.so.1'?? (0xfe57b994)
  libCstd.so.1'_init+0x1e0
  ld.so.1'?? (0xff3bfea8)
  ld.so.1'?? (0xff3cca04)
  ld.so.1'_elf_rtbndr+0x10
  libCrun.so.1'?? (0xfe46a93c)
  libCrun.so.1'__1cH__CimplKcplus_init6F_v_+0x48
  libCstd_isa.so.1'_init+0xc8
  ld.so.1'?? (0xff3bfea8)
  ld.so.1'?? (0xff3c5318)
  ld.so.1'?? (0xff3c5474)
  ld.so.1'dlopen+0x64
  libmeta.so.1'sdssc_bind_library+0x88
  libdiskmgt.so.1'?? (0xff2b092c)
  libdiskmgt.so.1'?? (0xff2aa6b4)
  libdiskmgt.so.1'?? (0xff2aa42c)
  libdiskmgt.so.1'dm_get_stats+0x12c
  libdiskmgt.so.1'dm_get_slice_stats+0x44
  libdiskmgt.so.1'dm_inuse+0x74
  zpool'check_slice+0x20
  zpool'check_disk+0x144
  zpool'check_device+0x4c
  zpool'check_in_use+0x108
  zpool'check_in_use+0x174
  zpool'make_root_vdev+0x3c
  zpool'?? (0x1321c)
  zpool'main+0x130
  zpool'_start+0x108
  Abort (core dumped)

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Re: 'legacy' vs 'none'

2006-11-29 Thread Ceri Davies
On Wed, Nov 29, 2006 at 10:25:18AM +, Ceri Davies wrote:
> On Tue, Nov 28, 2006 at 04:48:19PM +, Dick Davies wrote:
> > Just spotted one - is this intentional?
> > 
> > You can't delegate a dataset to a zone if mountpoint=legacy.
> > Changing it to 'none' works fine.
> > 
> > 
> >   vera / # zfs create tank/delegated
> >   vera / # zfs get mountpoint tank/delegated
> >   NAMEPROPERTYVALUE   SOURCE
> >   tank/delegated  mountpoint  legacy  inherited from tank
> >   vera / # zfs create tank/delegated/ganesh
> >   vera / # zfs get mountpoint tank/delegated/ganesh
> >   NAME   PROPERTYVALUE  SOURCE
> >   tank/delegated/ganesh  mountpoint  legacy inherited from 
> >   tank
> >   vera / # zonecfg -z ganesh
> >   zonecfg:ganesh> add dataset
> >   zonecfg:ganesh:dataset> set name=tank/delegated/ganesh
> >   zonecfg:ganesh:dataset> end
> >   zonecfg:ganesh> commit
> >   zonecfg:ganesh> exit
> >   vera / # zoneadm -z ganesh boot
> >   could not verify zfs dataset tank/delegated/ganesh: mountpoint cannot be 
> > inherited
> >   zoneadm: zone ganesh failed to verify
> >   vera / # zfs set mountpoint=none tank/delegated/ganesh
> >   vera / # zoneadm -z ganesh boot
> >   vera / #
> 
> Does it actually boot then?  Eric is saying that the filesystem cannot
> be mounted in the 'none' case, so presumably it doesn't.

Not to worry, I see what you're doing now.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Re: 'legacy' vs 'none'

2006-11-29 Thread Ceri Davies
On Tue, Nov 28, 2006 at 04:48:19PM +, Dick Davies wrote:
> Just spotted one - is this intentional?
> 
> You can't delegate a dataset to a zone if mountpoint=legacy.
> Changing it to 'none' works fine.
> 
> 
>   vera / # zfs create tank/delegated
>   vera / # zfs get mountpoint tank/delegated
>   NAMEPROPERTYVALUE   SOURCE
>   tank/delegated  mountpoint  legacy  inherited from tank
>   vera / # zfs create tank/delegated/ganesh
>   vera / # zfs get mountpoint tank/delegated/ganesh
>   NAME   PROPERTYVALUE  SOURCE
>   tank/delegated/ganesh  mountpoint  legacy inherited from 
>   tank
>   vera / # zonecfg -z ganesh
>   zonecfg:ganesh> add dataset
>   zonecfg:ganesh:dataset> set name=tank/delegated/ganesh
>   zonecfg:ganesh:dataset> end
>   zonecfg:ganesh> commit
>   zonecfg:ganesh> exit
>   vera / # zoneadm -z ganesh boot
>   could not verify zfs dataset tank/delegated/ganesh: mountpoint cannot be 
> inherited
>   zoneadm: zone ganesh failed to verify
>   vera / # zfs set mountpoint=none tank/delegated/ganesh
>   vera / # zoneadm -z ganesh boot
>   vera / #

Does it actually boot then?  Eric is saying that the filesystem cannot
be mounted in the 'none' case, so presumably it doesn't.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] 'legacy' vs 'none'

2006-11-29 Thread Ceri Davies
On Tue, Nov 28, 2006 at 11:13:02AM -0800, Eric Schrock wrote:
> On Tue, Nov 28, 2006 at 06:06:24PM +0000, Ceri Davies wrote:
> > 
> > But you could presumably get that exact effect by not listing a
> > filesystem in /etc/vfstab.
> > 
> 
> Yes, but someone could still manually mount the filesystem using 'mount
> -F zfs ...'.  If you set the mountpoint to 'none', then it cannot be
> mounted, period.

Aha, that's the key then, thanks.
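
(In other words, with an illustrative dataset name:)

  zfs set mountpoint=legacy tank/data
  mount -F zfs tank/data /mnt        (still possible by hand)
  zfs set mountpoint=none tank/data
  mount -F zfs tank/data /mnt        (refused: the dataset can't be mounted at all)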

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] 'legacy' vs 'none'

2006-11-28 Thread Ceri Davies
On Tue, Nov 28, 2006 at 06:08:23PM +0100, Terence Patrick Donoghue wrote:
> Dick Davies wrote On 11/28/06 17:15,:
> 
> >Is there a difference between setting mountpoint=legacy and 
> >mountpoint=none?

> Is there a difference - Yep,
> 
> 'legacy' tells ZFS to refer to the /etc/vfstab file for FS mounts and 
> options
> whereas
> 'none' tells ZFS not to mount the ZFS filesystem at all. Then you would 
> need to manually mount the ZFS using 'zfs set mountpoint=/mountpoint 
> poolname/fsname' to get it mounted.
> 
> In a nutshell, setting 'none' means that 'zfs mount -a' won't mount the 
> FS because there is no mount point specified anywhere

But you could presumably get that exact effect by not listing a
filesystem in /etc/vfstab.
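
(That is, a legacy dataset only gets mounted if it has a vfstab entry like
the following -- dataset and mount point illustrative:

  tank/data  -  /export/data  zfs  -  yes  -

i.e. device-to-mount, device-to-fsck, mount point, FS type, fsck pass,
mount-at-boot, options.  Leave that line out and mountall won't touch it,
which looks a lot like 'none' in practice.)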

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Thoughts on patching + zfs root

2006-11-16 Thread Ceri Davies
On Wed, Nov 15, 2006 at 04:45:02PM -0700, Lori Alt wrote:
> Ceri Davies wrote:
> >On Tue, Nov 14, 2006 at 07:32:08PM +0100, [EMAIL PROTECTED] wrote:
> >  
> >>>Actually, we have considered this.  On both SPARC and x86, there will be
> >>>a way to specify the root file system (i.e., the bootable dataset) to be 
> >>>booted,
> >>>at either the GRUB prompt (for x86) or the OBP prompt (for SPARC).
> >>>If no root file system is specified, the current default 'bootfs' 
> >>>specified
> >>>in the root pool's metadata will be booted.  But it will be possible to
> >>>override the default, which will provide that "fallback" boot capability.
> >>>  
> >>I was thinking of some automated mechanism such as:
> >>
> >>- BIOS which, when reset during POST, will switch to safe
> >>  defaults and enter setup
> >>- Windows which, when reset during boot, will offer safe mode
> >>  at the next boot.
> >>
> >>I was thinking of something that on activation of a new boot environment
> >>would automatically fallback on catastrophic failure.
> >>
> >
> >I don't wish to sound ungrateful or unconstructive but there's no other
> >way to say this: I liked ZFS better when it was a filesystem + volume
> >manager rather than the one-tool-fits-all monster that it seems to be
> >heading towards.
> >
> >I'm very concerned about bolting some flavour of boot loader on to the
> >side, particularly one that's automatic.  I'm not doubting that the
> >concept is way cool, but I want predictable behaviour every time; not
> >way cool.
> >  
> 
> All of these ideas about automated recovery are just ideas.   I don't think
> we've reached monsterdom just yet.  For right now, the planned behavior
> is more predictable:  there is one dataset specified as the 'default 
> bootable
> dataset' for the pool.  You will have to take explicit action (something
> like luactivate) to change that default.  You will always have a failsafe
> archive to boot if something goes terribly wrong and you need to
> fix your menu.lst or set a different default bootable dataset.  You will
> also be able to have multiple entries in the menu.list file, corresponding
> to multiple BEs, but that will be optional. 
> 
> But I'm open to these ideas of automatic recovery.  It's an interesting
> thing to consider.  Ultimately, it might need to be something that is
> optional, so that we could also get behavior that is more predictable.

OK, thanks for the clarification.  "Optional" sounds good to me,
whatever the default may be.
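
(For reference, my understanding is that the explicit action boils down to
something like the following -- pool and BE names illustrative, and the
property is the 'bootfs' mentioned earlier in the thread:)

  zpool set bootfs=rpool/ROOT/newBE rpool    (change the default bootable dataset)
  zpool get bootfs rpool                     (see which BE will boot by default)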

And thanks again for working on the monster :)

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Thoughts on patching + zfs root

2006-11-15 Thread Ceri Davies
On Tue, Nov 14, 2006 at 07:32:08PM +0100, [EMAIL PROTECTED] wrote:
> 
> >Actually, we have considered this.  On both SPARC and x86, there will be
> >a way to specify the root file system (i.e., the bootable dataset) to be 
> >booted,
> >at either the GRUB prompt (for x86) or the OBP prompt (for SPARC).
> >If no root file system is specified, the current default 'bootfs' specified
> >in the root pool's metadata will be booted.  But it will be possible to
> >override the default, which will provide that "fallback" boot capability.
> 
> 
> I was thinking of some automated mechanism such as:
> 
>   - BIOS which, when reset during POST, will switch to safe
> defaults and enter setup
>   - Windows which, when reset during boot, will offer safe mode
> at the next boot.
> 
> I was thinking of something that on activation of a new boot environment
> would automatically fallback on catastrophic failure.

I don't wish to sound ungrateful or unconstructive but there's no other
way to say this: I liked ZFS better when it was a filesystem + volume
manager rather than the one-tool-fits-all monster that it seems to be
heading towards.

I'm very concerned about bolting some flavour of boot loader on to the
side, particularly one that's automatic.  I'm not doubting that the
concept is way cool, but I want predictable behaviour every time; not
way cool.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Thoughts on patching + zfs root

2006-11-15 Thread Ceri Davies
On Wed, Nov 15, 2006 at 04:23:18PM -0600, Nicolas Williams wrote:
> On Wed, Nov 15, 2006 at 09:58:35PM +0000, Ceri Davies wrote:
> > On Wed, Nov 15, 2006 at 12:10:30PM +0100, [EMAIL PROTECTED] wrote:
> > > 
> > > >I think we first need to define what state "up" actually is.  Is it the 
> > > >kernel booted ?  Is it the root file system mounted ?  Is it we reached 
> > > >milestone all ?  Is it we reached milestone all with no services in 
> > > >maintenance ?  Is it no services in maintenance that weren't on the last 
> > > >  boot ?
> > > 
> > > I think that's fairly simple: "up is the state when the milestone we
> > > are booting to has been actually reached".
> > > 
> > > What should SMF do when it finds that it cannot reach that milestone?
> > 
> > Another question might be: how do I fix it when it's broken?
> 
> That's for monitoring systems.  The issue here is how to best select a
> BE at boot time.  IMO the last booted BE to have reached its default
> milestone should be that BE.

What I'm trying to say (and this is the only part that you didn't quote
:)) is that there is no way I want the BE programatically selected.

> > > Harder is:
> > > 
> > > What if the system does not come up quickly enough?
> 
> The user may note this and reboot the system.  BEs that once booted but
> now don't will still be selected at the GRUB menu as the last
> known-to-boot BEs, so we may want the ZFS boot code to reset the
> property of the BE's used for making this selection.

Not my text, but wtf?  Booting the wrong BE because my NIS server is
down (or whatever) isn't really acceptable (or likely to resolve
anything).  I think that's what "not quickly enough" was getting at.

> If you're netbooting then you're not doing a ZFS boot, so the point is
> moot (this thread is about how to best select a BE to boot at boot
> time).

I believe I could have /usr or /var on NFS still.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Thoughts on patching + zfs root

2006-11-15 Thread Ceri Davies
On Wed, Nov 15, 2006 at 12:10:30PM +0100, [EMAIL PROTECTED] wrote:
> 
> >I think we first need to define what state "up" actually is.  Is it the 
> >kernel booted ?  Is it the root file system mounted ?  Is it we reached 
> >milestone all ?  Is it we reached milestone all with no services in 
> >maintenance ?  Is it no services in maintenance that weren't on the last 
> >  boot ?
> 
> I think that's fairly simple: "up is the state when the milestone we
> are booting to has been actually reached".
> 
> What should SMF do when it finds that it cannot reach that milestone?

Another question might be: how do I fix it when it's broken?

> Harder is:
> 
> What if the system does not come up quickly enough?
> What if the system hangs before SMF is even starts?
> What if the system panics during boot or shortly after we
> reach our desired milestone?
> 
> And then, of course, "define shortly and quickly".

Such definitions would need to consider net and SAN booting.

Personally I think if a system is hosed then the best way to fail-safe
is to either panic or drop to single-user rather than trying to be
clever and booting some other kernel.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] ZFS/iSCSI target integration

2006-11-02 Thread Ceri Davies
On Wed, Nov 01, 2006 at 04:00:43PM -0500, Torrey McMahon wrote:
> Spencer Shepler wrote:
> >On Wed, Adam Leventhal wrote:
> >  
> >>On Wed, Nov 01, 2006 at 01:17:02PM -0500, Torrey McMahon wrote:
> >>
> >>>Is there going to be a method to override that on the import? I can see 
> >>>a situation where you want to import the pool for some kind of 
> >>>maintenance procedure but you don't want the iSCSI target to fire up 
> >>>automagically.
> >>>  
> >>There isn't -- to my knowledge -- a way to do this today for NFS shares.
> >>This would be a reasonable RFE that would apply to both NFS and iSCSI.
> >>
> >
> >In the case of NFS, this can be dangerous if the "rest" of the NFS
> >server is allowed to come up and serve other filesystems.  The non-shared
> >filesystem will end up returning ESTALE errors to clients that are
> >active on that filesystem.  It should be an all or nothing selection...
> >  
> 
> Lets say server A has the pool with NFS shared, or iSCSI shared, 
> volumes. Server A exports the pool or goes down. Server B imports the pool.
> 
> Which clients would still be active on the filesystem(s)? The ones that 
> were mounting it when it was on Server A?

For NFS, it's possible (but likely suboptimal) for clients to be
configured to mount the filesystem from server A and fail over to
server B, assuming that the pool import can happen quickly enough for
them not to receive ENOENT.
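
(The client-side configuration I have in mind is the replicated-resource
form of mount_nfs, something like the line below with illustrative names;
as far as I recall, that style of failover only applies to read-only mounts:)

  mount -F nfs -o ro serverA:/export/data,serverB:/export/data /mnt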

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] ZFS/iSCSI target integration

2006-11-01 Thread Ceri Davies
On Wed, Nov 01, 2006 at 10:57:27AM -0800, Adam Leventhal wrote:
> On Wed, Nov 01, 2006 at 01:17:02PM -0500, Torrey McMahon wrote:
> > Is there going to be a method to override that on the import? I can see 
> > a situation where you want to import the pool for some kind of 
> > maintenance procedure but you don't want the iSCSI target to fire up 
> > automagically.
> 
> There isn't -- to my knowledge -- a way to do this today for NFS shares.
> This would be a reasonable RFE that would apply to both NFS and iSCSI.

svcadm disable nfs/server ?
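
(i.e. something along these lines before the import, with '-t' keeping the
change temporary and an illustrative pool name:)

  svcadm disable -t svc:/network/nfs/server:default
  zpool import tank
      ... do the maintenance ...
  svcadm enable svc:/network/nfs/server:default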

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] ZFS/iSCSI target integration

2006-11-01 Thread Ceri Davies
On Wed, Nov 01, 2006 at 02:14:24AM -0800, Adam Leventhal wrote:
> On Wed, Nov 01, 2006 at 10:05:01AM +0000, Ceri Davies wrote:
> > On Wed, Nov 01, 2006 at 01:33:33AM -0800, Adam Leventhal wrote:
> > > Rick McNeal and I have been working on building support for sharing ZVOLs
> > > as iSCSI targets directly into ZFS. Below is the proposal I'll be
> > > submitting to PSARC. Comments and suggestions are welcome.
> > 
> > It looks great and I'd love to see it implemented.
> 
> It's implemented! This is the end of the process, not the beginning ;-)
> I expect it will be in OpenSolaris by the end of November.

Oh, right.  For my next wish... :)

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] ZFS/iSCSI target integration

2006-11-01 Thread Ceri Davies
On Wed, Nov 01, 2006 at 01:33:33AM -0800, Adam Leventhal wrote:
> Rick McNeal and I have been working on building support for sharing ZVOLs
> as iSCSI targets directly into ZFS. Below is the proposal I'll be
> submitting to PSARC. Comments and suggestions are welcome.

It looks great and I'd love to see it implemented.
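
(From the proposal as I read it, usage should boil down to a single property
on the ZVOL, by analogy with sharenfs -- names illustrative:)

  zfs create -V 10g tank/vol1
  zfs set shareiscsi=on tank/vol1    (export the ZVOL as an iSCSI target)
  iscsitadm list target              (the new target should show up here)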

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Current status of a ZFS root

2006-10-30 Thread Ceri Davies
On Sun, Oct 29, 2006 at 12:01:45PM -0800, Richard Elling - PAE wrote:
> Chris Adams wrote:
> >We're looking at replacing a current Linux server with a T1000 + a fiber 
> >channel enclosure to take advantage of ZFS. Unfortunately, the T1000 only 
> >has a single drive bay (!) which makes it impossible to follow our normal 
> >practice of mirroring the root file system; naturally the idea of using 
> >that big ZFS pool is appealing.
> 
> Note: the original T1000 had the single disk limit.  This was unfortunate, 
> and a
> sales inhibitor.  Today, you have the option of single (SATA) or dual (SAS) 
> boot
> disks, with hardware RAID.  See:
>   http://www.sun.com/servers/coolthreads/t1000/specs.xml

Good to know that this limit has been removed.  Can the original
T1000s be backfitted, or do I just need to be very careful what
I'm ordering now?

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Where is the ZFS configuration data stored?

2006-10-12 Thread Ceri Davies
On Thu, Oct 12, 2006 at 02:54:05PM +0100, Ceri Davies wrote:
> On Thu, Oct 12, 2006 at 02:06:15PM +0100, Dick Davies wrote:
> > On 12/10/06, Ceri Davies <[EMAIL PROTECTED]> wrote:
> > >On Wed, Oct 11, 2006 at 11:49:48PM -0700, Matthew Ahrens wrote:
> > 
> > >> FYI, /etc/zfs/zpool.cache just tells us what pools to open when you boot
> > >> up.  Everything else (mountpoints, filesystems, etc) is stored in the
> > >> pool itself.
> > >
> > >What happens if the file does not exist?  Are the devices searched for
> > >metadata?
> > 
> > My understanding (I'll be delighted if I'm wrong) is that you would be 
> > stuffed.
> > 
> > I'd expect:
> > 
> > zpool import -f
> > 
> > (see the manpage)
> > to probe /dev/dsk/ and rebuild the zpool.cache file,
> > but my understanding is that this a) doesn't work yet or b) does
> > horrible things to your chances of surviving a reboot [0].
> 
> So how do I import a pool created on a different host for the first
> time?

Never mind, Mark just answered that.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Where is the ZFS configuration data stored?

2006-10-12 Thread Ceri Davies
On Thu, Oct 12, 2006 at 07:53:37AM -0600, Mark Maybee wrote:
> Ceri Davies wrote:
> >On Wed, Oct 11, 2006 at 11:49:48PM -0700, Matthew Ahrens wrote:
> >
> >>James McPherson wrote:
> >>
> >>>On 10/12/06, Steve Goldberg <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>Where is the ZFS configuration (zpools, mountpoints, filesystems,
> >>>>etc) data stored within Solaris?  Is there something akin to vfstab
> >>>>or perhaps a database?
> >>>
> >>>
> >>>Have a look at the contents of /etc/zfs for an in-filesystem artefact
> >>>of zfs. Apart from that, the information required is stored on the
> >>>disk itself.
> >>>
> >>>There is really good documentation on ZFS at the ZFS community
> >>>pages found via http://www.opensolaris.org/os/community/zfs.
> >>
> >>FYI, /etc/zfs/zpool.cache just tells us what pools to open when you boot 
> >>up.  Everything else (mountpoints, filesystems, etc) is stored in the 
> >>pool itself.
> >
> >
> >What happens if the file does not exist?  Are the devices searched for
> >metadata?
> >
> >Ceri
> 
> If the file does not exist than ZFS will not attempt to open any
> pools at boot.  You must issue an explicit 'zpool import' command to
> probe the available devices for metadata to re-discover your pools.

OK, that's fine then.
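
(For reference, the probing looks like this -- pool name illustrative; the
-d form is handy when the devices live somewhere other than /dev/dsk:)

  zpool import                        (scan /dev/dsk and list any pools found)
  zpool import tank                   (import a listed pool by name)
  zpool import -d /dev/did/dsk tank   (scan an alternate device directory)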

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Where is the ZFS configuration data stored?

2006-10-12 Thread Ceri Davies
On Thu, Oct 12, 2006 at 02:06:15PM +0100, Dick Davies wrote:
> On 12/10/06, Ceri Davies <[EMAIL PROTECTED]> wrote:
> >On Wed, Oct 11, 2006 at 11:49:48PM -0700, Matthew Ahrens wrote:
> 
> >> FYI, /etc/zfs/zpool.cache just tells us what pools to open when you boot
> >> up.  Everything else (mountpoints, filesystems, etc) is stored in the
> >> pool itself.
> >
> >What happens if the file does not exist?  Are the devices searched for
> >metadata?
> 
> My understanding (I'll be delighted if I'm wrong) is that you would be 
> stuffed.
> 
> I'd expect:
> 
> zpool import -f
> 
> (see the manpage)
> to probe /dev/dsk/ and rebuild the zpool.cache file,
> but my understanding is that this a) doesn't work yet or b) does
> horrible things to your chances of surviving a reboot [0].

So how do I import a pool created on a different host for the first
time?

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Where is the ZFS configuration data stored?

2006-10-12 Thread Ceri Davies
On Wed, Oct 11, 2006 at 11:49:48PM -0700, Matthew Ahrens wrote:
> James McPherson wrote:
> >On 10/12/06, Steve Goldberg <[EMAIL PROTECTED]> wrote:
> >>Where is the ZFS configuration (zpools, mountpoints, filesystems,
> >>etc) data stored within Solaris?  Is there something akin to vfstab
> >>or perhaps a database?
> >
> >
> >Have a look at the contents of /etc/zfs for an in-filesystem artefact
> >of zfs. Apart from that, the information required is stored on the
> >disk itself.
> >
> >There is really good documentation on ZFS at the ZFS community
> >pages found via http://www.opensolaris.org/os/community/zfs.
> 
> FYI, /etc/zfs/zpool.cache just tells us what pools to open when you boot 
> up.  Everything else (mountpoints, filesystems, etc) is stored in the 
> pool itself.

What happens if the file does not exist?  Are the devices searched for
metadata?

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Re: ZFS Inexpensive SATA Whitebox

2006-10-12 Thread Ceri Davies
On Wed, Oct 11, 2006 at 06:36:28PM -0500, David Dyer-Bennet wrote:
> On 10/11/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> 
> >> The more I learn about Solaris hardware support, the more I see it as
> >> a minefield.
> >
> >
> >I've found this to be true for almost all open source platforms where
> >you're trying to use something that hasn't been explicitly used and
> >tested by the developers.
> 
> I've been running Linux since kernel 0.99pl13, I think it was, and
> have had amazingly little trouble.  Whereas I'm now sitting on $2k of
> hardware that won't do what I wanted it to do under Solaris, so it's a
> bit of a hot-button issue for me right now.  I've never had to
> consider Linux issues in selecting hardware (in fact I haven't
selected hardware; my Linux boxes have all been castoffs originally
purchased to run Windows).

Perhaps that's true of most Linux development machines too :)

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] re: ZFS snapshot for running oracle instance

2006-10-05 Thread Ceri Davies
On Thu, Oct 05, 2006 at 11:04:49AM -0400, Zhisong Jin wrote:
> would it possible to use ZFS snapshot as way 
> to doing hot backup for oracle database? 
> anybody have tried that? 

You would need to put the tablespaces with data files on the filesystem
being snapped into backup mode while you take the snapshot, but it
should work ok.
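
A rough sketch, with illustrative dataset and tablespace names (the snapshot
itself is near-instant, so the window spent in backup mode is short):

  sqlplus -s / as sysdba <<EOF
  alter tablespace users begin backup;
  EOF
  zfs snapshot dbpool/oradata@hotbackup
  sqlplus -s / as sysdba <<EOF
  alter tablespace users end backup;
  EOF

Repeat the begin/end for each tablespace with datafiles on the snapped
filesystem, and keep the archived redo logs covering the window.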

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-15 Thread Ceri Davies
On Fri, Sep 15, 2006 at 10:55:48AM -0500, Nicolas Williams wrote:
> On Fri, Sep 15, 2006 at 09:31:04AM +0100, Ceri Davies wrote:
> > On Thu, Sep 14, 2006 at 05:08:18PM -0500, Nicolas Williams wrote:
> > > Yes, but the checksum is stored with the pointer.
> > > 
> > > So then, for each file/directory there's a dnode, and that dnode has
> > > several block pointers to data blocks or indirect blocks, and indirect
> > > blocks have pointers to... and so on.
> > 
> > Does ZFS have block fragments?  If so, then updating an unrelated file
> > would change the checksum.
> 
> No.  It has variable sized blocks.

OK, thanks.

> A block pointer in ZFS is much more than just a block number.  Among
> other things a block pointer has the checksum of the block it points to.
> See the on-disk layout document for more info.

I am aware of the block checksum, but haven't got round to reading the
on disk format document yet, hence the question.

> There is no way that updating one file could change another's checksum.

That follows from the non-existence of fragments, sure.
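
(Incidentally, for anyone who does want to look at the stored checksums, zdb
will dump an object's block pointers, checksums included -- dataset and object
number here are illustrative, and zdb's output isn't a stable interface:)

  zdb -ddddd tank/fs 42    (dump the dnode and its block pointers for object 42)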

Cheers,

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-15 Thread Ceri Davies
On Thu, Sep 14, 2006 at 05:08:18PM -0500, Nicolas Williams wrote:
> On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
> > Bady, Brant RBCM:EX wrote:
> > >Part of the archiving process is to generate checksums (I happen to use
> > >MD5), and store them with other metadata about the digital object in
> > >order to verify data integrity and demonstrate the authenticity of the
> > >digital object over time.
> > 
> > >Wouldn't it be helpful if there was a utility to access/read  the
> > >checksum data created by ZFS, and use it for those same purposes.
> > 
> > Doesn't ZFS use block-level checksums?
> 
> Yes, but the checksum is stored with the pointer.
> 
> So then, for each file/directory there's a dnode, and that dnode has
> several block pointers to data blocks or indirect blocks, and indirect
> blocks have pointers to... and so on.

Does ZFS have block fragments?  If so, then updating an unrelated file
would change the checksum.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




Re: [zfs-discuss] Re: ZFS imported simultanously on 2 systems...

2006-09-13 Thread Ceri Davies
On Wed, Sep 13, 2006 at 06:37:25PM +0100, Darren J Moffat wrote:
> Dale Ghent wrote:
> >On Sep 13, 2006, at 12:32 PM, Eric Schrock wrote:
> >
> >>Storing the hostid as a last-ditch check for administrative error is a
> >>reasonable RFE - just one that we haven't yet gotten around to.
> >>Claiming that it will solve the clustering problem oversimplifies the
> >>problem and will lead to people who think they have a 'safe' homegrown
> >>failover when in reality the right sequence of actions will irrevocably
> >>corrupt their data.
> >
> >HostID is handy, but it'll only tell you who MIGHT or MIGHT NOT have 
> >control of the pool.
> >
> >Such an RFE would be even more worthwhile if it included something such as 
> >a time stamp. This time stamp (or similar time-oriented signature) would 
> >be updated regularly (based on some internal ZFS event). If this stamp 
> >goes for an arbitrary length of time without being updated, another host 
> >in the cluster could force import it on the assumption that the original 
> >host is no longer able to communicate to the zpool.
> 
> That might be acceptable in some environments but that is going to cause 
>  disks to spin up.  That will be very unacceptable in a laptop and 
> maybe even in some energy conscious data centres.
> 
> What you are proposing sounds a lot like a cluster heartbeat, which IMO 
> really should not be implemented by writing to disks.

Wouldn't it be possible to implement this via SCSI reservations (where
available) a la quorum devices?

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere




[zfs-discuss] Re: Proposal: multiple copies of user data

2006-09-12 Thread Ceri Davies
> Hi Matt,
> Interesting proposal.  Has there been any
> consideration if free space being reported for a ZFS
> filesystem would take into account the copies
>  setting?
> 
> Example:
> zfs create mypool/nonredundant_data
> zfs create mypool/redundant_data
> df -h /mypool/nonredundant_data
>  /mypool/redundant_data 
>(shows same amount of free space)
>  zfs set copies=3 mypool/redundant_data
> 
> Would a new df of /mypool/redundant_data now show a
> different amount of free space (presumably 1/3 if
> different) than /mypool/nonredundant_data?

As I understand the proposal, there's nothing new to do here.  The filesystem 
might be 25% full, and it would be 25% full no matter how many copies of the 
filesystem there are.

Similarly with quotas, I'd argue that the extra copies should not count towards 
a user's quota, since a quota is set on the filesystem.  If I'm using 500M on a 
filesystem, I only have 500M of data no matter how many copies of it the 
administrator has decided to keep (cf. RAID1).

I also don't see why a copy can't just be dropped if the "copies" value is 
decreased.

Having said this, I don't see any value in the proposal at all, to be honest.
 
 