Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Jeff Bonwick
I think we (the ZFS team) all generally agree with you.  The current
nevada code is much better at handling device failures than it was
just a few months ago.  And there are additional changes that were
made for the FishWorks (a.k.a. Amber Road, a.k.a. Sun Storage 7000)
product line that will make things even better once the FishWorks team
has a chance to catch its breath and integrate those changes into nevada.
And then we've got further improvements in the pipeline.

The reason this is all so much harder than it sounds is that we're
trying to provide increasingly optimal behavior given a collection of
devices whose failure modes are largely ill-defined.  (Is the disk
dead or just slow?  Gone or just temporarily disconnected?  Does this
burst of bad sectors indicate catastrophic failure, or just localized
media errors?)  The disks' SMART data is notoriously unreliable, BTW.
So there's a lot of work underway to model the physical topology of
the hardware, gather telemetry from the devices, the enclosures,
the environmental sensors etc, so that we can generate an accurate
FMA fault diagnosis and then tell ZFS to take appropriate action.

We have some of this today; it's just a lot of work to complete it.

Oh, and regarding the original post -- as several readers correctly
surmised, we weren't faking anything, we just didn't want to wait
for all the device timeouts.  Because the disks were on USB, which
is a hotplug-capable bus, unplugging the dead disk generated an
interrupt that bypassed the timeout.  We could have waited it out,
but 60 seconds is an eternity on stage.

Jeff

On Mon, Nov 24, 2008 at 10:45:18PM -0800, Ross wrote:
 But that's exactly the problem Richard: "AFAIK".
 
 Can you state that absolutely, categorically, there is no failure mode out 
 there (caused by hardware faults, or bad drivers) that won't lock a drive up 
 for hours?  You can't, obviously, which is why we keep saying that ZFS should 
 have this kind of timeout feature.
 
 For once I agree with Miles, I think he's written a really good writeup of 
 the problem here.  My simple view on it would be this:
 
 Drives are only aware of themselves as an individual entity.  Their job is to 
 save & restore data to themselves, and drivers are written to minimise any 
 chance of data loss.  So when a drive starts to fail, it makes complete sense 
 for the driver and hardware to be very, very thorough about trying to read or 
 write that data, and to only fail as a last resort.
 
 I'm not at all surprised that drives take 30 seconds to timeout, nor that 
 they could slow a pool for hours.  That's their job.  They know nothing else 
 about the storage, they just have to do their level best to do as they're 
 told, and will only fail if they absolutely can't store the data.
 
 The raid controller on the other hand (Netapp / ZFS, etc) knows all about the 
 pool.  It knows if you have half a dozen good drives online, it knows if 
 there are hot spares available, and it *should* also know how quickly the 
 drives under its care usually respond to requests.
 
 ZFS is perfectly placed to spot when a drive is starting to fail, and to take 
 the appropriate action to safeguard your data.  It has far more information 
 available than a single drive ever will, and should be designed accordingly.
 
 Expecting the firmware and drivers of individual drives to control the 
 failure modes of your redundant pool is just crazy imo.  You're throwing away 
 some of the biggest benefits of using multiple drives in the first place.


Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Ross Smith
Hey Jeff,

Good to hear there's work going on to address this.

What did you guys think to my idea of ZFS supporting a "waiting for a
response" status for disks as an interim solution that allows the pool
to continue operation while it's waiting for FMA or the driver to
fault the drive?

I do appreciate that it's hard to come up with a definitive "it's dead,
Jim" answer, and I agree that long term the FMA approach will pay
dividends.  But I still feel this is a good short term solution, and
one that would also complement your long term plans.

My justification for this is that it seems to me that you can split
disk behavior into two states:
- returns data ok
- doesn't return data ok

And for the state where it's not returning data, you can again split
that in two:
- returns wrong data
- doesn't return data

The first of these is already covered by ZFS with its checksums (with
FMA doing the extra work to fault drives), so it's just the second
that needs immediate attention, and for the life of me I can't think
of any situation that a simple timeout wouldn't catch.

Personally I'd love to see two parameters, allowing this behavior to
be turned on if desired, and allowing timeouts to be configured:

zfs-auto-device-timeout
zfs-auto-device-timeout-fail-delay

The first sets whether to use this feature, and configures the maximum
time ZFS will wait for a response from a device before putting it in a
"waiting" status.  The second would be optional and is the maximum
time ZFS will wait before faulting a device (at which point it's
replaced by a hot spare).
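
Purely as an illustration (this is hypothetical syntax; neither tunable
exists in ZFS today, and the pool name is just an example), such settings
might be exposed as pool properties:

  # hypothetical commands, not part of any current zpool/zfs release
  zpool set zfs-auto-device-timeout=5s tank
  zpool set zfs-auto-device-timeout-fail-delay=120s tank
  zpool get zfs-auto-device-timeout tank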

The reason I think this will work well with the FMA work is that you
can implement this now and have a real improvement in ZFS
availability.  Then, as the other work starts bringing better modeling
for drive timeouts, the parameters can be either removed, or set
automatically by ZFS.

Long term I guess there's also the potential to remove the second
setting if you felt FMA etc ever got reliable enough, but personally I
would always want to have the final fail delay set.  I'd maybe set it
to a long value such as 1-2 minutes to give FMA, etc a fair chance to
find the fault.  But I'd be much happier knowing that the system will
*always* be able to replace a faulty device within a minute or two, no
matter what the FMA system finds.

The key thing is that you're not faulting devices early, so FMA is
still vital.  The idea is purely to let ZFS keep the pool active by
removing the need for the entire pool to wait on the FMA diagnosis.

As I said before, the driver and firmware are only aware of a single
disk, and I would imagine that FMA also has the same limitation - it's
only going to be looking at a single item and trying to determine
whether it's faulty or not.  Because of that, FMA is going to be
designed to be very careful to avoid false positives, and will likely
take its time to reach an answer in some situations.

ZFS however has the benefit of knowing more about the pool, and in the
vast majority of situations, it should be possible for ZFS to read or
write from other devices while it's waiting for an 'official' result
from any one faulty component.

Ross


On Tue, Nov 25, 2008 at 8:37 AM, Jeff Bonwick [EMAIL PROTECTED] wrote:
 I think we (the ZFS team) all generally agree with you.  ...

Re: [zfs-discuss] Race condition yields to kernel panic (u3, u4) or hanging zfs commands (u5)

2008-11-25 Thread Andreas Koppenhoefer
Hello Matt,

you wrote about panic in u3 & u4:
 These stack traces look like 6569719 (fixed in s10u5).

Then I suppose it's also fixed by 127127-11 because that patch mentions 6569719.
According to my zfs-hardness-test script this is true.
Instead of crashing with a panic, with 127127-11 these servers now show
hanging zfs commands, just like update 5.

Please try my test script on a test server or see below.

 For update 5, you could start with the kernel stack of the hung commands.
 (use ::pgrep and ::findstack)  We might also need the sync thread's stack
 (something like ::walk spa | ::print spa_t
 spa_dsl_pool->dp_txg.tx_sync_thread | ::findstack)

Okay, I'll give it a try.

$ uname -a  
SunOS qacult10 5.10 Generic_137111-08 sun4u sparc SUNW,Ultra-5_10
$ head -1 /etc/release 
   Solaris 10 5/08 s10s_u5wos_10 SPARC
$ ps -ef|grep zfs
root 23795 23466   0 11:02:45 pts/1   0:00 ssh localhost zfs receive 
hardness-test/received
root 23782 23779   0 11:02:45 ?   0:01 zfs receive 
hardness-test/received
root 23807 23804   0 11:02:52 ?   0:00 zfs receive 
hardness-test/received
root 23466 23145   0 11:00:35 pts/1   0:00 /usr/bin/bash 
./zfs-hardness-test.sh
root 23793 23466   0 11:02:45 pts/1   0:00 /usr/bin/bash 
./zfs-hardness-test.sh
root 23804 23797   0 11:02:52 ?   0:00 sh -c zfs receive 
hardness-test/received
root 23779 1   0 11:02:45 ?   0:00 sh -c zfs receive 
hardness-test/received

It seems that a receiving process (pid 23782), although already killed, has
not yet finished.
After killing and aborting data transmission, the script does a retry of the 
send-receive pipe (with same arguments) with pid 23807 on receiving end.
There must be a deadlock/race condition.

$ mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs pcipsy ip hook neti 
sctp arp usba fcp fctl zfs random nfs audiosup md lofs logindmux sd ptm fcip 
crypto ipc ]
> ::pgrep zfs$
S    PID   PPID   PGID   SID   UID      FLAGS            ADDR NAME
R  23782  23779  23779  23779  0 0x4a004000 03000171cc90 zfs
R  23807  23804  23804  23804  0 0x4a004000 030001728058 zfs
> ::pgrep zfs$ | ::walk thread | ::findstack -v
stack pointer for thread 3d24480: 2a1007fc8c1
[ 02a1007fc8c1 cv_wait+0x38() ]
  02a1007fc971 delay+0x90(1, 183f000, 17cdef7, 17cdef8, 1, 18c0578)
  02a1007fca21 dnode_special_close+0x20(300221e0a58, 7, 1, 300221e0c68, 7, 
  300221e0a58)
  02a1007fcad1 dmu_objset_evict+0xb8(30003a8dc40, 300027cf500, 7b652000, 
  70407538, 7b652000, 70407400)
  02a1007fcb91 dsl_dataset_evict+0x34(30003a8dc40, 30003a8dc40, 0, 
  300027cf500, 3000418c2c0, 30022366200)
  02a1007fcc41 dbuf_evict_user+0x48(7b6140b0, 30022366200, 30003a8dc48, 0, 0
  , 30022355e20)
  02a1007fccf1 dbuf_rele+0x8c(30022355e78, 30022355e20, 70400400, 3, 3, 3)
  02a1007fcda1 dmu_recvbackup+0x94c(300017c7400, 300017c7d80, 300017c7c28, 
  300017c7416, 16, 1)
  02a1007fcf71 zfs_ioc_recvbackup+0x74(300017c7000, 0, 30004320150, 0, 0, 
  300017c7400)
  02a1007fd031 zfsdev_ioctl+0x15c(70401400, 57, ffbfee20, 1d, 74, ef0)
  02a1007fd0e1 fop_ioctl+0x20(30001d7a0c0, 5a1d, ffbfee20, 13, 
  300027da0c0, 12247f8)
  02a1007fd191 ioctl+0x184(3, 300043216f8, ffbfee20, 0, 1ec08, 5a1d)
  02a1007fd2e1 syscall_trap32+0xcc(3, 5a1d, ffbfee20, 0, 1ec08, ff34774c)
stack pointer for thread 30003d12e00: 2a1009dca41
[ 02a1009dca41 turnstile_block+0x600() ]
  02a1009dcaf1 mutex_vector_enter+0x3f0(0, 0, 30022355e78, 3d24480, 
  3d24480, 0)
  02a1009dcba1 dbuf_read+0x6c(30022355e20, 0, 1, 1, 0, 300220f1cf8)
  02a1009dcc61 dmu_bonus_hold+0xec(0, 15, 30022355e20, 2a1009dd5d8, 8, 0)
  02a1009dcd21 dsl_dataset_open_obj+0x2c(3000418c2c0, 15, 0, 9, 300043ebe88
  , 2a1009dd6a8)
  02a1009dcde1 dsl_dataset_open_spa+0x140(0, 7b64d000, 3000418c488, 
  300043ebe88, 2a1009dd768, 9)
  02a1009dceb1 dmu_objset_open+0x20(30003ca9000, 5, 9, 2a1009dd828, 1, 
  300043ebe88)
  02a1009dcf71 zfs_ioc_objset_stats+0x18(30003ca9000, 0, 0, 0, 70401400, 39
  )
  02a1009dd031 zfsdev_ioctl+0x15c(70401400, 39, ffbfc468, 13, 4c, ef0)
  02a1009dd0e1 fop_ioctl+0x20(30001d7a0c0, 5a13, ffbfc468, 13, 
  300027da010, 12247f8)
  02a1009dd191 ioctl+0x184(3, 300043208f8, ffbfc468, 0, 1010101, 5a13)
  02a1009dd2e1 syscall_trap32+0xcc(3, 5a13, ffbfc468, 0, 1010101, 7cb88)
 
> ::walk spa | ::print spa_t
{
spa_name = 0x30022613108 hardness-test
spa_avl = {
avl_child = [ 0, 0 ]
avl_pcb = 0x1
}
spa_config = 0x3002244abd0
spa_config_syncing = 0
spa_config_txg = 0x4
spa_config_cache_lock = {
_opaque = [ 0 ]
}
spa_sync_pass = 0x1
spa_state = 0
spa_inject_ref = 0
spa_traverse_wanted = 0
spa_sync_on = 0x1
spa_load_state = 0 (SPA_LOAD_NONE)
spa_zio_issue_taskq = [ 0x300225e5528, 0x300225e56d8, 0x300225e5888, 
0x300225e5a38, 0x300225e5be8, 0x300225e5d98 ]

Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Ross Smith
PS.  I think this also gives you a chance at making the whole problem
much simpler.  Instead of the hard question of "is this faulty?",
you're just trying to say "is it working right now?".

In fact, I'm now wondering if the "waiting for a response" flag
wouldn't be better as "possibly faulty".  That way you could use it
with checksum errors too, possibly with settings as simple as errors
per minute or error percentage.  As with the timeouts, you could
have it off by default (or provide sensible defaults), and let
administrators tweak it for their particular needs.

Imagine a pool with the following settings:
- zfs-auto-device-timeout = 5s
- zfs-auto-device-checksum-fail-limit-epm = 20
- zfs-auto-device-checksum-fail-limit-percent = 10
- zfs-auto-device-fail-delay = 120s

That would allow the pool to flag a device as "possibly faulty"
regardless of the type of fault, and take immediate proactive action
to safeguard data (generally long before the device is actually
faulted).

A device triggering any of these flags would be enough for ZFS to
start reading from (or writing to) other devices first, and should you
get multiple failures, or problems on a non redundant pool, you always
just revert back to ZFS' current behaviour.

Ross





On Tue, Nov 25, 2008 at 8:37 AM, Jeff Bonwick [EMAIL PROTECTED] wrote:
 I think we (the ZFS team) all generally agree with you.  ...

Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Ross Smith
No, I count that as "doesn't return data ok", but my post wasn't very
clear at all on that.

Even for a write, the disk will return something to indicate that the
action has completed, so that can also be covered by just those two
scenarios, and right now ZFS can lock the whole pool up if it's
waiting for that response.

My idea is simply to allow the pool to continue operation while
waiting for the drive to fault, even if that's a faulty write.  It
just means that the rest of the operations (reads and writes) can keep
working for the minute (or three) it takes for FMA and the rest of the
chain to flag a device as faulty.

For write operations, the data can be safely committed to the rest of
the pool, with just the outstanding writes for the drive left waiting.
 Then as soon as the device is faulted, the hot spare can kick in, and
the outstanding writes quickly written to the spare.

For single parity or non-redundant volumes there's some benefit in
this.  For dual parity pools there's a massive benefit as your pool
stays available, and your data is still well protected.

Ross



On Tue, Nov 25, 2008 at 10:44 AM,  [EMAIL PROTECTED] wrote:


My justification for this is that it seems to me that you can split
disk behavior into two states:
- returns data ok
- doesn't return data ok


  I think you're missing "won't write".

  There's clearly a difference between "get data from a different copy",
  which you can fix by retrying data to a different part of the redundant
  data, and writing data: the data which can't be written must be kept
  until the drive is faulted.


 Casper




Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Casper . Dik


My idea is simply to allow the pool to continue operation while
waiting for the drive to fault, even if that's a faulty write.  It
just means that the rest of the operations (reads and writes) can keep
working for the minute (or three) it takes for FMA and the rest of the
chain to flag a device as faulty.

Except when you're writing a lot; 3 minutes can cause a 20GB backlog
for a single disk.

Casper



Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Casper . Dik


My justification for this is that it seems to me that you can split
disk behavior into two states:
- returns data ok
- doesn't return data ok


I think you're missing "won't write".

There's clearly a difference between "get data from a different copy",
which you can fix by retrying data to a different part of the redundant
data, and writing data: the data which can't be written must be kept
until the drive is faulted.


Casper



Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Ross Smith
Hmm, true.  The idea doesn't work so well if you have a lot of writes,
so there needs to be some thought as to how you handle that.

Just thinking aloud, could the missing writes be written to the log
file on the rest of the pool?  Or temporarily stored somewhere else in
the pool?  Would it be an option to allow up to a certain amount of
writes to be cached in this way while waiting for FMA, and only
suspend writes once that cache is full?

With a large SSD slog device would it be possible to just stream all
writes to the log?  As a further enhancement, might it be possible to
commit writes to the working drives, and just leave the writes for the
bad drive(s) in the slog (potentially saving a lot of space)?

For pools without log devices, I suspect that you would probably need
the administrator to specify the behavior as I can see several options
depending on the raid level and that pool's priorities for data
availability / integrity:

Drive fault write cache settings:
default - pool waits for device, no writes occur until device or spare
comes online
slog - writes are cached to slog device until full, then pool reverts
to default behavior (could this be the default with slog devices
present?)
pool - writes are cached to the pool itself, up to a set maximum, and
are written to the device or spare as soon as possible.  This assumes
a single parity pool with the other devices available.  If the upper
limit is reached, or another device goes faulty, the pool reverts to
default behaviour.

Storing directly to the rest of the pool would probably want to be off
by default on single parity pools, but I would imagine that it could
be on by default on dual parity pools.
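
Purely as a sketch (hypothetical property and values; nothing like this
exists in ZFS today, and the pool name is just an example), the three modes
above might be exposed as a single per-pool setting:

  # hypothetical syntax only
  zpool set fault-write-cache=default tank   # wait for the device or a spare
  zpool set fault-write-cache=slog tank      # queue blocked writes in the slog
  zpool set fault-write-cache=pool tank      # queue blocked writes in the pool, up to a limit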

Would that be enough to allow writes to continue in most circumstances
while the pool waits for FMA?

Ross



On Tue, Nov 25, 2008 at 10:55 AM,  [EMAIL PROTECTED] wrote:


My idea is simply to allow the pool to continue operation while
waiting for the drive to fault, even if that's a faulty write.  It
just means that the rest of the operations (reads and writes) can keep
working for the minute (or three) it takes for FMA and the rest of the
chain to flag a device as faulty.

 Except when you're writing a lot; 3 minutes can cause a 20GB backlog
 for a single disk.

 Casper




Re: [zfs-discuss] So close to better, faster, cheaper....

2008-11-25 Thread Darren J Moffat
marko b wrote:
 Let me see if I'm understanding your suggestion. A stripe of mirrored pairs.
 I can grow by resizing an existing mirrored pair, or just attaching 
 another mirrored pair to the stripe?

Both by adding an additional mirrored pair to the stripe and by replacing
the sides of the mirror of an existing one with larger disks.
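
For example (pool and device names are made up), both paths use the
standard commands:

  # add another mirrored pair to the stripe
  zpool add tank mirror c3t0d0 c3t1d0

  # or grow an existing pair by swapping in larger disks, one side at a time
  zpool replace tank c2t0d0 c4t0d0    # wait for the resilver to finish
  zpool replace tank c2t1d0 c4t1d0

Once both sides of a mirror have been replaced with larger disks and
resilvering is complete, the extra capacity becomes usable (on some releases
an export/import of the pool is needed before it shows up).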

-- 
Darren J Moffat


Re: [zfs-discuss] MIgrating to ZFS root/boot with system in several datasets

2008-11-25 Thread Jesus Cea

Lori Alt wrote:
 The SXCE code base really only supports BEs that are
 either all in one dataset, or have everything but /var in
 one dataset and /var in its own dataset (the reason for
 supporting a separate /var is to be able to set a
 quota on it so growth in log files, etc. can't fill up a
 root pool).

OK. I have a unified root dataset now. I want to segregate /var. How
is it done by hand? Must I use a legacy ZFS mountpoint or what? Is there
an option for that in Live Upgrade?

We are talking about Solaris 10 Update 6.

Thanks in advance.

--
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[EMAIL PROTECTED] - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[EMAIL PROTECTED] _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz


Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Toby Thain

On 25-Nov-08, at 5:10 AM, Ross Smith wrote:

 Hey Jeff,

 Good to hear there's work going on to address this.

 What did you guys think to my idea of ZFS supporting a waiting for a
 response status for disks as an interim solution that allows the pool
 to continue operation while it's waiting for FMA or the driver to
 fault the drive?
 ...

 The first of these is already covered by ZFS with its checksums (with
 FMA doing the extra work to fault drives), so it's just the second
 that needs immediate attention, and for the life of me I can't think
 of any situation that a simple timeout wouldn't catch.

 Personally I'd love to see two parameters, allowing this behavior to
 be turned on if desired, and allowing timeouts to be configured:

 zfs-auto-device-timeout
 zfs-auto-device-timeout-fail-delay

 The first sets whether to use this feature, and configures the maximum
 time ZFS will wait for a response from a device before putting it in a
 waiting status.


The shortcomings of timeouts have been discussed on this list before.  
How do you tell the difference between a drive that is dead and a  
path that is just highly loaded?

I seem to recall the argument strongly made in the past that making  
decisions based on a timeout alone can provoke various undesirable  
cascade effects.

   The second would be optional and is the maximum
 time ZFS will wait before faulting a device (at which point it's
 replaced by a hot spare).

 The reason I think this will work well with the FMA work is that you
 can implement this now and have a real improvement in ZFS
 availability.  Then, as the other work starts bringing better modeling
 for drive timeouts, the parameters can be either removed, or set
 automatically by ZFS.
 ... it should be possible for ZFS to read or
 write from other devices while it's waiting for an 'official' result
 from any one faulty component.

Sounds good - devil, meet details, etc.

--Toby


 Ross


 On Tue, Nov 25, 2008 at 8:37 AM, Jeff Bonwick  
 [EMAIL PROTECTED] wrote:
 I think we (the ZFS team) all generally agree with you. ...

 The reason this is all so much harder than it sounds is that we're
 trying to provide increasingly optimal behavior given a collection of
 devices whose failure modes are largely ill-defined.  (Is the disk
 dead or just slow?  Gone or just temporarily disconnected? ...

 Jeff

 On Mon, Nov 24, 2008 at 10:45:18PM -0800, Ross wrote:
 But that's exactly the problem Richard:  AFAIK.

 Can you state that absolutely, categorically, there is no failure  
 mode out there (caused by hardware faults, or bad drivers) that  
 won't lock a drive up for hours?  You can't, obviously, which is  
 why we keep saying that ZFS should have this kind of timeout  
 feature.
 ...


Re: [zfs-discuss] replacing disk

2008-11-25 Thread Krzys
Anyway I did not get any help but I was able to figure it out.

[12:58:08] [EMAIL PROTECTED]: /root  zpool status mypooladas
   pool: mypooladas
  state: DEGRADED
status: One or more devices could not be used because the label is missing or
 invalid.  Sufficient replicas exist for the pool to continue
 functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
  scrub: resilver completed after 0h34m with 0 errors on Tue Nov 25 03:59:23 
2008
config:

        NAME                      STATE     READ WRITE CKSUM
        mypooladas                DEGRADED     0     0     0
          raidz2                  DEGRADED     0     0     0
            c4t2d0                ONLINE       0     0     0
            c4t3d0                ONLINE       0     0     0
            c4t4d0                ONLINE       0     0     0
            c4t5d0                ONLINE       0     0     0
            c4t8d0                ONLINE       0     0     0
            c4t9d0                ONLINE       0     0     0
            c4t10d0               ONLINE       0     0     0
            c4t11d0               ONLINE       0     0     0
            c4t12d0               ONLINE       0     0     0
            16858115878292111089  FAULTED      0     0     0  was /dev/dsk/c4t13d0s0
            c4t14d0               ONLINE       0     0     0
            c4t15d0               ONLINE       0     0     0

errors: No known data errors
[12:58:23] [EMAIL PROTECTED]: /root 


Anyway, the way I fixed my problem was to export my pool so it no longer
existed, then take the disk that had to be manually imported and create a
test pool out of it with the -f option on just that one disk. Then I
destroyed that test pool, imported my original pool, and was able to
replace my bad disk with the old disk from that particular pool... It is a
kind of workaround, but it sucks that there is no easier way of getting
there. format -e and changing the label on that disk did not help; I even
recreated the partition table and made a huge file, and I was trying to dd
to that disk hoping it would overwrite any zfs info, but none of that
worked... so my workaround trick did work, and I have one extra disk to go,
I just need to buy it as I am short one disk at the moment.
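
In command form, the workaround boils down to something like this
(reconstructed from the description above, so treat it as a sketch rather
than an exact transcript):

  # export the pool so the disk is no longer seen as part of an active pool
  zpool export mypooladas

  # force-create a throwaway pool on the problem disk to rewrite its label
  zpool create -f testpool c4t13d0
  zpool destroy testpool

  # bring the original pool back and replace the faulted device in place
  zpool import mypooladas
  zpool replace mypooladas c4t13d0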

On Mon, 24 Nov 2008, Krzys wrote:


 somehow I have issue replacing my disk.

 [20:09:29] [EMAIL PROTECTED]: /root  zpool status mypooladas
   pool: mypooladas
  state: DEGRADED
 status: One or more devices could not be opened.  Sufficient replicas exist for
  the pool to continue functioning in a degraded state.
 action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
  scrub: resilver completed after 0h0m with 0 errors on Mon Nov 24 20:06:48 
 2008
 config:

  NAME                      STATE     READ WRITE CKSUM
  mypooladas                DEGRADED     0     0     0
    raidz2                  DEGRADED     0     0     0
      c4t2d0                ONLINE       0     0     0
      c4t3d0                ONLINE       0     0     0
      c4t4d0                ONLINE       0     0     0
      c4t5d0                ONLINE       0     0     0
      c4t8d0                UNAVAIL      0     0     0  cannot open
      c4t9d0                ONLINE       0     0     0
      c4t10d0               ONLINE       0     0     0
      c4t11d0               ONLINE       0     0     0
      c4t12d0               ONLINE       0     0     0
      16858115878292111089  FAULTED      0     0     0  was /dev/dsk/c4t13d0s0
      c4t14d0               ONLINE       0     0     0
      c4t15d0               ONLINE       0     0     0

 errors: No known data errors
 [20:09:38] [EMAIL PROTECTED]: /root 

 I am trying to replace c4t13d0 disk.

 [20:09:38] [EMAIL PROTECTED]: /root  zpool replace -f mypooladas c4t13d0
 invalid vdev specification
 the following errors must be manually repaired:
 /dev/dsk/c4t13d0s0 is part of active ZFS pool mypooladas. Please see 
 zpool(1M).
 [20:10:13] [EMAIL PROTECTED]: /root  zpool online mypooladas c4t13d0
 zpool replace -f mypooladas c4t13d0
 warning: device 'c4t13d0' onlined, but remains in faulted state
 use 'zpool replace' to replace devices that are no longer present
 [20:11:14] [EMAIL PROTECTED]: /root  zpool replace -f mypooladas c4t13d0
 invalid vdev specification
 the following errors must be manually repaired:
 /dev/dsk/c4t13d0s0 is part of active ZFS pool mypooladas. Please see 
 zpool(1M).
 [20:11:45] [EMAIL PROTECTED]: /root  zpool replace -f mypooladas c4t8d0 
 c4t13d0
 invalid vdev specification
 the following errors must be manually repaired:
 /dev/dsk/c4t13d0s0 is part of active ZFS pool mypooladas. Please see 
 zpool(1M).
 [20:13:24] [EMAIL 

Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Ross Smith
 The shortcomings of timeouts have been discussed on this list before. How do
 you tell the difference between a drive that is dead and a path that is just
 highly loaded?

A path that is dead is either returning bad data, or isn't returning
anything.  A highly loaded path is by definition reading & writing
lots of data.  I think you're assuming that these are file level
timeouts, when this would actually need to be much lower level.


 Sounds good - devil, meet details, etc.

Yup, I imagine there are going to be a few details to iron out, many
of which will need looking at by somebody a lot more technical than
myself.

Despite that I still think this is a discussion worth having.  So far
I don't think I've seen any situation where this would make things
worse than they are now, and I can think of plenty of cases where it
would be a huge improvement.

Of course, it also probably means a huge amount of work to implement.
I'm just hoping that it's not prohibitively difficult, and that the
ZFS team see the benefits as being worth it.


[zfs-discuss] HELP!!! Need to disable zfs

2008-11-25 Thread Mike DeMarco
My root drive is ufs. I have corrupted my zpool, which is on a different drive
than the root drive.
My system panicked and now it core dumps when it boots up and hits zfs start. I
have an alt root drive that I can boot the system up with, but how can I disable
zfs from starting on a different drive?

HELP HELP HELP


Re: [zfs-discuss] HELP!!! Need to disable zfs

2008-11-25 Thread Mike Gerdts
Boot from the other root drive, mount up the bad one at /mnt.  Then:

# mv /mnt/etc/zfs/zpool.cache /mnt/etc/zpool.cache.bad
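
Spelled out a little more (the device name for the damaged root's slice is
only an example):

  # after booting from the alternate root drive
  mount /dev/dsk/c0t0d0s0 /mnt        # slice holding the damaged root
  mv /mnt/etc/zfs/zpool.cache /mnt/etc/zpool.cache.bad
  umount /mnt
  reboot

With zpool.cache out of the way, that boot environment no longer tries to
open the corrupted pool at startup.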



On Tue, Nov 25, 2008 at 8:18 AM, Mike DeMarco [EMAIL PROTECTED] wrote:
 My root drive is ufs. I have corrupted my zpool which is on a different drive 
 than the root drive.
 My system paniced and now it core dumps when it boots up and hits zfs start. 
 I have a alt root drive that  can boot the system up with but how can I 
 disable zfs from starting on a different drive?

 HELP HELP HELP




-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] HELP!!! Need to disable zfs

2008-11-25 Thread Enda O'Connor
Mike DeMarco wrote:
 My root drive is ufs. I have corrupted my zpool which is on a different drive 
 than the root drive.
 My system paniced and now it core dumps when it boots up and hits zfs start. 
 I have a alt root drive that  can boot the system up with but how can I 
 disable zfs from starting on a different drive?
 
 HELP HELP HELP
boot the working alt root drive, mount the other drive to /a
then
mv /a/etc/zfs/zpool.cache /a/etc/zfs/zpool.cache.corrupt

reboot

Enda


[zfs-discuss] Odd filename in zpool status -v output

2008-11-25 Thread Chris Ridd
My non-redundant rpool (2 replacement disks have been ordered :-) is  
reporting errors:

canopus% pfexec zpool status -v rpool
   pool: rpool
  state: ONLINE
status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: scrub in progress for 4h18m, 72.07% done, 1h40m to go
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       8     0     0
          c5d0s0  ONLINE     818     0     0  540K repaired

errors: Permanent errors have been detected in the following files:

 rpool/ROOT/opensolaris-101:/var/tmp/stmAAAaXaWkb.0015
 rpool/canopus1:0x0

So I don't think I care about the damage to /var/tmp/stmAAAaXaWkb.0015,
but what's the second filename printed there?

The pool has an rpool/canopus1 filesystem so I guess it is somehow  
related to that.

I'm running the current public build (101b) of OpenSolaris.

Cheers,

Chris


Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Scara Maccai
 Oh, and regarding the original post -- as several readers correctly
 surmised, we weren't faking anything, we just didn't want to wait
 for all the device timeouts.  Because the disks were on USB, which
 is a hotplug-capable bus, unplugging the dead disk generated an
 interrupt that bypassed the timeout.  We could have waited it out,
 but 60 seconds is an eternity on stage.

I'm sorry, I didn't mean to sound offensive. Anyway I think that people should 
know that their drives can stall the system for minutes, despite ZFS. I mean: 
there are a lot of writings about how ZFS is great for recovery in case a drive 
fails, but there's nothing regarding this problem. I know now it's not ZFS's 
fault; but I wonder how many people set up their drives with ZFS assuming that 
as soon as something goes bad, ZFS will fix it. 
Is there any way to test these cases other than smashing the drive with a 
hammer? Having a failover policy where the failover can't be tested sounds 
scary...


Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Moore, Joe
Ross Smith wrote:
 My justification for this is that it seems to me that you can split
 disk behavior into two states:
 - returns data ok
 - doesn't return data ok
 
 And for the state where it's not returning data, you can again split
 that in two:
 - returns wrong data
 - doesn't return data

The state in discussion in this thread is "the I/O requested by ZFS hasn't
finished after 60, 120, 180, 3600, etc. seconds".

The pool is waiting (for device timeouts) to distinguish between the first two 
states.

More accurate state descriptions are:
- The I/O has returned data
- The I/O hasn't yet returned data and the user (admin) is justifiably 
impatient.

For the first state, the data is either correct (verified by the ZFS checksums, 
or ESUCCESS on write) or incorrect and retried.

 
 The first of these is already covered by ZFS with its checksums (with
 FMA doing the extra work to fault drives), so it's just the second
 that needs immediate attention, and for the life of me I can't think
 of any situation that a simple timeout wouldn't catch.
 
 Personally I'd love to see two parameters, allowing this behavior to
 be turned on if desired, and allowing timeouts to be configured:
 
 zfs-auto-device-timeout
 zfs-auto-device-timeout-fail-delay

I'd prefer these be set at the (default) pool level:
zpool-device-timeout
zpool-device-timeout-fail-delay

with specific per-VDEV overrides possible:
vdev-device-timeout and vdev-device-fail-delay

This would allow but not require slower VDEVs to be tuned specifically for that 
case without hindering the default pool behavior on the local fast disks.  
Specifically, consider where I'm using mirrored VDEVs with one half over iSCSI, 
and want the iSCSI retry logic to still apply.  Writes that failed 
while the iSCSI link is down would have to be resilvered, but at least reads 
would switch to the local devices faster.

Set them to the default magic 0 value to have the system use the current 
behavior, of relying on the device drivers to report failures.
Set to a number (in ms probably) and the pool would consider an I/O that takes
longer than that as "returns invalid data".

When the FMA work discussed below arrives, these could be augmented by the
pool's best heuristic guess as to what the proper timeouts should be, which
could be saved in (kstat?) vdev-device-autotimeout.

If you set the timeout to the magic -1 value, the pool would use 
vdev-device-autotimeout.

All that would be required is for the I/O that caused the disk to take a long 
time to be given a deadline (now + (vdev-device-timeout ?:
(zpool-device-timeout ?: forever)))* and consider the I/O complete with whatever
data has returned after that deadline: if that's a bunch of 0's in a read, 
which would have a bad checksum; or a partially-completed write that would have 
to be committed somewhere else.

Unfortunately, I'm not enough of a programmer to implement this.

--Joe
* with the -1 magic, it would be a little more complicated calculation.


Re: [zfs-discuss] HELP!!! Need to disable zfs

2008-11-25 Thread Mike DeMarco
 Boot from the other root drive, mount up the bad one at /mnt.  Then:

 # mv /mnt/etc/zfs/zpool.cache /mnt/etc/zpool.cache.bad

 ...

That got it. Thanks


Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Bob Friesenhahn
On Tue, 25 Nov 2008, Ross Smith wrote:

 Good to hear there's work going on to address this.

 What did you guys think to my idea of ZFS supporting a waiting for a
 response status for disks as an interim solution that allows the pool
 to continue operation while it's waiting for FMA or the driver to
 fault the drive?

A stable and sane system never comes with two brains.  It is wrong 
to put this sort of logic into ZFS when ZFS is already depending on 
FMA to make the decisions and Solaris already has an infrastructure to 
handle faults.  The more appropriate solution is that this feature 
should be in FMA.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Richard Elling
Scara Maccai wrote:
  Oh, and regarding the original post -- as several readers correctly
  surmised, we weren't faking anything, we just didn't want to wait
  for all the device timeouts.  Because the disks were on USB, which
  is a hotplug-capable bus, unplugging the dead disk generated an
  interrupt that bypassed the timeout.  We could have waited it out,
  but 60 seconds is an eternity on stage.
 

 I'm sorry, I didn't mean to sound offensive. Anyway I think that people 
 should know that their drives can stall the system for minutes, despite 
 ZFS. I mean: there are a lot of writings about how ZFS is great for recovery 
 in case a drive fails, but there's nothing regarding this problem. I know now 
 it's not ZFS fault; but I wonder how many people set up their drives with ZFS 
 assuming that as soon as something goes bad, ZFS will fix it. 
 Is there any way to test these cases other than smashing the drive with a 
 hammer? Having a failover policy where the failover can't be tested sounds 
 scary...
   

It is with this idea in mind that I wrote part of Chapter 1 of the book
Designing Enterprise Solutions with Sun Cluster 3.0.  For convenience,
I also published chapter 1 as a Sun BluePrint Online article.
http://www.sun.com/blueprints/1101/clstrcomplex.pdf
False positives are very expensive in highly available systems, so we
really do want to avoid them.

One thing that we can do, and I've already (again[1]) started down the path
to document, is to show where and how the various (common) timeouts
are in the system.  Once you know how sd, cmdk, dbus, and friends work
you can make better decisions on where to look when the behaviour is not
as you expect.  But this is a very tedious path because there are so many
different failure modes and real-world devices can react ambiguously
when they fail.

[1] we developed a method to benchmark cluster dependability. The
description of the benchmark was published in several papers, but is
now available in the new IEEE book on Dependability Benchmarking.
This is really the first book of its kind and the first steps toward making
dependability benchmarks more mainstream. Anyway, the work done
for that effort included methods to improve failure detection and handling,
so we have a detailed understanding of those things for SPARC, in lab
form.  Expanding that work to cover the random-device-bought-at-Frys
will be a substantial undertaking.  Co-conspirators welcome.
 -- richard



Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Nicolas Williams
On Tue, Nov 25, 2008 at 11:55:17AM +0100, [EMAIL PROTECTED] wrote:
 My idea is simply to allow the pool to continue operation while
 waiting for the drive to fault, even if that's a faulty write.  It
 just means that the rest of the operations (reads and writes) can keep
 working for the minute (or three) it takes for FMA and the rest of the
 chain to flag a device as faulty.
 
 Except when you're writing a lot; 3 minutes can cause a 20GB backlog
 for a single disk.

If we're talking isolated, or even clumped-but-relatively-few bad
sectors, then having a short timeout for writes and remapping
should be possible to do without running out of memory to cache
those writes.  But...

...writes to bad sectors will happen when txgs flush, and depending on
how bad sector remapping is done (say, by picking a new block address
and changing the blkptrs that referred to the old one) that might mean
redoing large chunks of the txg in the next one, which might mean that
fsync() could be delayed an additional 5 seconds or so.  And even if
that's not the case, writes to mirrors are supposed to be synchronous,
so one would think that bad block remapping should be synchronous also,
thus there must be a delay on writes to bad blocks no matter what --
though that delay could be tuned to be no more than a few seconds.

That points to a possibly decent heuristic on writes: vdev-level
timeouts that result in bad block remapping, but if the queue of
outstanding bad block remappings grows too large - treat the disk
as faulted and degrade the pool.

Sounds simple, but it needs to be combined at a higher layer with
information from other vdevs.  Unplugging a whole jbod shouldn't
necessarily fault all the vdevs on it -- perhaps it should cause
pool operation to pause until the jbod is plugged back in... which
should then cause those outstanding bad block remappings to be
rolled back since they weren't bad blocks after all.

That's a lot of fault detection and handling logic across many layers.

Incidentally, cables do fall out, or, rather, get pulled out
accidentally.  What should be the failure mode of a jbod disappearing
due to a pulled cable (or power supply failure)?  A pause in operation
(hangs)?  Or faulting of all affected vdevs, and if you're mirrored
across different jbods, incurring the need to re-silver later, with
degraded operation for hours on end?  I bet answers will vary.  The best
answer is to provide enough redundancy (multiple power supplies,
multi-pathing, ...) to make such situations less likely, but that's not
a complete answer.

Nico
-- 


Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Ross Smith
I disagree Bob, I think this is a very different function to that
which FMA provides.

As far as I know, FMA doesn't have access to the big picture of pool
configuration that ZFS has, so why shouldn't ZFS use that information
to increase the reliability of the pool while still using FMA to
handle device failures?

The flip side of the argument is that ZFS already checks the data
returned by the hardware.  You might as well say that FMA should deal
with that too since it's responsible for all hardware failures.

The role of ZFS is to manage the pool, availability should be part and
parcel of that.


On Tue, Nov 25, 2008 at 3:57 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Tue, 25 Nov 2008, Ross Smith wrote:

 Good to hear there's work going on to address this.

 What did you guys think to my idea of ZFS supporting a waiting for a
 response status for disks as an interim solution that allows the pool
 to continue operation while it's waiting for FMA or the driver to
 fault the drive?

 A stable and sane system never comes with two brains.  It is wrong to put
 this sort of logic into ZFS when ZFS is already depending on FMA to make the
 decisions and Solaris already has an infrastructure to handle faults.  The
 more appropriate solution is that this feature should be in FMA.

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Bob Friesenhahn
On Tue, 25 Nov 2008, Ross Smith wrote:

 I disagree Bob, I think this is a very different function to that
 which FMA provides.

 As far as I know, FMA doesn't have access to the big picture of pool
 configuration that ZFS has, so why shouldn't ZFS use that information
 to increase the reliability of the pool while still using FMA to
 handle device failures?

If FMA does not currently have knowledge of the redundancy model but 
needs it to make well-informed decisions, then it should be updated to 
incorporate this information.

FMA sees all the hardware in the system, including devices used for 
UFS and other types of filesystems, and even tape devices.  It is able 
to see hardware at a much more detailed level than ZFS does.  ZFS only 
sees an abstracted level of the hardware.  If a HBA or part of the 
backplane fails, FMA should be able to determine the failing area (at 
least as far out as it can see based on available paths) whereas all 
ZFS knows is that it is having difficulty getting there from here.

 The flip side of the argument is that ZFS already checks the data
 returned by the hardware.  You might as well say that FMA should deal
 with that too since it's responsible for all hardware failures.

If bad data is returned, then I assume that there is a peg to FMA's 
error statistics counters.

 The role of ZFS is to manage the pool, availability should be part and
 parcel of that.

Too much complexity tends to clog up the works and keep other areas of 
ZFS from being enhanced expediently.  ZFS would soon become a chunk of 
source code that no mortal could understand and as such it would be 
put under maintenance with no more hope of moving forward and 
inability to address new requirements.

A rational system really does not want to have multiple brains. 
Otherwise some parts of the system will think that the device is fine 
while other parts believe that it has failed. None of us want to deal 
with an insane system like that.  There is also the matter of fault 
isolation.  If a drive can not be reached, is it because the drive 
failed, or because a HBA supporting multiple drives failed, or a cable 
got pulled?  This sort of information is extremely important for large 
reliable systems.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Eric Schrock
It's hard to tell exactly what you are asking for, but this sounds
similar to how ZFS already works.  If ZFS decides that a device is
pathologically broken (as evidenced by vdev_probe() failure), it knows
that FMA will come back and diagnose the drive as faulty (because we
generate a probe_failure ereport).  So ZFS pre-emptively short circuits
all I/O and treats the drive as faulted, even though the diagnosis
hasn't come back yet.  We can only do this for errors that have a 1:1
correspondence with faults.

- Eric

On Tue, Nov 25, 2008 at 04:10:13PM +, Ross Smith wrote:
 I disagree Bob, I think this is a very different function to that
 which FMA provides.
 
 As far as I know, FMA doesn't have access to the big picture of pool
 configuration that ZFS has, so why shouldn't ZFS use that information
 to increase the reliability of the pool while still using FMA to
 handle device failures?
 
 The flip side of the argument is that ZFS already checks the data
 returned by the hardware.  You might as well say that FMA should deal
 with that too since it's responsible for all hardware failures.
 
 The role of ZFS is to manage the pool, availability should be part and
 parcel of that.
 
 
 On Tue, Nov 25, 2008 at 3:57 PM, Bob Friesenhahn
 [EMAIL PROTECTED] wrote:
  On Tue, 25 Nov 2008, Ross Smith wrote:
 
  Good to hear there's work going on to address this.
 
  What did you guys think to my idea of ZFS supporting a waiting for a
  response status for disks as an interim solution that allows the pool
  to continue operation while it's waiting for FMA or the driver to
  fault the drive?
 
  A stable and sane system never comes with two brains.  It is wrong to put
  this sort of logic into ZFS when ZFS is already depending on FMA to make the
  decisions and Solaris already has an infrastructure to handle faults.  The
  more appropriate solution is that this feature should be in FMA.
 
  Bob
  ==
  Bob Friesenhahn
  [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
  GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Fishworkshttp://blogs.sun.com/eschrock


Re: [zfs-discuss] ESX integration

2008-11-25 Thread Ryan Arneson
Hi Ahmed

I'm part of the team that is working on such integration, and snapshot 
integration (and SRM) is definitely on the roadmap.

Right now, there is nothing official, but as others have mentioned, some 
simple scripting wouldn't be too hard.

I like to use the Remote Command Line appliance and runs my scripts from 
there. Makes it easy to have one location to quiesce the VMs, run the 
ssh script to snap the 7000 (AmberRoad) array and then return the VMs to 
full operation.
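
For the impatient, the shape of such a wrapper script is roughly the
sketch below.  The VM names, array hostname, and the appliance-side
snapshot command are placeholders, not the official integration:

    #!/bin/sh
    # quiesce -> snapshot the array -> resume, as described above
    ARRAY=amberroad.example.com      # placeholder array hostname
    VMS="vm1 vm2"                    # placeholder list of VMs

    for vm in $VMS; do
        echo "quiesce $vm here (e.g. via the VMware remote CLI)"
    done

    # take the snapshot on the 7000 array over ssh; the exact appliance
    # CLI syntax is omitted -- see the array's scripting documentation
    ssh root@$ARRAY 'echo run the appliance snapshot command here'

    for vm in $VMS; do
        echo "resume $vm here"
    done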

Stay tuned for lots of work in this area.

-ryan

Ahmed Kamal wrote:
 Hi,
 Not sure if this is the best place to ask, but do Sun's new Amber Road 
 storage boxes have any kind of integration with ESX?  Most importantly, 
 quiescing the VMs before snapshotting the zvols, and/or some level of 
 management integration through either the web UI or ESX's console?  If 
 there's nothing official, did anyone hack up any scripts for that?

 Regards
 




Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Richard Morris - Sun Microsystems - Burlington United States

On 11/23/08 12:14, Paweł Tęcza wrote:

As others here have said, just issue 'zfs list -t snapshot' if you
just want to see the snapshots, or 'zfs list -t all' to see both
filesystems and snapshots.


OK, I can use that, but my dream `zfs list` syntax would look like this:

zfs list [all|snapshots]

zfs list:           displays all filesystems, and snapshots only if
                    listsnaps=on
zfs list all:       displays all filesystems and snapshots, even if
                    listsnaps=off
zfs list snapshots: displays all snapshots, without filesystems

Do you agree with me that it's simple and beautiful? ;)


Pawel,

With http://bugs.opensolaris.org/view_bug.do?bug_id=6734907 (zfs list
-t all would be useful once snapshots are omitted by default), the
syntax of zfs list is very close to the one you have dreamed of:

zfs list -t [filesystem|volume|snapshot|all]

zfs list                displays all filesystems and volumes;
                        displays snapshots only if listsnaps=on

zfs list -t filesystem  displays all filesystems;
                        does not display volumes or snapshots

zfs list -t volume      displays all volumes;
                        does not display filesystems or snapshots

zfs list -t snapshot    displays all snapshots;
                        does not display filesystems or volumes

zfs list -t all         displays all filesystems, volumes, and
                        snapshots (even if listsnaps=off)
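
A couple of concrete invocations, assuming a hypothetical pool named
tank (listsnapshots is the pool property the thread calls listsnaps;
set it on builds that have it):

    zfs list -t snapshot              # snapshots only
    zfs list -t all                   # filesystems, volumes, and snapshots
    zpool set listsnapshots=on tank   # make plain 'zfs list' show snapshots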

-- Rich






Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Malachi de Ælfweald
I did a fresh install a week ago. Because of Time Slider / auto-snapshot
being installed, I have 15 pages of snapshots.

Malachi

On Sun, Nov 23, 2008 at 8:53 AM, Paweł Tęcza [EMAIL PROTECTED] wrote:

 On Sun, 2008-11-23 at 13:41 +0530, Sanjeev Bagewadi wrote:


 Thank you very much for your feedback!  What is a large number of
 snapshots?  100?  1000?  1?  Do people really need so many snapshots?
 I think that if some user has a large number of snapshots, then it's
 not a `zfs list` problem; it's the user's problem.




Re: [zfs-discuss] ESX integration

2008-11-25 Thread Ross
Will this be for Sun's xVM Server as well as for ESX?


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Paweł Tęcza
On Tue, 2008-11-25 at 10:16 -0800, Malachi de Ælfweald wrote:
 I did a fresh install a week ago. Because of Time Slider /
 auto-snapshot being installed, I have 15 pages of snapshots.

Malachi,

You only wrote that you have a lot of snapshots.  You didn't write
whether you really need all of them.  I doubt it.  So if you don't want
to clutter your pool any further, the best thing to do is remove most
of your snapshots.

Cheers,

Pawel




Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Paweł Tęcza
On Tue, 2008-11-25 at 13:11 -0500, Richard Morris - Sun Microsystems -
Burlington United States wrote:

 Pawel,
 
 With http://bugs.opensolaris.org/view_bug.do?bug_id=6734907 (zfs list
 -t all would be useful once snapshots are omitted by default), the
 syntax of zfs list is very close to the one you have dreamed of:
 
 zfs list -t [filesystem|volume|snapshot|all]
 
 zfs list                displays all filesystems and volumes;
                         displays snapshots only if listsnaps=on
 
 zfs list -t filesystem  displays all filesystems;
                         does not display volumes or snapshots
 
 zfs list -t volume      displays all volumes;
                         does not display filesystems or snapshots
 
 zfs list -t snapshot    displays all snapshots;
                         does not display filesystems or volumes
 
 zfs list -t all         displays all filesystems, volumes, and
                         snapshots (even if listsnaps=off)

Hi Rich,

Thanks a lot for your feedback!  I thought the `zfs list` thread was
already dead ;)

The syntax above is pretty nice, but IMHO the -t switch is rather
unnecessary here :)

I also asked the Sun folks about your well-known backward compatibility
policy, but unfortunately nobody commented on it :(

My best regards,

Pawel






Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Malachi de Ælfweald
I think you are missing the point.  They are auto-generated because
Time Slider is set up.  It takes automatic snapshots of the entire drive
every hour and removes old ones when the drive reaches 80% utilization.

http://blogs.sun.com/erwann/entry/zfs_on_the_desktop_zfs
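
A quick way to see what Time Slider has actually set up (a sketch;
it assumes the OpenSolaris auto-snapshot SMF services are installed):

    svcs -a | grep auto-snapshot       # the snapshot schedules in use
    zfs list -t snapshot | wc -l       # how many snapshots exist
    zfs list -t snapshot -o name,used  # and how much space they pin down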

Hope that helps,
Malachi

On Tue, Nov 25, 2008 at 1:24 PM, Paweł Tęcza [EMAIL PROTECTED] wrote:

 On Tue, 2008-11-25 at 10:16 -0800, Malachi de Ælfweald wrote:
  I did a fresh install a week ago. Because of Time Slider /
  auto-snapshot being installed, I have 15 pages of snapshots.

 Malachi,

 You only wrote that you have a lot of snapshots.  You didn't write
 whether you really need all of them.  I doubt it.  So if you don't want
 to clutter your pool any further, the best thing to do is remove most
 of your snapshots.

 Cheers,

 Pawel





Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Paweł Tęcza
On Tue, 2008-11-25 at 13:46 -0800, Malachi de Ælfweald wrote:
 I think you are missing the point.  They are auto-generated because
 Time Slider is set up.  It takes automatic snapshots of the entire
 drive every hour and removes old ones when the drive reaches 80%
 utilization.

 http://blogs.sun.com/erwann/entry/zfs_on_the_desktop_zfs

Thanks a lot for the link! That blog entry is really very useful.

Well, I haven't used Time Slider yet.  But I can see from the screenshot
that you can decrease the percentage of file system capacity.  The
default of 80% is too high a value for me.

Also, I'm very curious whether I can configure Time Slider to take a
snapshot every 2, 4, or 8 hours, for example.  Maybe it's an advanced
option?  Unfortunately I can't check that now, because I'm writing this
on an Ubuntu box :P

Good night,

Pawel




Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Paweł Tęcza
On Tue, 2008-11-25 at 23:16 +0100, Paweł Tęcza wrote:

 Also, I'm very curious whether I can configure Time Slider to take a
 snapshot every 2, 4, or 8 hours, for example.

Or set the max number of snapshots?

Pawel




[zfs-discuss] 'zeroing out' unused blocks on a ZFS?

2008-11-25 Thread Dave Brown
I have RTFM'd through this list and a number of Sun docs at docs.sun.com 
and can't find any information on how I might write out 'hard zeros' to 
the unused blocks on a ZFS filesystem.  The reason I'd like to do this 
is that the storage (LUNs) I'm providing to ZFS is thin-provisioned: it 
knows nothing about the host OS file system or whether a previously 
written disk block still has data on it, and it only treats a block as 
free if all zeros have been written to it.  So how would I go about 
doing that with ZFS?  I was looking at the mkfile command, which looked 
like it might do the job, but I wasn't sure.  Has anyone done this 
before who can provide instructions?


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Richard Elling
Paweł Tęcza wrote:
 On Tue, 2008-11-25 at 23:16 +0100, Paweł Tęcza wrote:

   
 Also, I'm very curious whether I can configure Time Slider to take a
 snapshot every 2, 4, or 8 hours, for example.
 

 Or set the max number of snapshots?
   

UTSL
http://src.opensolaris.org/source/xref/jds/zfs-snapshot/src/

The service offers a way to manage cronjobs, but you can manage them
in other ways, too.
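
If you just want to see what it scheduled, something like this should
show the cron entries (a sketch; it assumes the service writes them
into root's crontab and that the entries mention the snapshot scripts):

    crontab -l root | grep -i snap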
 -- richard



Re: [zfs-discuss] 'zeroing out' unused blocks on a ZFS?

2008-11-25 Thread Tomas Ögren
On 25 November, 2008 - Dave Brown sent me these 0,8K bytes:

 I have RTFM'd through this list and a number of Sun docs at docs.sun.com 
 and can't find any information on how I might write out 'hard zeros' to 
 the unused blocks on a ZFS filesystem.  The reason I'd like to do this 
 is that the storage (LUNs) I'm providing to ZFS is thin-provisioned: it 
 knows nothing about the host OS file system or whether a previously 
 written disk block still has data on it, and it only treats a block as 
 free if all zeros have been written to it.  So how would I go about 
 doing that with ZFS?  I was looking at the mkfile command, which looked 
 like it might do the job, but I wasn't sure.  Has anyone done this 
 before who can provide instructions?

Try turning compression off, then create a huge file (all free space)
with mkfile.  I'm not sure about the exact on-disk bit patterns with
regard to checksums etc., so you might want to try with checksum off
as well.
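
Spelled out, that would look something like the following sketch, for a
hypothetical filesystem tank/thin.  Note that it deliberately fills the
filesystem while it runs, so do it in a quiet window:

    zfs set compression=off tank/thin
    # fill the free space with zero blocks, then release them; dd stops
    # on its own when the filesystem is full (mkfile <size> works too if
    # you know how much space is free)
    dd if=/dev/zero of=/tank/thin/zerofill bs=1024k
    rm /tank/thin/zerofill
    zfs set compression=on tank/thin   # re-enable only if it was on before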

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


[zfs-discuss] ZFS ACL/ACE issues with Samba - Access Denied

2008-11-25 Thread Eric Hill
Solaris 10u4 x64 using included Samba 3.0.28

Samba is AD integrated, and I have a share configured as follows:
[crlib1]
comment = Creative Lib1
path = /pool/creative/lib1
read only = No
vfs objects = zfsacl
acl check permissions = No
unix extensions = No
inherit permissions = Yes
map acl inherit = Yes

I have set both aclmode and aclinherit to be passthrough for the LIB1 
filesystem:
pool/creative/lib1  aclmodepassthroughlocal
pool/creative/lib1  aclinherit passthroughlocal

I have a user, Tom.  Tom is a member of Editors.  Another test user, Sue, is 
a member of Readers.  Both users are members of other groups as well.  I 
configured the permissions on LIB1 as 777, and created a test subfolder to 
which I applied permissions through Windows XP.  Windows complained about 
reordering the permissions when I first set them, but no longer complains 
when opening the Security tab, so I assume they're ordered correctly.

[EMAIL PROTECTED]:/pool/creative/lib1# ls -dV Test/
d-+  2 eric  domain users   4 Nov 25 21:36 Test/
group:editors:rwxpd-aARWc--s:fd:allow
group:readers:r-x---a-R-c--s:fd:allow
group:domain admins:rwxpdDaARWcCos:fd:allow
  user:eric:rwxpd-aARWc--s:fd:allow
[EMAIL PROTECTED]:/pool/creative/lib1# 
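
One thing that can help narrow this down is to add an equivalent ACE
from the Solaris side and retest.  This is just a diagnostic sketch
using the verbose ACE syntax, not a claim about the root cause:

    # prepend an inheritable ACE letting the editors group create files
    chmod A+group:editors:read_data/write_data/execute:file_inherit/dir_inherit:allow Test
    ls -dV Test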

The server can see the group (group ID 15130) and can verify the user in AD is 
a member of the group:

[EMAIL PROTECTED]:/pool/creative/lib1# wbinfo --group-info=editors
editors:x:15130
[EMAIL PROTECTED]:/pool/creative/lib1# wbinfo -r tom
15129
15018
15130
15166
15200
15127
15132
15027
15010
15120
15004
15041
15082
15133
15202
15001
[EMAIL PROTECTED]:/pool/creative/lib1# 

My problem is that Tom is a member of Editors, but he gets an Access Denied 
message when trying to put a file into the Test folder.

The samba log for the client shows the following trace:

[2008/11/25 22:42:18, 3] smbd/process.c:(1068)   Transaction 966323 of length 
1604
[2008/11/25 22:42:18, 3] smbd/process.c:(926)   switch message SMBsesssetupX 
(pid 7616) conn 0x0
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(241)   setting sec ctx (0, 0) - 
sec_ctx_stack_ndx = 0
[2008/11/25 22:42:18, 3] smbd/sesssetup.c:(1244)   wct=12 flg2=0xc807
[2008/11/25 22:42:18, 3] smbd/sesssetup.c:(1029)   Doing spnego session setup
[2008/11/25 22:42:18, 3] smbd/sesssetup.c:(1060)   NativeOS=[] NativeLanMan=[] 
PrimaryDomain=[]
[2008/11/25 22:42:18, 3] smbd/sesssetup.c:(697)   reply_spnego_negotiate: Got 
secblob of size 1471
[2008/11/25 22:42:18, 3] libads/kerberos_verify.c:(469)   ads_verify_ticket: 
did not retrieve auth data. continuing without PAC
[2008/11/25 22:42:18, 3] smbd/sesssetup.c:(321)   Ticket name is [EMAIL 
PROTECTED]
[2008/11/25 22:42:18, 4] lib/substitute.c:(407)   Home server: vault
[2008/11/25 22:42:18, 4] lib/substitute.c:(407)   Home server: vault
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(208)   push_sec_ctx(0, 0) : 
sec_ctx_stack_ndx = 1
[2008/11/25 22:42:18, 3] smbd/uid.c:(358)   push_conn_ctx(0) : 
conn_ctx_stack_ndx = 0
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(241)   setting sec ctx (0, 0) - 
sec_ctx_stack_ndx = 1
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(356)   pop_sec_ctx (0, 0) - 
sec_ctx_stack_ndx = 0
[2008/11/25 22:42:18, 3] passdb/lookup_sid.c:(1069)   fetch sid from gid cache 
15004 - S-1-5-21-1409556225-1798326808-5522801-513
[2008/11/25 22:42:18, 3] passdb/lookup_sid.c:(1089)   fetch gid from cache 
15000 - S-1-5-32-544
[2008/11/25 22:42:18, 3] passdb/lookup_sid.c:(1089)   fetch gid from cache 
15001 - S-1-5-32-545
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(208)   push_sec_ctx(0, 0) : 
sec_ctx_stack_ndx = 1
[2008/11/25 22:42:18, 3] smbd/uid.c:(358)   push_conn_ctx(0) : 
conn_ctx_stack_ndx = 0
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(241)   setting sec ctx (0, 0) - 
sec_ctx_stack_ndx = 1
[2008/11/25 22:42:18, 3] smbd/sec_ctx.c:(356)   pop_sec_ctx (0, 0) - 
sec_ctx_stack_ndx = 0
[2008/11/25 22:42:18, 3] lib/privileges.c:(261)   get_privileges: No privileges 
assigned to SID [S-1-5-21-2469361529-1303801020-868054103-32338]
[2008/11/25 22:42:18, 3] lib/privileges.c:(261)   get_privileges: No privileges 
assigned to SID [S-1-5-21-1409556225-1798326808-5522801-513]
[2008/11/25 22:42:18, 3] lib/privileges.c:(261)   get_privileges: No privileges 
assigned to SID [S-1-5-2]
[2008/11/25 22:42:18, 3] lib/privileges.c:(261)   get_privileges: No privileges 
assigned to SID [S-1-5-11]
[2008/11/25 22:42:18, 3] lib/privileges.c:(261)   get_privileges: No privileges 
assigned to SID [S-1-5-21-1409556225-1798326808-5522801-5503]
[2008/11/25 22:42:18, 3] lib/privileges.c:(261)   get_privileges: No privileges 
assigned to SID [S-1-5-32-545]
[2008/11/25 22:42:18, 3] passdb/lookup_sid.c:(1089)   fetch gid from cache 
15004 - S-1-5-21-1409556225-1798326808-5522801-513
[2008/11/25 22:42:18, 3] 

Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Tim Foster
Paweł Tęcza wrote:
 On Tue, 2008-11-25 at 23:16 +0100, Paweł Tęcza wrote:
 
 Also, I'm very curious whether I can configure Time Slider to take a
 snapshot every 2, 4, or 8 hours, for example.
 
 Or set the max number of snapshots?

Yes you can (though not in the time-slider gui yet).  Have a read of
http://src.opensolaris.org/source/xref/jds/zfs-snapshot/README.zfs-auto-snapshot.txt

In particular, look for the zfs/keep setting to configure the maximum 
number of snapshots you'd like each instance to keep, and zfs/period 
to set how many intervals you want to wait between snapshots.
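
A sketch of what that looks like in practice, assuming the service is
delivered as svc:/system/filesystem/zfs/auto-snapshot (per the README
above); check listprop first, since the exact property names and types
come from the service manifest:

    # see the current schedule-related properties of the hourly instance
    svccfg -s svc:/system/filesystem/zfs/auto-snapshot:hourly listprop zfs

    # keep at most 12 snapshots and only take one every 4 intervals
    svccfg -s svc:/system/filesystem/zfs/auto-snapshot:hourly setprop zfs/keep = 12
    svccfg -s svc:/system/filesystem/zfs/auto-snapshot:hourly setprop zfs/period = 4
    svcadm refresh svc:/system/filesystem/zfs/auto-snapshot:hourly
    svcadm restart svc:/system/filesystem/zfs/auto-snapshot:hourly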

cheers,
tim


[zfs-discuss] Can a zpool cachefile be copied between systems?

2008-11-25 Thread Chris Siebenmann
 Suppose that you have a SAN environment with a lot of LUNs. In the
normal course of events this means that 'zpool import' is very slow,
because it has to probe all of the LUNs all of the time.

 In S10U6, the theoretical 'obvious' way to get around this for your
SAN filesystems seems to be to use a non-default cachefile (likely one
cachefile per virtual fileserver, although you could go all the way to
one cachefile per pool) and then copy this cachefile from the master
host to all of your other hosts. When you need to rapidly bring up a
virtual fileserver on a non-default host, you can just run
zpool import -c /where/ever/host.cache -a

 However, the S10U6 zpool documentation doesn't say if zpool cachefiles
can be copied between systems and used like this. Does anyone know if
this is a guaranteed property that is sure to keep working, something
that works right now but there's no guarantees that it will keep working
in future versions of Solaris and patches, or something that doesn't
work reliably in general?

(I have done basic tests with my S10U6 test machine, and it seems to
work ... but I might easily be missing something that makes it not
reliable.)
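
For reference, the setup being described looks roughly like this sketch
(hostnames, pool names, and the cachefile path are made up; whether the
copied file is a supported, stable interface is exactly the question):

    # master host: park the SAN pools in their own cachefile
    zpool set cachefile=/etc/zfs/sanpools.cache tank1
    zpool set cachefile=/etc/zfs/sanpools.cache tank2

    # push that cachefile to the standby host
    scp /etc/zfs/sanpools.cache standby:/etc/zfs/sanpools.cache

    # standby host: import everything in the cachefile without probing
    # every LUN on the SAN
    zpool import -c /etc/zfs/sanpools.cache -a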

- cks


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Richard Morris - Sun Microsystems - Burlington United States

On 11/25/08 16:41, Paweł Tęcza wrote:

On Tue, 2008-11-25 at 13:11 -0500, Richard Morris - Sun Microsystems -
Burlington United States wrote:


Pawel,

With http://bugs.opensolaris.org/view_bug.do?bug_id=6734907 (zfs list
-t all would be useful once snapshots are omitted by default), the
syntax of zfs list is very close to the one you have dreamed of:

zfs list -t [filesystem|volume|snapshot|all]

zfs list                displays all filesystems and volumes;
                        displays snapshots only if listsnaps=on

zfs list -t filesystem  displays all filesystems;
                        does not display volumes or snapshots

zfs list -t volume      displays all volumes;
                        does not display filesystems or snapshots

zfs list -t snapshot    displays all snapshots;
                        does not display filesystems or volumes

zfs list -t all         displays all filesystems, volumes, and
                        snapshots (even if listsnaps=off)


Hi Rich,

Thanks a lot for your feedback!  I thought the `zfs list` thread was
already dead ;)

The syntax above is pretty nice, but IMHO the -t switch is rather
unnecessary here :)

I also asked the Sun folks about your well-known backward compatibility
policy, but unfortunately nobody commented on it :(

Pawel,

The fix for 6734907 did not add the -t option to zfs list.  That option 
already existed, so there is no backward compatibility issue.  Listing 
all datasets could already be done with zfs list -t 
filesystem,volume,snapshot, which produced the same output as zfs list.  
Now that snapshots are not displayed by default (unless listsnaps=on), 
there is some benefit in having a shorter option to list all datasets.  
So 6734907 added -t all, which produces the same output as -t 
filesystem,volume,snapshot.
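
In other words, on a build that has the fix, these two should produce
identical output:

    zfs list -t filesystem,volume,snapshot
    zfs list -t all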


-- Rich




Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-25 Thread Jens Elkner
On Tue, Nov 25, 2008 at 06:34:47PM -0500, Richard Morris - Sun Microsystems - 
Burlington United States wrote:

option to list all datasets.  So 6734907 added -t all which produces the
same output as -t filesystem,volume,snapshot.
1. http://bugs.opensolaris.org/view_bug.do?bug_id=6734907

Hmmm - very strange, when I run 'zfs list -t all' on b101 it says:

invalid type 'all'
...

But the bug report says:
Fixed In  snv_99
Release Fixed solaris_nevada(snv_99)

So, what do those fields really mean?

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768