Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-24 Thread Roch - PAE

  >
  > This should work. It shouldn't even lose the in-flight transactions.
  > ZFS reverts to using the main pool if a slog write fails or the
  > slog fills up.

  So, the only way to lose transactions would be a crash or power loss,
  leaving outstanding transactions in the log, followed by the log
  device failing to start up on reboot?  I assume that would be handled
  relatively cleanly (files have out-of-date data), as opposed to
  something nasty like the pool failing to start up.


It's just data loss from the zpool's perspective. However, it's
loss of data that applications had committed. So applications
that relied on committing data for their own consistency might
end up with a corrupted view of the world. NFS clients fall
into this bin.

Mirroring the NVRAM cards in the separate intent log seems
like a very good idea.
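
For illustration only -- a pool with a mirrored slog could be set up
along these lines (pool and device names below are made-up
placeholders, not a tested configuration):

    # main pool on a mirrored pair, intent log on mirrored NVRAM cards
    zpool create tank mirror c0t0d0 c0t1d0 log mirror c2t0d0 c2t1d0
    zpool status tank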

-r



Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-22 Thread Will Murnane
On 10/22/07, Scott Laird <[EMAIL PROTECTED]> wrote:
> Oct 20 12:50:54 fs2 ahci: [ID 632458 kern.warning] WARNING:
> ahci_port_reset: port 1 the device hardware has been initialized and
> the power-up diagnostics failed
IIRC, Gigabyte cheaped out and didn't implement SMART.  As a result,
anything that tries to monitor the drive's health ends up marking it
as failed.

> I'm sending it back.
Good plan.

Will


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-22 Thread Scott Laird
On 10/18/07, Neil Perrin <[EMAIL PROTECTED]> wrote:
> >
> > The umem one is unavailable, but the Gigabyte model is easy to find.
> > I had Amazon overnight one to me, it's probably sitting at home right
> > now.
>
> Cool let us know how it goes.

Not so well.  I was completely unable to get the card to work at all.
The motherboard's BIOS wouldn't even list the GC-RAMDISK during the
bus scan.  Solaris saw it, but couldn't talk to it:

Oct 20 12:50:54 fs2 ahci: [ID 632458 kern.warning] WARNING:
ahci_port_reset: port 1 the device hardware has been initialized and
the power-up diagnostics failed

The Supermicro 8-port SATA card's BIOS saw it, but Solaris reported
errors at boot time:

Oct 20 12:06:00 fs2 marvell88sx: [ID 748163 kern.warning] WARNING:
marvell88sx0: device on port 5 still busy after reset

I tried using it with the motherboard's Marvell-based eSATA ports, but
that made the POST hang for a minute or two and Solaris spewed errors
all over the console after boot.

I'm sending it back.


Scott


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Neil Perrin


Scott Laird wrote:
> On 10/18/07, Neil Perrin <[EMAIL PROTECTED]> wrote:
>>> So, the only way to lose transactions would be a crash or power loss,
>>> leaving outstanding transactions in the log, followed by the log
>>> device failing to start up on reboot?  I assume that would be handled
>>> relatively cleanly (files have out-of-date data), as opposed to
>>> something nasty like the pool failing to start up.
>> I just checked on the behaviour of this. The log is treated as part
>> of the main pool. If it is not replicated and disappears then the pool
>> can't be opened - just like any unreplicated device in the main pool.
>> If the slog is found but can't be opened or is corrupted, then the
>> pool will be opened but the slog isn't used.
>> This seems a bit inconsistent.
> 
> Hmm, yeah.  What would happen if I mirrored the ramdisk with a hard
> drive?  Would ZFS block until the data's stable on both devices, or
> would it continue once the write is complete on the ramdisk?

ZFS ensures all mirror sides have the data before returning.
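
As a rough sketch (untested here, and assuming attach behaves the same
for log vdevs as it does for data vdevs), you could turn an existing
single slog into a mirror by attaching the hard drive to it:

    # hypothetical names: c2t0d0 = existing ramdisk slog, c3t0d0 = hard drive
    zpool attach tank c2t0d0 c3t0d0
    zpool status tank     # the log should now show a two-way mirror

Synchronous writes would then be gated by the slower side of that mirror.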

>
> Failing that, would replacing the missing log with a blank device let
> me bring the pool back up, or would it be dead at that point?

Replacing the device would work:

: mull ; mkfile 100m /p1 /p2
: mull ; zpool create whirl /p1 log /p2
: mull ; echo abc > /whirl/f
: mull ; sync
: mull ; rm /p2
: mull ; sync

: mull ; zpool status
  pool: whirl
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        whirl       UNAVAIL      0     0     0  insufficient replicas
          /p1       ONLINE       0     0     0
        logs        UNAVAIL      0     0     0  insufficient replicas
          /p2       UNAVAIL      0     0     0  cannot open
: mull ; mkfile 100m /p2 /p3
: mull ; zpool online whirl /p2
warning: device '/p2' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
: mull ; zpool status
  pool: whirl
 state: ONLINE
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        whirl       ONLINE       0     0     0
          /p1       ONLINE       0     0     0
        logs        ONLINE       0     0     0
          /p2       UNAVAIL      0     0     0  corrupted data

errors: No known data errors
: mull ; zpool replace whirl /p2 /p3
: mull ; zpool status
  pool: whirl
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Oct 18 18:16:39 2007
config:

        NAME           STATE     READ WRITE CKSUM
        whirl          ONLINE       0     0     0
          /p1          ONLINE       0     0     0
        logs           ONLINE       0     0     0
          replacing    ONLINE       0     0     0
            /p2        UNAVAIL      0     0     0  corrupted data
            /p3        ONLINE       0     0     0

errors: No known data errors
: mull ; zpool status
  pool: whirl
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Oct 18 18:16:39 2007
config:

        NAME        STATE     READ WRITE CKSUM
        whirl       ONLINE       0     0     0
          /p1       ONLINE       0     0     0
        logs        ONLINE       0     0     0
          /p3       ONLINE       0     0     0

errors: No known data errors
: mull ; zfs mount
: mull ; zfs mount -a
: mull ; cat /whirl/f
abc
: mull ;

> 
>>>>> 3.  What about corruption in the log?  Is it checksummed like the rest of
>>>>> ZFS?
>>>> Yes it's checksummed, but the checksumming is a bit different
>>>> from the pool blocks in the uberblock tree.
>>>>
>>>> See also:
>>>> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
>>> That started this whole mess :-).  I'd like to try out using one of
>>> the Gigabyte SATA ramdisk cards that are discussed in the comments.
>> A while ago there was a comment on this alias that these cards
>> weren't purchasable. Unfortunately, I don't know what is available.
> 
> The umem one is unavailable, but the Gigabyte model is easy to find.
> I had Amazon overnight one to me, it's probably sitting at home right
> now.

Cool let us know how it goes.

Neil.


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Eric Schrock
On Thu, Oct 18, 2007 at 02:29:27PM -0600, Neil Perrin wrote:
> 
> > So, the only way to lose transactions would be a crash or power loss,
> > leaving outstanding transactions in the log, followed by the log
> device failing to start up on reboot?  I assume that would be handled
> relatively cleanly (files have out-of-date data), as opposed to
> something nasty like the pool failing to start up.
> 
> I just checked on the behaviour of this. The log is treated as part
> of the main pool. If it is not replicated and disappears then the pool
> can't be opened - just like any unreplicated device in the main pool.
> If the slog is found but can't be opened or is corrupted, then the
> pool will be opened but the slog isn't used.
> This seems a bit inconsistent.
> 

It's worth noting that this is a generic problem.  In the world of
metadata replication (ditto blocks), even an unreplicated normal device
does not necessarily render a pool completely faulted.  The code needs
to be modified across the board so that the root vdev never ends up in
the FAULTED state, and then pool health is based solely on the ability
to read some basic piece of information, such as a successful
dsl_pool_open().  From the looks of things, this will "just work" if we
get rid of the too_many_errors() call and associated code in
vdev_root.c, but I'm sure there would be some odd edge conditions.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Scott Laird
On 10/18/07, Neil Perrin <[EMAIL PROTECTED]> wrote:
> > So, the only way to lose transactions would be a crash or power loss,
> > leaving outstanding transactions in the log, followed by the log
> > device failing to start up on reboot?  I assume that would be handled
> > relatively cleanly (files have out-of-date data), as opposed to
> > something nasty like the pool failing to start up.
>
> I just checked on the behaviour of this. The log is treated as part
> of the main pool. If it is not replicated and disappears then the pool
> can't be opened - just like any unreplicated device in the main pool.
> If the slog is found but can't be opened or is corrupted, then the
> pool will be opened but the slog isn't used.
> This seems a bit inconsistent.

Hmm, yeah.  What would happen if I mirrored the ramdisk with a hard
drive?  Would ZFS block until the data's stable on both devices, or
would it continue once the write is complete on the ramdisk?

Failing that, would replacing the missing log with a blank device let
me bring the pool back up, or would it be dead at that point?

> >>> 3.  What about corruption in the log?  Is it checksummed like the rest of 
> >>> ZFS?
> >> Yes it's checksummed, but the checksumming is a bit different
> >> from the pool blocks in the uberblock tree.
> >>
> >> See also:
> >> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
> >
> > That started this whole mess :-).  I'd like to try out using one of
> > the Gigabyte SATA ramdisk cards that are discussed in the comments.
>
> A while ago there was a comment on this alias that these cards
> weren't purchasable. Unfortunately, I don't know what is available.

The umem one is unavailable, but the Gigabyte model is easy to find.
I had Amazon overnight one to me, it's probably sitting at home right
now.


Scott


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Neil Perrin


Scott Laird wrote:
> On 10/18/07, Neil Perrin <[EMAIL PROTECTED]> wrote:
>>
>> Scott Laird wrote:
>>> I'm debating using an external intent log on a new box that I'm about
>>> to start working on, and I have a few questions.
>>>
>>> 1.  If I use an external log initially and decide that it was a
>>> mistake, is there a way to move back to the internal log without
>>> rebuilding the entire pool?
>> It's not currently possible to remove a separate log.
>> This was working once, but was stripped out until the more
>> generic removal of devices via 'zpool remove' is provided.
>> This is bug 6574286:
>>
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6574286
> 
> Okay, so hopefully it'll work in a couple quarters?

It's not being worked on currently, but hopefully it will be fixed
within 6 months.
> 
>>> 2.  What happens if the logging device fails completely?  Does this
>>> damage anything else in the pool, other then potentially losing
>>> in-flight transactions?
>> This should work. It shouldn't even lose the in-flight transactions.
>> ZFS reverts to using the main pool if a slog write fails or the
>> slog fills up.
> 
> So, the only way to lose transactions would be a crash or power loss,
> leaving outstanding transactions in the log, followed by the log
> device failing to start up on reboot?  I assume that would be handled
> relatively cleanly (files have out-of-date data), as opposed to
> something nasty like the pool failing to start up.

I just checked on the behaviour of this. The log is treated as part
of the main pool. If it is not replicated and disappears then the pool
can't be opened - just like any unreplicated device in the main pool.
If the slog is found but can't be opened or is corrupted, then the
pool will be opened but the slog isn't used.
This seems a bit inconsistent.

> 
>>> 3.  What about corruption in the log?  Is it checksummed like the rest of 
>>> ZFS?
>> Yes it's checksummed, but the checksumming is a bit different
>> from the pool blocks in the uberblock tree.
>>
>> See also:
>> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
> 
> That started this whole mess :-).  I'd like to try out using one of
> the Gigabyte SATA ramdisk cards that are discussed in the comments.

A while ago there was a comment on this alias that these cards
weren't purchasable. Unfortunately, I don't know what is available.

> It supposedly has 18 hours of battery life, so a long-term power
> outage would kill the log.  I could reasonably expect one 18+ hour
> power outage over the life of the filesystem.  I'm fine with losing
> in-flight data (I'd expect the log to be replayed before the UPS shuts
> the system down anyway), but I'd rather not lose the whole pool or
> something extreme like that.
> 
> I'm willing to trade the chance of some transaction losses during an
> exceptional event for more performance, but I'd rather not have to
> pull out the backups if I can ever avoid it.
> 
> 
> Scott


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Scott Laird
On 10/18/07, Neil Perrin <[EMAIL PROTECTED]> wrote:
>
>
> Scott Laird wrote:
> > I'm debating using an external intent log on a new box that I'm about
> > to start working on, and I have a few questions.
> >
> > 1.  If I use an external log initially and decide that it was a
> > mistake, is there a way to move back to the internal log without
> > rebuilding the entire pool?
>
> It's not currently possible to remove a separate log.
> This was working once, but was stripped out until the more
> generic removal of devices via 'zpool remove' is provided.
> This is bug 6574286:
>
> http://bugs.opensolaris.org/view_bug.do?bug_id=6574286

Okay, so hopefully it'll work in a couple quarters?

> > 2.  What happens if the logging device fails completely?  Does this
> > damage anything else in the pool, other than potentially losing
> > in-flight transactions?
>
> This should work. It shouldn't even lose the in-flight transactions.
> ZFS reverts to using the main pool if a slog write fails or the
> slog fills up.

So, the only way to lose transactions would be a crash or power loss,
leaving outstanding transactions in the log, followed by the log
device failing to start up on reboot?  I assume that would be handled
relatively cleanly (files have out-of-date data), as opposed to
something nasty like the pool failing to start up.

> > 3.  What about corruption in the log?  Is it checksummed like the rest of 
> > ZFS?
>
> Yes it's checksummed, but the checksumming is a bit different
> from the pool blocks in the uberblock tree.
>
> See also:
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

That started this whole mess :-).  I'd like to try out using one of
the Gigabyte SATA ramdisk cards that are discussed in the comments.
It supposedly has 18 hours of battery life, so a long-term power
outage would kill the log.  I could reasonably expect one 18+ hour
power outage over the life of the filesystem.  I'm fine with losing
in-flight data (I'd expect the log to be replayed before the UPS shuts
the system down anyway), but I'd rather not lose the whole pool or
something extreme like that.

I'm willing to trade the chance of some transaction losses during an
exceptional event for more performance, but I'd rather not have to
pull out the backups if I can ever avoid it.


Scott


Re: [zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Neil Perrin


Scott Laird wrote:
> I'm debating using an external intent log on a new box that I'm about
> to start working on, and I have a few questions.
> 
> 1.  If I use an external log initially and decide that it was a
> mistake, is there a way to move back to the internal log without
> rebuilding the entire pool?

It's not currently possible to remove a separate log.
This was working once, but was stripped out until the more
generic removal of devices via 'zpool remove' is provided.
This is bug 6574286:

http://bugs.opensolaris.org/view_bug.do?bug_id=6574286 
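
When that bug is eventually fixed, I'd expect removal to end up as a
one-liner roughly like this (hypothetical syntax -- it does not work
today):

    # hypothetical future command; not implemented as of this writing
    zpool remove tank c2t0d0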

> 2.  What happens if the logging device fails completely?  Does this
> damage anything else in the pool, other than potentially losing
> in-flight transactions?

This should work. It shouldn't even lose the in-flight transactions.
ZFS reverts to using the main pool if a slog write fails or the
slog fills up.
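
If you want to watch where synchronous writes are actually landing
(slog vs. main pool), per-vdev statistics are one way to check -- for
example, with a placeholder pool name:

    # show per-vdev I/O, refreshed every second; the log vdev should be
    # listed separately under a "logs" section
    zpool iostat -v tank 1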

> 3.  What about corruption in the log?  Is it checksummed like the rest of ZFS?

Yes it's checksummed, but the checksumming is a bit different
from the pool blocks in the uberblock tree.

See also:
http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

> 
> Thanks.
> 
> 
> Scott


[zfs-discuss] ZIL reliability/replication questions

2007-10-18 Thread Scott Laird
I'm debating using an external intent log on a new box that I'm about
to start working on, and I have a few questions.

1.  If I use an external log initially and decide that it was a
mistake, is there a way to move back to the internal log without
rebuilding the entire pool?
2.  What happens if the logging device fails completely?  Does this
damage anything else in the pool, other than potentially losing
in-flight transactions?
3.  What about corruption in the log?  Is it checksummed like the rest of ZFS?

Thanks.


Scott