Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-24 Thread Robert Milkowski
Hello Joe,

Monday, February 23, 2009, 7:23:39 PM, you wrote:

MJ> Mario Goebbels wrote:
>> One thing I'd like to see is an _easy_ option to fall back onto older
>> uberblocks when the zpool went belly up for a silly reason. Something
>> that doesn't involve esoteric parameters supplied to zdb.

MJ> Between uberblock updates, there may be many write operations to
MJ> a data file, each requiring a copy on write operation.  Some of
MJ> those operations may reuse blocks that were metadata blocks
MJ> pointed to by the previous uberblock.

MJ> In which case the old uberblock points to a metadata tree full of garbage.

MJ> Jeff, you must have some idea on how to overcome this in your bugfix, would 
you care to share?

As was suggested on the list before, ZFS could keep a list of blocks
freed in the last N txgs and, as long as other free blocks are
available, avoid allocating from that list.
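
A minimal sketch of that idea (not ZFS's real allocator -- the class,
the names and the N=3 default are illustrative only): blocks freed in
a recent txg are quarantined and only handed out again once N more
txgs have committed, or when nothing else is left.

    from collections import deque

    class DeferredFreeAllocator:
        """Toy allocator: quarantine blocks freed in the last N txgs so an
        older uberblock's metadata tree is not overwritten while other
        free space remains."""

        def __init__(self, free_blocks, defer_txgs=3):
            self.free = set(free_blocks)      # immediately reusable blocks
            self.deferred = deque([set()])    # blocks freed per recent txg
            self.defer_txgs = defer_txgs

        def free_block(self, blk):
            self.deferred[-1].add(blk)        # freed this txg, not reusable yet

        def alloc(self):
            if self.free:
                return self.free.pop()
            # Only when nothing else is left do we dip into recently
            # freed blocks, oldest deferred txg first.
            for bucket in self.deferred:
                if bucket:
                    return bucket.pop()
            raise RuntimeError("pool is out of space")

        def txg_commit(self):
            self.deferred.append(set())
            if len(self.deferred) > self.defer_txgs:
                # blocks freed more than N txgs ago become reusable again
                self.free |= self.deferred.popleft()

With defer_txgs=3, a freed block is not normally reused until three
more txgs have committed, so the last few uberblocks still point at
intact metadata.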


-- 
Best regards,
 Robert Milkowski
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-23 Thread Moore, Joe
Mario Goebbels wrote:
> One thing I'd like to see is an _easy_ option to fall back onto older
> uberblocks when the zpool went belly up for a silly reason. Something
> that doesn't involve esoteric parameters supplied to zdb.

Between uberblock updates, there may be many write operations to a data file, 
each requiring a copy on write operation.  Some of those operations may reuse 
blocks that were metadata blocks pointed to by the previous uberblock.

In which case the old uberblock points to a metadata tree full of garbage.

Jeff, you must have some idea on how to overcome this in your bugfix; would you
care to share?

--Joe
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-18 Thread Tomasz Torcz
On Fri, Feb 13, 2009 at 9:47 PM, Richard Elling
 wrote:
> It has been my experience that USB sticks use FAT, which is an ancient
> file system which contains few of the features you expect from modern
> file systems. As such, it really doesn't do any write caching. Hence, it
> seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs,
> nor many of the other, high performance file systems are used by default
> for USB devices. Could it be that anyone not using FAT for USB devices
> is straining against architectural limits?

  There are no architectural limits. USB sticks can be used with whatever
you throw at them. On sticks I use to interchange data with Windows machines
I have NTFS, on others different filesystems: ZFS, ext4, btrfs, often
encrypted at the block level.
   USB sticks are generally very simple -- no discard commands and other
fancy stuff, but overall they are block devices just like disks, arrays,
SSDs...

-- 
Tomasz Torcz
xmpp: zdzich...@chrome.pl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-14 Thread Ross Smith
Hey guys,

I'll let this die in a sec, but I just wanted to say that I've gone
and read the on-disk format document again this morning, and to be honest
Richard, without the description you just wrote, I really wouldn't
have known that uberblocks are in a 128-entry circular queue that's 4x
redundant.

Please understand that I'm not asking for answers to these notes, this
post is purely to illustrate to you ZFS guys that much as I appreciate
having the ZFS docs available, they are very tough going for anybody
who isn't a ZFS developer.  I consider myself well above average in IT
ability, and I've really spent quite a lot of time in the past year
reading around ZFS, but even so I would definitely have come to the
wrong conclusion regarding uberblocks.

Richard's post I can understand really easily, but in the on disk
format docs, that information is spread over 7 pages of really quite
technical detail, and to be honest, for a user like myself raises as
many questions as it answers:

On page 6 I learn that labels are stored on each vdev, as well as each
disk.  So there will be a label on the pool, mirror (or raid group),
and disk.  I know the disk ones are at the start and end of the disk,
and it sounds like the mirror vdev is in the same place, but where is
the root vdev label?  The example given doesn't mention its location
at all.

Then, on page 7 it sounds like the entire label is overwritten whenever
on-disk data is updated - "any time on-disk data is overwritten, there
is potential for error".  To me, it sounds like it's not a 128-entry
queue, but just a group of 4 labels, all of which are overwritten as
data goes to disk.

Then finally, on page 12 the uberblock is mentioned (although as an
aside, the first time I read these docs I had no idea what the
uberblock actually was).  It does say that only one uberblock is
active at a time, but with it being part of the label I'd just assume
these were overwritten as a group.

And that's why I'll often throw ideas out - I can either rely on my
own limited knowledge of ZFS to say if it will work, or I can take
advantage of the excellent community we have here, and post the idea
for all to see.  It's a quick way for good ideas to be improved upon,
and bad ideas consigned to the bin.  I've done it before in my rather
lengthy 'zfs availability' thread.  My thoughts there were thrashed
out nicely, with some quite superb additions (namely the concept of
lop-sided mirrors, which I think are a great idea).

Ross

PS.  I've also found why I thought you had to search for these blocks;
it was after reading this thread, where somebody used mdb to search a
corrupt pool to try to recover data:
http://opensolaris.org/jive/message.jspa?messageID=318009







On Fri, Feb 13, 2009 at 11:09 PM, Richard Elling
 wrote:
> Tim wrote:
>>
>>
>> On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn
>> <bfrie...@simple.dallas.tx.us> wrote:
>>
>>On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>>However, I've just had another idea.  Since the uberblocks are
>>pretty
>>vital in recovering a pool, and I believe it's a fair bit of
>>work to
>>search the disk to find them.  Might it be a good idea to
>>allow ZFS to
>>store uberblock locations elsewhere for recovery purposes?
>>
>>
>>Perhaps it is best to leave decisions on these issues to the ZFS
>>designers who know how things work.
>>
>>Previous descriptions from people who do know how things work
>>didn't make it sound very difficult to find the last 20
>>uberblocks.  It sounded like they were at known points for any
>>given pool.
>>
>>Those folks have surely tired of this discussion by now and are
>>working on actual code rather than reading idle discussion between
>>several people who don't know the details of how things work.
>>
>>
>>
>> People who "don't know how things work" often aren't tied down by the
>> baggage of knowing how things work.  Which leads to creative solutions those
>> who are weighed down didn't think of.  I don't think it hurts in the least
>> to throw out some ideas.  If they aren't valid, it's not hard to ignore them
>> and move on.  It surely isn't a waste of anyone's time to spend 5 minutes
>> reading a response and weighing if the idea is valid or not.
>
> OTOH, anyone who followed this discussion the last few times, has looked
> at the on-disk format documents, or reviewed the source code would know
> that the uberblocks are kept in an 128-entry circular queue which is 4x
> redundant with 2 copies each at the beginning and end of the vdev.
> Other metadata, by default, is 2x redundant and spatially diverse.
>
> Clearly, the failure mode being hashed out here has resulted in the defeat
> of those protections. The only real question is how fast Jeff can roll out
> the
> feature to allow reverting to previous uberblocks.  The procedure for doing
> this by hand has long been known, and was posted on this forum -- though
> it is tedious.

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-14 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Frank Cusack wrote:


i'm sorry to berate you, as you do make very valuable contributions to
the discussion here, but i take offense at your attempts to limit
discussion simply because you know everything there is to know about
the subject.


The point is that those of us in the chattering class (i.e. people
like you and me) clearly know very little about the subject, and
continuing to chatter among ourselves is soon no longer rewarding.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread James C. McPherson

Hi Bob,

On Fri, 13 Feb 2009 19:58:51 -0600 (CST)
Bob Friesenhahn  wrote:

> On Fri, 13 Feb 2009, Tim wrote:
> 
> > I don't think it hurts in the least to throw out some ideas.  If 
> > they aren't valid, it's not hard to ignore them and move on.  It 
> > surely isn't a waste of anyone's time to spend 5 minutes reading a 
> > response and weighing if the idea is valid or not.
> 
> Today I sat down at 9:00 AM to read the new mail for the day and did 
> not catch up until five hours later.  Quite a lot of the reading was 
> this (now) useless discussion thread.  It is now useless since after 
> five hours of reading, there were no ideas expressed that had not
> been expressed before.

I've found this thread to be like watching a car accident, and
also really frustrating due to the inability to use search engines
on the part of many posters. 
 
> With this level of overhead, I am surprise that there is any
> remaining development motion on ZFS at all.

Good thing the ZFS developers have mail filters :-)


cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Frank Cusack
On February 13, 2009 7:58:51 PM -0600 Bob Friesenhahn 
 wrote:

With this level of overhead, I am surprise that there is any remaining
development motion on ZFS at all.


come on now.  with all due respect, you are attempting to stifle
relevant discussion and that is, well, bordering on ridiculous.

i sure have learned a lot from this thread.  now of course that is
meaningless because i don't and almost certainly never will contribute
to zfs, but i assume there are others who have learned from this thread.
that's definitely a good thing.

this thread also appears to be the impetus to change priorities on
zfs development.


Today I sat down at 9:00 AM to read the new mail for the day and did not
catch up until five hours later.  Quite a lot of the reading was this
(now) useless discussion thread.  It is now useless since after five
hours of reading, there were no ideas expressed that had not been
expressed before.


lastly, WOW!  if this thread is worthless to you, learn to use the
delete button.  especially if you read that slowly.  i know i certainly
couldn't keep up with all my incoming mail if i read everything.

i'm sorry to berate you, as you do make very valuable contributions to
the discussion here, but i take offense at your attempts to limit
discussion simply because you know everything there is to know about
the subject.

great, now i am guilty of being "overhead".

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Tim wrote:

I don't think it hurts in the least to throw out some ideas.  If 
they aren't valid, it's not hard to ignore them and move on.  It 
surely isn't a waste of anyone's time to spend 5 minutes reading a 
response and weighing if the idea is valid or not.


Today I sat down at 9:00 AM to read the new mail for the day and did 
not catch up until five hours later.  Quite a lot of the reading was 
this (now) useless discussion thread.  It is now useless since after 
five hours of reading, there were no ideas expressed that had not been 
expressed before.


With this level of overhead, I am surprised that there is any remaining
development motion on ZFS at all.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Richard Elling

Tim wrote:



On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn 
<bfrie...@simple.dallas.tx.us>
wrote:


On Fri, 13 Feb 2009, Ross Smith wrote:

However, I've just had another idea.  Since the uberblocks are
pretty
vital in recovering a pool, and I believe it's a fair bit of
work to
search the disk to find them.  Might it be a good idea to
allow ZFS to
store uberblock locations elsewhere for recovery purposes?


Perhaps it is best to leave decisions on these issues to the ZFS
designers who know how things work.

Previous descriptions from people who do know how things work
didn't make it sound very difficult to find the last 20
uberblocks.  It sounded like they were at known points for any
given pool.

Those folks have surely tired of this discussion by now and are
working on actual code rather than reading idle discussion between
several people who don't know the details of how things work.



People who "don't know how things work" often aren't tied down by the 
baggage of knowing how things work.  Which leads to creative solutions 
those who are weighed down didn't think of.  I don't think it hurts in 
the least to throw out some ideas.  If they aren't valid, it's not 
hard to ignore them and move on.  It surely isn't a waste of anyone's 
time to spend 5 minutes reading a response and weighing if the idea is 
valid or not.


OTOH, anyone who followed this discussion the last few times, has looked
at the on-disk format documents, or reviewed the source code would know
that the uberblocks are kept in a 128-entry circular queue which is 4x
redundant with 2 copies each at the beginning and end of the vdev.
Other metadata, by default, is 2x redundant and spatially diverse.
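
To make that layout concrete, here is a small sketch of where those
copies live, assuming the figures from the on-disk format document
(four 256 KiB labels, two at the front and two at the end of each
vdev, each holding a 128 KiB uberblock array of 128 one-KiB slots; the
slot size grows with larger sector sizes, so treat the constants as
illustrative):

    KIB = 1024
    LABEL_SIZE = 256 * KIB        # each ZFS label, per the on-disk format doc
    UB_ARRAY_OFFSET = 128 * KIB   # uberblock array starts halfway into a label
    UB_SLOT_SIZE = 1 * KIB        # 128 slots of 1 KiB (assumes 512-byte sectors)
    UB_SLOTS = 128

    def label_offsets(vdev_size):
        """Byte offsets of the four label copies: two at the front, two at the end."""
        return [0, LABEL_SIZE, vdev_size - 2 * LABEL_SIZE, vdev_size - LABEL_SIZE]

    def uberblock_offsets(vdev_size):
        """Byte offset of every uberblock slot on the vdev (4 labels x 128 slots)."""
        return [label + UB_ARRAY_OFFSET + slot * UB_SLOT_SIZE
                for label in label_offsets(vdev_size)
                for slot in range(UB_SLOTS)]

    # A 100 GiB vdev has 512 candidate uberblock locations, all computable
    # from the device size alone -- no searching required.
    print(len(uberblock_offsets(100 * 1024**3)))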

Clearly, the failure mode being hashed out here has resulted in the defeat
of those protections. The only real question is how fast Jeff can roll 
out the

feature to allow reverting to previous uberblocks.  The procedure for doing
this by hand has long been known, and was posted on this forum -- though
it is tedious.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Tim
On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Fri, 13 Feb 2009, Ross Smith wrote:
>
>  However, I've just had another idea.  Since the uberblocks are pretty
>> vital in recovering a pool, and I believe it's a fair bit of work to
>> search the disk to find them.  Might it be a good idea to allow ZFS to
>> store uberblock locations elsewhere for recovery purposes?
>>
>
> Perhaps it is best to leave decisions on these issues to the ZFS designers
> who know how things work.
>
> Previous descriptions from people who do know how things work didn't make
> it sound very difficult to find the last 20 uberblocks.  It sounded like
> they were at known points for any given pool.
>
> Those folks have surely tired of this discussion by now and are working on
> actual code rather than reading idle discussion between several people who
> don't know the details of how things work.
>


People who "don't know how things work" often aren't tied down by the
baggage of knowing how things work.  Which leads to creative solutions those
who are weighed down didn't think of.  I don't think it hurts in the least
to throw out some ideas.  If they aren't valid, it's not hard to ignore them
and move on.  It surely isn't a waste of anyone's time to spend 5 minutes
reading a response and weighing if the idea is valid or not.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Ross Smith wrote:


However, I've just had another idea.  Since the uberblocks are pretty
vital in recovering a pool, and I believe it's a fair bit of work to
search the disk to find them.  Might it be a good idea to allow ZFS to
store uberblock locations elsewhere for recovery purposes?


Perhaps it is best to leave decisions on these issues to the ZFS 
designers who know how things work.


Previous descriptions from people who do know how things work didn't 
make it sound very difficult to find the last 20 uberblocks.  It 
sounded like they were at known points for any given pool.


Those folks have surely tired of this discussion by now and are 
working on actual code rather than reading idle discussion between 
several people who don't know the details of how things work.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Greg Palmer

Richard Elling wrote:

Greg Palmer wrote:

Miles Nordin wrote:

gm> That implies that ZFS will have to detect removable devices
gm> and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior. The whole format vs rmformat mess is just
ridiculous. And software and hardware developers alike have both
proven themselves incapable of settling on a definition of
``removeable'' that fits with actual use-cases like: FC/iSCSI;
hot-swappable SATA; adapters that have removeable sockets on both ends
like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so
on.
Since this discussion is taking place in the context of someone 
removing a USB stick I think you're confusing the issue by dragging 
in other technologies. Let's keep this in the context of the posts 
preceding it which is how USB devices are treated. I would argue that 
one of the first design goals in an environment where you can expect 
people who are not computer professionals to be interfacing with 
computers is to make sure that the appropriate safeties are in place 
and that the system does not behave in a manner which a reasonable 
person might find unexpected.


It has been my experience that USB sticks use FAT, which is an ancient
file system which contains few of the features you expect from modern
file systems. As such, it really doesn't do any write caching. Hence, it
seems to work ok for casual users. I note that neither NTFS, ZFS, 
reiserfs,

nor many of the other, high performance file systems are used by default
for USB devices. Could it be that anyone not using FAT for USB devices
is straining against architectural limits?
-- richard
The default disabling of caching with Windows I mentioned is the same 
for either FAT or NTFS file systems. My personal guess would be that 
it's purely an effort to prevent software errors in the interface 
between the chair and keyboard. :-) I think a lot of users got trained 
in how to use a floppy disc and once they were trained, when they 
encountered the USB stick, they continued to treat it as an instance of 
the floppy class. This rubbed off on those around them. I can't tell you 
how many users have given me a blank stare and told me "But the light 
was out" when I saw them yank a USB stick out and mentioned it was a bad 
idea.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ross Smith
You don't, but that's why I was wondering about time limits.  You have
to have a cut off somewhere, but if you're checking the last few
minutes of uberblocks that really should cope with a lot.  It seems
like a simple enough thing to implement, and if a pool still gets
corrupted with these checks in place, you can absolutely, positively
blame it on the hardware.  :D

However, I've just had another idea.  Since the uberblocks are pretty
vital in recovering a pool, and I believe it's a fair bit of work to
search the disk to find them, might it be a good idea to allow ZFS to
store uberblock locations elsewhere for recovery purposes?

This could be as simple as a USB stick plugged into the server, a
separate drive, or a network server.  I guess even the ZIL device
would work if it's separate hardware.  But knowing the locations of
the uberblocks would save yet more time should recovery be needed.



On Fri, Feb 13, 2009 at 8:59 PM, Bob Friesenhahn
 wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>
>> Thinking about this a bit more, you've given me an idea:  Would it be
>> worth ZFS occasionally reading previous uberblocks from the pool, just
>> to check they are there and working ok?
>
> That sounds like a good idea.  However, how do you know for sure that the
> data returned is not returned from a volatile cache?  If the hardware is
> ignoring cache flush requests, then any data returned may be from a volatile
> cache.
>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ian Collins

Richard Elling wrote:

Greg Palmer wrote:

Miles Nordin wrote:

gm> That implies that ZFS will have to detect removable devices
gm> and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior. The whole format vs rmformat mess is just
ridiculous. And software and hardware developers alike have both
proven themselves incapable of settling on a definition of
``removeable'' that fits with actual use-cases like: FC/iSCSI;
hot-swappable SATA; adapters that have removeable sockets on both ends
like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so
on.
Since this discussion is taking place in the context of someone 
removing a USB stick I think you're confusing the issue by dragging 
in other technologies. Let's keep this in the context of the posts 
preceding it which is how USB devices are treated. I would argue that 
one of the first design goals in an environment where you can expect 
people who are not computer professionals to be interfacing with 
computers is to make sure that the appropriate safeties are in place 
and that the system does not behave in a manner which a reasonable 
person might find unexpected.


It has been my experience that USB sticks use FAT, which is an ancient
file system which contains few of the features you expect from modern
file systems. As such, it really doesn't do any write caching. Hence, it
seems to work ok for casual users. I note that neither NTFS, ZFS, 
reiserfs,

nor many of the other, high performance file systems are used by default
for USB devices. Could it be that anyone not using FAT for USB devices
is straining against architectural limits?


I'd follow that up by saying that those of us who do use something other 
that FAT with USB devices have a reasonable understanding of the 
limitations of those devices.


Using ZFS is non-trivial from a typical user's perspective.  The device 
has to be identified and the pool created.  When a USB device is 
connected, the pool has to be manually imported before it can be used.  
Import/export could be fully integrated with gnome.  Once that is in 
place, using a ZFS formatted USB stick should be just as "safe" as a FAT 
formatted one.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Nicolas Williams
On Fri, Feb 13, 2009 at 02:00:28PM -0600, Nicolas Williams wrote:
> Ordering matters for atomic operations, and filesystems are full of
> those.

Also, note that ignoring barriers is effectively as bad as dropping
writes if there's any chance that some writes will never hit the disk
because of, say, power failures.  Imagine 100 txgs, but some writes from
the first txg never hitting the disk because the drive keeps them in the
cache without flushing them for too long, then you pull out the disk, or
power fails -- in that case not even fallback to older txgs will help
you, there'd be nothing that ZFS could do to help you.

Of course, presumably even with most lousy drives you'd still have to be
quite unlucky to lose writes written more than N txgs ago, for some
value of N.  But the point stands; what you lose will be a matter of
chance (and it could well be whole datasets) given the kinds of devices
we've been discussing.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Ross Smith wrote:


Thinking about this a bit more, you've given me an idea:  Would it be
worth ZFS occasionally reading previous uberblocks from the pool, just
to check they are there and working ok?


That sounds like a good idea.  However, how do you know for sure that 
the data returned is not returned from a volatile cache?  If the 
hardware is ignoring cache flush requests, then any data returned may 
be from a volatile cache.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Ross Smith wrote:


Also, that's a pretty extreme situation since you'd need a device that
is being written to but not read from to fail in this exact way.  It
also needs to have no scrubbing being run, so the problem has remained
undetected.


On systems with a lot of RAM, 100% write is a pretty common situation 
since reads are often against data which are already cached in RAM. 
This is common when doing bulk data copies from one device to another 
(e.g. a backup from an "internal" pool to a USB-based pool) since the 
necessary filesystem information for the destination filesystem can be 
cached in memory for quick access rather than going to disk.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ross Smith
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn
 wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>> You have to consider that even with improperly working hardware, ZFS
>> has been checksumming data, so if that hardware has been working for
>> any length of time, you *know* that the data on it is good.
>
> You only know this if the data has previously been read.
>
> Assume that the device temporarily stops pysically writing, but otherwise
> responds normally to ZFS.  Then the device starts writing again (including a
> recent uberblock), but with a large gap in the writes.  Then the system
> loses power, or crashes.  What happens then?

Hey Bob,

Thinking about this a bit more, you've given me an idea:  Would it be
worth ZFS occasionally reading previous uberblocks from the pool, just
to check they are there and working ok?

I wonder if you could do this after a few uberblocks have been
written.  It would seem to be a good way of catching devices that
aren't writing correctly early on, as well as a way of guaranteeing
that previous uberblocks are available to roll back to should a write
go wrong.
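
For what it's worth, a rough sketch of what such a read-back check
might look like -- read_slot and the recorded digests are hypothetical
stand-ins, and, as Bob notes elsewhere in the thread, a drive that
ignores cache flushes may satisfy the read from its own volatile
cache, so a clean result is a hint rather than a guarantee:

    import hashlib

    def verify_recent_uberblocks(read_slot, expected_digests, last_n=5):
        """Re-read the most recently written uberblock slots and compare them
        with the sha256 digests recorded when they were written.  Returns the
        slots that no longer match -- a hint that the device is dropping or
        deferring writes.  read_slot(i) -> bytes is a hypothetical callback."""
        stale = []
        for slot in sorted(expected_digests)[-last_n:]:
            if hashlib.sha256(read_slot(slot)).hexdigest() != expected_digests[slot]:
                stale.append(slot)
        return stale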

I wonder what the upper limit for this kind of write failure is going
to be.  I've seen 30-second delays mentioned in this thread.  How
often are uberblocks written?  Is there any guarantee that we'll
always have more than 30 seconds' worth of uberblocks on a drive?
Should ZFS be set so that it keeps either a given number of
uberblocks, or 5 minutes' worth of uberblocks, whichever is the larger?

Ross
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Richard Elling

Greg Palmer wrote:

Miles Nordin wrote:

gm> That implies that ZFS will have to detect removable devices
gm> and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior. The whole format vs rmformat mess is just
ridiculous. And software and hardware developers alike have both
proven themselves incapable of settling on a definition of
``removeable'' that fits with actual use-cases like: FC/iSCSI;
hot-swappable SATA; adapters that have removeable sockets on both ends
like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so
on.
Since this discussion is taking place in the context of someone 
removing a USB stick I think you're confusing the issue by dragging in 
other technologies. Let's keep this in the context of the posts 
preceding it which is how USB devices are treated. I would argue that 
one of the first design goals in an environment where you can expect 
people who are not computer professionals to be interfacing with 
computers is to make sure that the appropriate safeties are in place 
and that the system does not behave in a manner which a reasonable 
person might find unexpected.


It has been my experience that USB sticks use FAT, which is an ancient
file system which contains few of the features you expect from modern
file systems. As such, it really doesn't do any write caching. Hence, it
seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs,
nor many of the other, high performance file systems are used by default
for USB devices. Could it be that anyone not using FAT for USB devices
is straining against architectural limits?
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ross Smith
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn
 wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>> You have to consider that even with improperly working hardware, ZFS
>> has been checksumming data, so if that hardware has been working for
>> any length of time, you *know* that the data on it is good.
>
> You only know this if the data has previously been read.
>
> Assume that the device temporarily stops pysically writing, but otherwise
> responds normally to ZFS.  Then the device starts writing again (including a
> recent uberblock), but with a large gap in the writes.  Then the system
> loses power, or crashes.  What happens then?

Well in that case you're screwed, but if ZFS is known to handle even
corrupted pools automatically, when that happens the immediate
response on the forums is going to be "something really bad has
happened to your hardware", followed by troubleshooting to find out
what.  Instead of the response now, where we all know there's every
chance the data is ok, and just can't be gotten to without zdb.

Also, that's a pretty extreme situation since you'd need a device that
is being written to but not read from to fail in this exact way.  It
also needs to have no scrubbing being run, so the problem has remained
undetected.

However, even in that situation, if we assume that it happened and
that these recovery tools are available, ZFS will either report that
your pool is seriously corrupted, indicating a major hardware problem
(and ZFS can now state this with some confidence), or ZFS will be able
to open a previous uberblock, mount your pool and begin a scrub, at
which point all your missing writes will be found too and reported.

And then you can go back to your snapshots.  :-D


>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Ross Smith wrote:


You have to consider that even with improperly working hardware, ZFS
has been checksumming data, so if that hardware has been working for
any length of time, you *know* that the data on it is good.


You only know this if the data has previously been read.

Assume that the device temporarily stops physically writing, but
otherwise responds normally to ZFS.  Then the device starts writing
again (including a recent uberblock), but with a large gap in the
writes.  Then the system loses power, or crashes.  What happens then?


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread David Collier-Brown


Bob Friesenhahn wrote:
> On Fri, 13 Feb 2009, Ross wrote:
>>
>> Something like that will have people praising ZFS' ability to
>> safeguard their data, and the way it recovers even after system
>> crashes or when hardware has gone wrong.  You could even have a
>> "common causes of this are..." message, or a link to an online help
>> article if you wanted people to be really impressed.
> 
> I see a career in politics for you.  Barring an operating system
> implementation bug, the type of problem you are talking about is due to
> improperly working hardware.  Irreversibly reverting to a previous
> checkpoint may or may not obtain the correct data.  Perhaps it will
> produce a bunch of checksum errors.

Actually that's a lot like what FMA does when it sees a problem:
telling the person what happened and pointing them to a web page
which can be updated with the newest information on the problem.

That's a good spot for "This pool was not unmounted cleanly due
to a hardware fault and data has been lost.  The ""
line contains the date which can be recovered to.  Use the command
  # zfs reframbulocate   -t 
to revert to 

--dave
-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
dav...@sun.com |  -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ross Smith
On Fri, Feb 13, 2009 at 7:41 PM, Bob Friesenhahn
 wrote:
> On Fri, 13 Feb 2009, Ross wrote:
>>
>> Something like that will have people praising ZFS' ability to safeguard
>> their data, and the way it recovers even after system crashes or when
>> hardware has gone wrong.  You could even have a "common causes of this
>> are..." message, or a link to an online help article if you wanted people to
>> be really impressed.
>
> I see a career in politics for you.  Barring an operating system
> implementation bug, the type of problem you are talking about is due to
> improperly working hardware.  Irreversibly reverting to a previous
> checkpoint may or may not obtain the correct data.  Perhaps it will produce
> a bunch of checksum errors.

Yes, the root cause is improperly working hardware (or an OS bug like
6424510), but with ZFS being a copy-on-write system, when errors occur
with a recent write, for the vast majority of the pools out there you
still have huge amounts of data that are perfectly valid and
should be accessible.  Unless I'm misunderstanding something,
reverting to a previous checkpoint gets you back to a state where ZFS
knows it's good (or at least where ZFS can verify whether it's good or
not).

You have to consider that even with improperly working hardware, ZFS
has been checksumming data, so if that hardware has been working for
any length of time, you *know* that the data on it is good.

Yes, if you have databases or files there that were mid-write, they
will almost certainly be corrupted.  But at least your filesystem is
back, and it's in as good a state as it's going to be given that in
order for your pool to be in this position, your hardware went wrong
mid-write.

And as an added bonus, if you're using ZFS snapshots, now your pool is
accessible, you have a bunch of backups available so you can probably
roll corrupted files back to working versions.

For me, that is about as good as you can get in terms of handling a
sudden hardware failure.  Everything that is known to be saved to disk
is there, you can verify (with absolute certainty) whether data is ok
or not, and you have backup copies of damaged files.  In the old days
you'd need to be reverting to tape backups for both of these, with
potentially hours of downtime before you even know where you are.
Achieving that in a few seconds (or minutes) is a massive step
forwards.

> There are already people praising ZFS' ability to safeguard their data, and
> the way it recovers even after system crashes or when hardware has gone
> wrong.

Yes there are, but the majority of these are praising the ability of
ZFS checksums to detect bad data, and to repair it when you have
redundancy in your pool.  I've not seen that many cases of people
praising ZFS' recovery ability - uberblock problems seem to have a
nasty habit of leaving you with tons of good, checksummed data on a
pool that you can't get to, and while many hardware problems are dealt
with, others can hang your entire pool.


>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Nicolas Williams
On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote:
> On February 13, 2009 1:10:55 PM -0500 Miles Nordin  wrote:
> >>"fc" == Frank Cusack  writes:
> >
> >fc> If you're misordering writes
> >fc> isn't that a completely different problem?
> >
> >no.  ignoring the flush cache command causes writes to be misordered.
> 
> oh.  can you supply a reference or if you have the time, some more
> explanation?  (or can someone else confirm this.)

Ordering matters for atomic operations, and filesystems are full of
those.

Now, if ordering is broken but the writes all eventually hit the disk
then no one will notice.  But if a power failure and/or a partition
happens (cables get pulled, network partitions occur affecting an iSCSI
connection, ...) then bad things happen.

For ZFS the easiest way to ameliorate this is the txg fallback fix that
Jeff Bonwick has said is now a priority.  And if ZFS guarantees no block
re-use until N txgs pass after a block is freed, then the fallback can
be of up to N txgs, which gives you a decent chance that you'll recover
your pool in the face of buggy devices; but for each discarded txg you
lose that transaction's writes, so you lose data incrementally.  (The
larger N is, the better your chance that the oldest of the last N txgs'
writes will all hit the disk in spite of the disk's lousy cache
behaviors.)
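
A sketch of that fallback policy (illustrative only -- the helpers are
hypothetical, not the eventual zpool code): walk back through the most
recent uberblocks, newest first, and take the first one whose block
tree still verifies, giving up after N txgs:

    def pick_fallback_txg(uberblocks, tree_verifies, max_rollback=3):
        """uberblocks is a list of (txg, uberblock) pairs; tree_verifies(ub) is
        a hypothetical callback that checks whether the block tree an uberblock
        points to reads back with valid checksums.  Returns the newest usable
        txg within the rollback window, or None if even the oldest fails."""
        newest_first = sorted(uberblocks, key=lambda p: p[0], reverse=True)
        for txg, ub in newest_first[:max_rollback + 1]:
            if tree_verifies(ub):
                return txg    # everything written after this txg is discarded
        return None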

The next question is how to do the fallback, UI-wise.  Should it ever be
automatic?  A pool option for that would be nice (I'd use it on all-USB
pools).  If/when not automatic, how should the user/admin be informed of
the failure to open the pool and the option to fallback on an older txg
(with data loss)?  (For non-removable pools imported at boot time the
answer is that the service will fail, causing sulogin to be invoked so
you can fix the problem on console.  For removable pools there should be
a GUI.)

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Bob Friesenhahn

On Fri, 13 Feb 2009, Ross wrote:


Something like that will have people praising ZFS' ability to 
safeguard their data, and the way it recovers even after system 
crashes or when hardware has gone wrong.  You could even have a 
"common causes of this are..." message, or a link to an online help 
article if you wanted people to be really impressed.


I see a career in politics for you.  Barring an operating system 
implementation bug, the type of problem you are talking about is due 
to improperly working hardware.  Irreversibly reverting to a previous 
checkpoint may or may not obtain the correct data.  Perhaps it will 
produce a bunch of checksum errors.


There are already people praising ZFS' ability to safeguard their 
data, and the way it recovers even after system crashes or when 
hardware has gone wrong.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ross
Superb news, thanks Jeff.

Having that will really raise ZFS up a notch, and align it much better with
people's expectations.  I assume it'll work via zpool import, and let the user
know what's gone wrong?

If you think back to this case, imagine how different the user's response would
have been if instead of being unable to mount the pool, ZFS had turned around 
and said:

"This pool was not unmounted cleanly, and data has been lost.  Do you want to 
restore your pool to the last viable state: (timestamp goes here)?"

Something like that will have people praising ZFS' ability to safeguard their 
data, and the way it recovers even after system crashes or when hardware has 
gone wrong.  You could even have a "common causes of this are..." message, or a 
link to an online help article if you wanted people to be really impressed.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Miles Nordin
> "fc" == Frank Cusack  writes:

fc> why would dropping a flush cache imply dropping every write
fc> after the flush cache?

it wouldn't and probably never does.  It was an imaginary scenario
invented to argue with you and to agree with the guy in the USB bug
who said ``dropping a cache flush command is as bad as dropping a
write.''

fc> oh.  can you supply a reference or if you have the time, some
fc> more explanation?  (or can someone else confirm this.)

I posted something long a few days ago that I need to revisit.  The
problem is, I don't actually understand how the disk commands work, so
I was talking out my ass.  Although I kept saying, ``I'm not sure it
actually works this way,'' my saying so doesn't help anyone who spends
the time to read it and then gets a bunch of mistaken garbage stuck in
his head, which the people who actually recognize it as garbage are too
busy to correct.  It'd be better for everyone if I didn't do that.

On the other hand, I think there's some worth to dreaming up several
possibilities of what I fantasize the various commands might mean or
do, rather than simply reading one of the specs to get the one right
answer, because from what people in here say it sounds as though
implementors of actual systems based on the SCSI commandset live in
the same imaginary world of fantastic and multiple realities, without
any meaningful review or accountability, that I do.  (disks, bridges,
iSCSI targets and initiators, VMWare/VBox storage, ...)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Frank Cusack
On February 13, 2009 10:29:05 AM -0800 Frank Cusack  
wrote:

On February 13, 2009 1:10:55 PM -0500 Miles Nordin  wrote:

"fc" == Frank Cusack  writes:


fc> If you're misordering writes
fc> isn't that a completely different problem?

no.  ignoring the flush cache command causes writes to be misordered.


oh.  can you supply a reference or if you have the time, some more
explanation?  (or can someone else confirm this.)


uhh ... that question can be ignored as i answered it myself below.
sorry if i'm just being noisy now.


my understanding (weak, admittedly) is that drives will reorder writes
on their own, and this is generally considered normal behavior.  so
to guarantee consistency *in the face of some kind of failure like a
power loss*, we have write barriers.  flush-cache is a stronger kind
of write barrier.

now that i think more, i suppose yes if you ignore the flush cache,
then writes before and after the flush cache could be misordered,
however it's the same as if there were no flush cache at all, and
again as long as the drive has power and you can quiesce it then
the data makes it to disk, and all is consistent and well.  yes?


-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Frank Cusack

On February 13, 2009 1:10:55 PM -0500 Miles Nordin  wrote:

"fc" == Frank Cusack  writes:


fc> If you're misordering writes
fc> isn't that a completely different problem?

no.  ignoring the flush cache command causes writes to be misordered.


oh.  can you supply a reference or if you have the time, some more
explanation?  (or can someone else confirm this.)

my understanding (weak, admittedly) is that drives will reorder writes
on their own, and this is generally considered normal behavior.  so
to guarantee consistency *in the face of some kind of failure like a
power loss*, we have write barriers.  flush-cache is a stronger kind
of write barrier.

now that i think more, i suppose yes if you ignore the flush cache,
then writes before and after the flush cache could be misordered,
however it's the same as if there were no flush cache at all, and
again as long as the drive has power and you can quiesce it then
the data makes it to disk, and all is consistent and well.  yes?

whereas if you drop a write, well it's gone off into a black hole.


fc> Even then, I don't see how it's worse than DROPPING a write.
fc> The data eventually gets to disk, and at that point in time,
fc> the disk is consistent.  When dropping a write, the data never
fc> makes it to disk, ever.

If you drop the flush cache command and every write after the flush
cache command, yeah yeah it's bad, but in THAT case, the disk is still
always consistent because no writes have been misordered.


why would dropping a flush cache imply dropping every write after the
flush cache?


fc> In the face of a power loss, of course these result in the
fc> same problem,

no, it's completely different in a power loss, which is exactly the point.

If you pull the cord while the disk is inconsistent, you may lose the
entire pool.  If the disk is never inconsistent because you've never
misordered writes, you will only lose recent write activity.  Losing
everything you've ever written is usually much worse than losing what
you've written recently.


yeah, as soon as i wrote that i realized my error, so thank you and i
agree on that point.  *in the event of a power loss* being inconsistent
is a worse problem.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Miles Nordin
> "fc" == Frank Cusack  writes:

fc> If you're misordering writes
fc> isn't that a completely different problem?  

no.  ignoring the flush cache command causes writes to be misordered.

fc> Even then, I don't see how it's worse than DROPPING a write.
fc> The data eventually gets to disk, and at that point in time,
fc> the disk is consistent.  When dropping a write, the data never
fc> makes it to disk, ever.

If you drop the flush cache command and every write after the flush
cache command, yeah yeah it's bad, but in THAT case, the disk is still
always consistent because no writes have been misordered.

fc> In the face of a power loss, of course these result in the
fc> same problem,

no, it's completely different in a power loss, which is exactly the point.

If you pull the cord while the disk is inconsistent, you may lose the
entire pool.  If the disk is never inconsistent because you've never
misordered writes, you will only lose recent write activity.  Losing
everything you've ever written is usually much worse than losing what
you've written recently.

yeah yeah some devil's advocate will toss in, ``i *need* some
consistency promises or else it's better that the pool raise its hand and
say `broken, restore backup please' even if the hand-raising comes in
the form of losing the entire pool,'' well in that case neither one is
acceptable.  But if your requirements are looser, then dropping a
flush cache command plus every write after the flush cache command is
much better than just ignoring the flush cache command.  of course,
that is a weird kind of failure that never happens.  I described it
just to make a point, to argue against this overly-simple idea ``every
write is precious.  let's do them as soon as possible because there
could be Valuable Business Data inside the writes!  we don't want to
lose anything Valuable!''  The part of SYNC CACHE that's causing
people to lose entire pools isn't the ``hurry up!  write faster!''
part of the command, such that without it you still get your precious
writes, just a little slower.  NO.  It's the ``control the order of
writes'' part that's important for integrity on a single-device vdev.
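
To illustrate that last point, here is a toy simulation (purely
illustrative, not a model of any real drive or of ZFS itself): a flush
acts as a barrier between writing a txg's block tree and writing the
uberblock that points at it.  If the device honours the flush, a power
loss only costs recent writes; if it ignores the flush, the uberblock
can become durable while the tree it references is not, which is how a
whole pool goes missing.

    import random

    class SimDisk:
        """Toy disk with a volatile write cache that may reorder cached writes."""
        def __init__(self, honor_flush=True):
            self.media = {}              # durably on the platters
            self.cache = []              # (addr, data) pairs not yet flushed
            self.honor_flush = honor_flush

        def write(self, addr, data):
            self.cache.append((addr, data))

        def flush(self):
            if self.honor_flush:
                random.shuffle(self.cache)   # completion order is arbitrary...
                for addr, data in self.cache:
                    self.media[addr] = data  # ...but everything is durable now
                self.cache.clear()
            # a buggy device simply ignores the request

        def power_loss(self):
            # an arbitrary subset of cached writes made it out; the rest is lost
            for addr, data in (w for w in self.cache if random.random() < 0.5):
                self.media[addr] = data
            self.cache.clear()

    def cow_commit(disk, txg):
        disk.write(("tree", txg), f"tree-for-txg-{txg}")
        disk.flush()                     # barrier: the tree must be durable first
        disk.write(("uber",), txg)       # uberblock now points at the new tree
        disk.flush()

    def consistent(disk):
        txg = disk.media.get(("uber",))
        return txg is None or disk.media.get(("tree", txg)) == f"tree-for-txg-{txg}"

    random.seed(0)
    for honor in (True, False):
        bad = 0
        for _ in range(1000):
            disk = SimDisk(honor_flush=honor)
            for txg in range(1, 6):
                cow_commit(disk, txg)
            disk.power_loss()            # cord pulled mid-workload
            if not consistent(disk):
                bad += 1
        print(f"honor_flush={honor}: {bad}/1000 trials left the uberblock "
              "pointing at a garbage tree")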


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Dick Hoogendijk
On Fri, 13 Feb 2009 17:53:00 +0100, Eric D. Mudama  
 wrote:



On Fri, Feb 13 at  9:14, Neil Perrin wrote:
Having a separate intent log on good hardware will not prevent  
corruption

on a pool with bad hardware. By "good" I mean hardware that correctly
flush their write caches when requested.


Can someone please name a specific piece of bad hardware?


Or better still, name a few -GOOD- ones.

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv107++
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Frank Cusack

On February 13, 2009 12:41:12 PM -0500 Miles Nordin  wrote:


"fc" == Frank Cusack  writes:


fc> if you have 100TB of data, wouldn't you have a completely
fc> redundant storage network

If you work for a ponderous leaf-eating brontosorous maybe.  If your
company is modern I think having such an oddly large amount of data in
one pool means you'd more likely have 70 whitebox peecees using
motherboard ethernet/sata only, connected to a mesh of unmanaged L2
switches (of some peculiar brand that happens to work well.)  There
will always be one or two peecees switched off, and constantly
something will be resilvering.  The home user case is not really just
for home users.  I think a lot of people are tired of paying quadruple
for stuff that still breaks, even serious people.


oh i dunno.  i recently worked for a company that practically defines
modern and we had multiples of 100TB of data.  Like you said, not all
in one place, but any given piece was fully redundant (well, if you
count RAID-5 as "fully" ... but I'm really referring to the infrastructure).

I can't imagine it any other way ... the cost of not having redundancy
in the face of a failure is so much higher compared to the cost of
building in that redundancy.

Also I'm not sure how you get 1 pool with more than 1 peecee as zfs is
not a cluster fs.  So what you are talking about is multiple pools,
and in that case if you do lose one (not redundant for whatever reason)
you only have to restore a fraction of the 100TB from backup.


fc> Isn't this easily worked around by having UPS power in
fc> addition to whatever the data center supplies?

In NYC over the last five years the power has been more reliable going
into my UPS than coming out of it.  The main reason for having a UPS
is wiring maintenance.  And the most important part of the UPS is the
externally-mounted bypass switch because the UPS also needs
maintenance.  UPS has never _solved_ anything, it always just helps.
so in the end we have to count on the software's graceful behavior,
not on absolutes.


I can't say I agree about the UPS, however I've already been pretty
forthright that UPS, etc. isn't the answer to the problem, just a
mitigating factor to the root problem.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Frank Cusack

On February 13, 2009 12:10:08 PM -0500 Miles Nordin  wrote:

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior.  The whole format vs rmformat mess is just
ridiculous.


thank you.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Frank Cusack

On February 13, 2009 12:20:21 PM -0500 Miles Nordin  wrote:

"fc" == Frank Cusack  writes:


>> Dropping a flush-cache command is just as bad as dropping a
>> write.

fc> Not that it matters, but it seems obvious that this is wrong
fc> or anyway an exaggeration.  Dropping a flush-cache just means
fc> that you have to wait until the device is quiesced before the
fc> data is consistent.

fc> Dropping a write is much much worse.

backwards i think.  Dropping a flush-cache is WORSE than dropping the
flush-cache plus all writes after the flush-cache.  The problem that
causes loss of whole pools rather than loss of recently-written data
isn't that you're writing too little.  It's that you're dropping the
barrier and misordering the writes.  consequently you lose *everything
you've ever written,* which is much worse than losing some recent
writes, even a lot of them.


Who said dropping a flush-cache means dropping any subsequent writes,
or misordering writes?  If you're misordering writes isn't that a
completely different problem?  Even then, I don't see how it's worse
than DROPPING a write.  The data eventually gets to disk, and at that
point in time, the disk is consistent.  When dropping a write, the data
never makes it to disk, ever.

In the face of a power loss, of course these result in the same problem,
but even without a power loss the drop of a write is "catastrophic".

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Greg Palmer

Miles Nordin wrote:

gm> That implies that ZFS will have to detect removable devices
gm> and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior.  The whole format vs rmformat mess is just
ridiculous.  And software and hardware developers alike have both
proven themselves incapable of settling on a definition of
``removeable'' that fits with actual use-cases like: FC/iSCSI;
hot-swappable SATA; adapters that have removeable sockets on both ends
like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so
on.
  
Since this discussion is taking place in the context of someone removing 
a USB stick I think you're confusing the issue by dragging in other 
technologies. Let's keep this in the context of the posts preceding it 
which is how USB devices are treated. I would argue that one of the 
first design goals in an environment where you can expect people who are 
not computer professionals to be interfacing with computers is to make 
sure that the appropriate safeties are in place and that the system does 
not behave in a manner which a reasonable person might find unexpected.


This is common practice for any sort of professional engineering effort. 
As an example, you aren't going to go out there and find yourself a 
chainsaw being sold new without a guard. It might be removable, but the 
default is to include it. Why? Well because there is a considerable 
chance of damage to the user without it. Likewise with a file system on 
a device which might cache a data write for as long as thirty seconds 
while being easily removable. In this case, the user may write the file 
and seconds later remove the device. Many folks out there behave in this 
manner.


It really doesn't matter to them that they have a copy of the last save 
they did two hours ago, what they want and expect is that the most 
recent data they saved actually be on the USB stick for them to retrieve. 
What you are suggesting is that it is better to lose that data when it 
could have been avoided. I would personally suggest that it is better to 
have default behavior which is not surprising along with more advanced 
behavior for those who have bothered to read the manual. In Windows 
case, the write cache can be turned on, it is not "unchangeable" and 
those who have educated themselves use it. I seldom turn it on unless 
I'm doing heavy I/O to a USB hard drive, otherwise the performance 
difference is just not that great.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Miles Nordin
> "fc" == Frank Cusack  writes:

fc> if you have 100TB of data, wouldn't you have a completely
fc> redundant storage network

If you work for a ponderous leaf-eating brontosaurus maybe.  If your
company is modern I think having such an oddly large amount of data in
one pool means you'd more likely have 70 whitebox peecees using
motherboard ethernet/sata only, connected to a mesh of unmanaged L2
switches (of some peculiar brand that happens to work well.)  There
will always be one or two peecees switched off, and constantly
something will be resilvering.  The home user case is not really just
for home users.  I think a lot of people are tired of paying quadruple
for stuff that still breaks, even serious people.

fc> Isn't this easily worked around by having UPS power in
fc> addition to whatever the data center supplies?

In NYC over the last five years the power has been more reliable going
into my UPS than coming out of it.  The main reason for having a UPS
is wiring maintenance.  And the most important part of the UPS is the
externally-mounted bypass switch because the UPS also needs
maintenance.  UPS has never _solved_ anything, it always just helps.
so in the end we have to count on the software's graceful behavior,
not on absolutes.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Miles Nordin
> "t" == Tim   writes:

 t> I would like to believe it has more to do with Solaris's
 t> support of USB than ZFS, but the fact remains it's a pretty
 t> glaring deficiency in 2009, no matter which part of the stack
 t> is at fault.

maybe, but for this job I don't much mind glaring deficiencies, as
long as it's possible to assemble a working system without resorting
to trial-and-error, and possible to know it's working before loading
data on it.  Right now, by following the ``best practices'', you don't
know what to buy, and after you receive the hardware you don't know if
it works until you lose a pool, at which time someone will tell you
``i guess it wasn't ever working.''  

Even if you order sun4v or an expensive FC disk shelf, you still don't
know if it works.

(though, I'm starting to suspect, in the case of FC or iSCSI the
 answer is always ``it does not work'')

The only thing you know for sure is, if you lose a pool, someone will
blame it on hardware bugs surrounding cache flushes, or else try to
conflate the issue with a bunch of inapplicable garbage about
checksums and wire corruption.  This is unworkable.

I'm not saying glaring 2009 deficiencies are irrelevant---on my laptop
I do mind because I got out of a multi-year abusive relationship with
NetBSD/hpcmips, and now want all parts of my laptop to have drivers.
And I guess it applies to that neat timeslider / home-base--USB-disk
case we were talking about a month ago.  but for what I'm doing I will
actually accept the advice ``do not ever put ZFS on USB because ZFS is
a canary in the mine of USB bugs''---it's just, that advice is not
really good enough to settle the whole issue.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Miles Nordin
> "fc" == Frank Cusack  writes:

>> Dropping a flush-cache command is just as bad as dropping a
>> write.

fc> Not that it matters, but it seems obvious that this is wrong
fc> or anyway an exaggeration.  Dropping a flush-cache just means
fc> that you have to wait until the device is quiesced before the
fc> data is consistent.

fc> Dropping a write is much much worse.

backwards i think.  Dropping a flush-cache is WORSE than dropping the
flush-cache plus all writes after the flush-cache.  The problem that
causes loss of whole pools rather than loss of recently-written data
isn't that you're writing too little.  It's that you're dropping the
barrier and misordering the writes.  consequently you lose *everything
you've ever written,* which is much worse than losing some recent
writes, even a lot of them.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Miles Nordin
> "gm" == Gary Mills  writes:

gm> That implies that ZFS will have to detect removable devices
gm> and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior.  The whole format vs rmformat mess is just
ridiculous.  And software and hardware developers alike have both
proven themselves incapable of settling on a definition of
``removeable'' that fits with actual use-cases like: FC/iSCSI;
hot-swappable SATA; adapters that have removeable sockets on both ends
like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so
on.

As we've said many times, if the devices are working properly, then
they can be unplugged uncleanly without corrupting the pool, and
without corrupting any other non-Microsoft filesystem.  This is an
old, SOLVED, problem.  It's ridiculous hypocrisy to make whole
filesystems DSYNC, to even _invent the possibility for the filesystem
to be DSYNC_, just because it is possible to remove something.  Will
you do the same thing because it is possible for your laptop's battery
to run out?  just, STOP!  If the devices are broken, the problem is
that they're broken, not that they're removeable.

personally, I think everything with a broken write cache should be
black-listed in the kernel and attach read-only by default, whether
it's a USB bridge or a SATA disk.  This will not be perfect because
USB bridges, RAID layers and iSCSI targets, will often hide the
identity of the SATA drive behind them, and of course people will
demand a way to disable it.  but if you want to be ``safe'', then for
the sake of making the point, THIS is the right way to do it, not muck
around with these overloaded notions of ``removeable''.

Also, the so-far unacknowledged ``iSCSI/FC Write Hole'' should be
fixed so that a copy of all written data is held in the initiator's
buffer cache until it's verified as *on the physical platter/NVRAM* so
that it can be replayed if necessary, and SYNC CACHE commands are
allowed to fail far enough that even *things which USE the initiator,
like ZFS* will understand what it means when SYNC CACHE fails, and
bounced connections are handled correctly---otherwise, when
connections bounce or SYNC CACHE returns failure, correctness requires
that the initiator pretend like its plug was pulled and panic.  Short
of that the initiator system must forcibly unmount all filesystems
using that device and kill all processes that had files open on those
filesystems.  

And sysadmins should have and know how to cleverly use a
tool that tests for both functioning barriers and working SYNC CACHE,
end-to-end.

NO more ``removeable'' attributes, please!  You are just pretending to
solve a much bigger problem, and making things clumsy and disgusting
in the process.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Eric D. Mudama

On Thu, Feb 12 at 19:43, Toby Thain wrote:
^^ Spec compliance is what we're testing for... We wouldn't know if this 
special variant is working correctly either. :)


Time the difference between NCQ reads with and without FUA in the
presence of overlapped cached write data.  That should have a
significant performance penalty, compared to a device servicing the
reads from a volatile buffer cache.

FYI, there are semi-commonly-available power control units that take
serial port or USB as an input, and have a whole bunch of SATA power
connectors on them.  These are the sorts of things that drive vendors
use to bounce power unexpectedly in their testing, if you need to
perform that same validation, it makes sense to invest in that bit of
infrastructure.

Something like this:
http://www.ulinktech.com/products/hw_power_hub.html

or just roll your own in a few days like this guy did for his printer:
http://chezphil.org/slugpower/


It should be pretty trivial to perform a few thousand cached writes,
issue a flush cache ext, and turn off power immediately after that
command completes.  Then go back and figure out how many of those
writes were successfully written as the device claimed.
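A rough sketch of that last procedure, approximated at the pool level rather
than with raw ATA commands, assuming a test pool named testpool mounted at
/testpool; the powerswitch command below is a placeholder for whatever serial-
or USB-controlled power unit is available, not a real utility:

  #!/bin/sh
  # Write a few thousand small files, ask for a sync (ZFS issues the
  # cache-flush to the device when the transaction group commits),
  # then cut power to the drive and count survivors afterwards.
  i=0
  while [ $i -lt 2000 ]; do
      dd if=/dev/urandom of=/testpool/f.$i bs=8k count=1 2>/dev/null
      i=`expr $i + 1`
  done
  sync                      # request that outstanding writes be committed
  powerswitch off port1     # placeholder: drop power to the drive here
  # After restoring power: zpool import testpool, zpool scrub testpool,
  # then check how many of the f.* files came back intact.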

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Eric D. Mudama

On Fri, Feb 13 at  9:14, Neil Perrin wrote:

Having a separate intent log on good hardware will not prevent corruption
on a pool with bad hardware. By "good" I mean hardware that correctly
flushes its write cache when requested.


Can someone please name a specific piece of bad hardware?

--eric


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Neil Perrin

Having a separate intent log on good hardware will not prevent corruption
on a pool with bad hardware. By "good" I mean hardware that correctly
flushes its write cache when requested.

Note, a pool is always consistent (again when using good hardware).
The function of the intent log is not to provide consistency (like a journal),
but to speed up synchronous requests like fsync and O_DSYNC.

Neil.

On 02/13/09 06:29, Jiawei Zhao wrote:

While mobility could be lost, USB storage still has the advantage of being cheap
and easy to install compared to installing internal disks in a PC. So if I just
want to use it to provide ZFS storage space for a home file server, can a small
intent log located on an internal SATA disk prevent the pool corruption caused by
a power cut?
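For reference, attaching a small separate intent log to an existing pool is a
one-liner; this is only a sketch, with the pool name tank and the device name
c1t1d0 as placeholders:

  zpool add tank log c1t1d0    # dedicate c1t1d0 as a separate ZIL (slog) device
  zpool status tank            # the device now appears under a "logs" section

As Neil explains above, though, this only speeds up synchronous requests; it
does not protect a pool whose main devices mishandle cache flushes.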

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Kyle McDonald

On 2/13/2009 5:58 AM, Ross wrote:

huh?  but that loses the convenience of USB.

I've used USB drives without problems at all, just remember to "zpool export" 
them before you unplug.
   
I think there is a subcommand of cfgadm you should run to notify 
Solaris that you intend to unplug the device. I don't use USB, and my 
familiarity with cfgadm (for FC and SCSI) is limited.
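For what it's worth, the usual sequence is roughly the following; this is only
a sketch, and the attachment point usb0/4 is just an example -- check the
cfgadm listing for the real one, and export the pool first as described
elsewhere in this thread:

  cfgadm -l                        # list attachment points; find the usbN/M entry
  cfgadm -c unconfigure usb0/4     # tell Solaris you intend to remove the device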


  -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Jiawei Zhao
While mobility could be lost, USB storage still has the advantage of being 
cheap and easy to install compared to installing internal disks in a PC. So if I 
just want to use it to provide ZFS storage space for a home file server, can a 
small intent log located on an internal SATA disk prevent the pool corruption 
caused by a power cut?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Ross
huh?  but that loses the convenience of USB.

I've used USB drives without problems at all, just remember to "zpool export" 
them before you unplug.
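In other words, something like this, assuming the pool is called usbpool:

  zpool export usbpool    # unmounts the filesystems and marks the pool exported
  # ... unplug the drive, carry it elsewhere, plug it back in ...
  zpool import usbpool    # re-import on the same or another machine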
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Jiawei Zhao
I am wondering: if USB storage devices are not reliable for ZFS usage, can the 
situation be improved if I put the intent log on an internal SATA disk to avoid 
corruption while still keeping the convenience of USB storage?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Frank Cusack

On February 12, 2009 1:44:34 PM -0800 bdebel...@intelesyscorp.com wrote:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510

...

Dropping a flush-cache command is just as bad as dropping a write.


Not that it matters, but it seems obvious that this is wrong or
anyway an exaggeration.  Dropping a flush-cache just means that you
have to wait until the device is quiesced before the data is consistent.

Dropping a write is much much worse.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Uwe Dippel
bcirvin,

you proposed "something to allow us to try to pull data from a failed pool".
Yes and no. 'Yes' as a pragmatic solution; 'no' for what ZFS was 'sold' to be: 
the last filesystem mankind would need. It was conceived as a filesystem that 
does not need recovery, due to its guaranteed consistent states on the/any 
drive - or better: at any moment. If this were truly the case, a recovery 
program would not be needed, and I don't think Sun will like one either.
It also is more than suboptimal to prevent caching as proposed by others; this 
is but a very ugly hack.

Again, and I have yet to receive comments on this, the original poster claimed 
to have done a proper flush/sync, and left a 100% consistent file system behind 
on his drive. At reboot, the pool, the higher entity, failed miserably.
Of course, now one can conceive a program that scans the whole drive, like in 
the good ole days on ancient file systems to recover all those 100% correct 
file system(s).
Or, one could - as proposed - add an Überblock, like we had the FAT-mirror in 
the last millennium.

The alternative, and engineering-wise much better solution, would be to 
diagnose the weakness on the contextual or semantical level: Where 100% 
consistent file systems cannot be communicated to by the operating system. This 
- so it seems - is (still) a shortcoming of the concept of ZFS. Which might be 
solved by means of yesterday, I agree. 
Or, by throwing more work into the level of the volume management, the pools. 
Without claiming to have the solution, conceptually I might want to propose to 
do away with the static, look-up-table-like structure of the pool, as stored in 
a mirror or Überblock. Could it be feasible to associate pools dynamically? 
Could it be feasible, that the filesystems in a pool create a (new) handle once 
they are updated in a consistent manner? And when the drive is plugged/turned 
on, the software simply collects all the handles of all file systems on that 
drive? Then the export/import is possible, but not required any longer, since 
the filesystems form their own entities. They can still have associated 
contextual/semantic (stored) structures into which they are 'plugged' once the 
drive is up; if one wanted to ('logical volume'). But with or without, the pool 
would self-configure when the drive starts by picking up all file system 
handles.

Uwe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Sanjeev
Blake,

On Thu, Feb 12, 2009 at 05:35:14PM -0500, Blake wrote:
> That does look like the issue being discussed.
> 
> It's a little alarming that the bug was reported against snv54 and is
> still not fixed :(

Looks like the bug-report is out of sync.
I see that the bug has been fixed in B54. Here is the link to
source gate which shows that the fix is in the gate :
http://src.opensolaris.org/source/search?q=&defs=&refs=&path=&hist=6424510&project=%2Fonnv

And here are the diffs :
http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/io/scsi/targets/sd.c?r2=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403169&r1=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403138

Thanks and regards,
Sanjeev.
> 
> Does anyone know how to push for resolution on this?  USB is pretty
> common, like it or not for storage purposes - especially amongst the
> laptop-using dev crowd that OpenSolaris apparently targets.
> 
> 
> 
> On Thu, Feb 12, 2009 at 4:44 PM, bdebel...@intelesyscorp.com
>  wrote:
> > Is this the crux of the problem?
> >
> > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510
> >
> > 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE.
> > This can cause catastrophic data corruption in the event of power loss,
> > even for filesystems like ZFS that are designed to survive it.
> > Dropping a flush-cache command is just as bad as dropping a write.
> > It violates the interface that software relies on to use the device.'
> > --
> > This message posted from opensolaris.org
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> >
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 

Sanjeev Bagewadi
Solaris RPE 
Bangalore, India
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Toby Thain


On 12-Feb-09, at 7:02 PM, Eric D. Mudama wrote:


On Thu, Feb 12 at 21:45, Mattias Pantzare wrote:
A read of data in the disk cache will be read from the disk cache.  You
can't tell the disk to ignore its cache and read directly from the
platter.

The only way to test this is to write and then remove the power from
the disk. Not easy in software.


Not true with modern SATA drives that support NCQ, as there is a FUA
bit that can be set by the driver on NCQ reads.  If the device
implements the spec,


^^ Spec compliance is what we're testing for... We wouldn't know if  
this special variant is working correctly either. :)


--T


any overlapped write cache data will be flushed,
invalidated, and a fresh read done from the non-volatile media for the
FUA read command.

--eric



--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Dave

Blake wrote:

I'm sure it's very hard to write good error handling code for hardware
events like this.

I think, after skimming this thread (a pretty wild ride), we can at
least decide that there is an RFE for a recovery tool for zfs -
something to allow us to try to pull data from a failed pool.  That
seems like a reasonable tool to request/work on, no?



The ability to force a roll back to an older uberblock in order to be 
able to access the pool (in the case of corrupt current uberblock) 
should be ZFS developer's very top priority, IMO. I'd offer to do it 
myself, but I have nowhere near the ability to do so.
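As a starting point, the on-disk labels (which hold the uberblock array) can at
least be inspected today; a sketch, with the device name as a placeholder --
actually rewinding to an older uberblock still needs the esoteric zdb surgery
mentioned earlier in the thread:

  zdb -l /dev/rdsk/c1t0d0s0    # dump the four vdev labels, including pool GUID and txg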


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Eric D. Mudama

On Thu, Feb 12 at 21:45, Mattias Pantzare wrote:

A read of data in the disk cache will be read from the disk cache. You
can't tell the disk to ignore its cache and read directly from the
platter.

The only way to test this is to write and then remove the power from
the disk. Not easy in software.


Not true with modern SATA drives that support NCQ, as there is a FUA
bit that can be set by the driver on NCQ reads.  If the device
implements the spec, any overlapped write cache data will be flushed,
invalidated, and a fresh read done from the non-volatile media for the
FUA read command.

--eric



--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Blake
I'm sure it's very hard to write good error handling code for hardware
events like this.

I think, after skimming this thread (a pretty wild ride), we can at
least decide that there is an RFE for a recovery tool for zfs -
something to allow us to try to pull data from a failed pool.  That
seems like a reasonable tool to request/work on, no?


On Thu, Feb 12, 2009 at 6:03 PM, Toby Thain  wrote:
>
> On 12-Feb-09, at 3:02 PM, Tim wrote:
>
>
> On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet  wrote:
>>
>> On Thu, February 12, 2009 10:10, Ross wrote:
>>
>> > Of course, that does assume that devices are being truthful when they
>> > say
>> > that data has been committed, but a little data loss from badly designed
>> > hardware is I feel acceptable, so long as ZFS can have a go at
>> > recovering
>> > corrupted pools when it does happen, instead of giving up completely
>> > like
>> > it does now.
>>
>> Well; not "acceptable" as such.  But I'd agree it's outside ZFS's purview.
>>  The blame for data lost due to hardware actively lying and not working to
>> spec goes to the hardware vendor, not to ZFS.
>>
>> If ZFS could easily and reliably warn about such hardware I'd want it to,
>> but the consensus seems to be that we don't have a reliable qualification
>> procedure.  In terms of upselling people to a Sun storage solution, having
>> ZFS diagnose problems with their cheap hardware early is clearly desirable
>> :-).
>>
>
>
> Right, well I can't imagine it's impossible to write a small app that can
> test whether or not drives are honoring correctly by issuing a commit and
> immediately reading back to see if it was indeed committed or not.
>
> You do realise that this is not as easy as it looks? :) For one thing, the
> drive will simply serve the read from cache.
> It's hard to imagine a test that doesn't involve literally pulling plugs;
> even better, a purpose built hardware test harness.
> Nonetheless I hope that someone comes up with a brilliant test. But if the
> ZFS team hasn't found one yet... it looks grim :)
> --Toby
>
> Like a "zfs test cXtX".  Of course, then you can't just blame the hardware
> everytime something in zfs breaks ;)
>
> --Tim
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Toby Thain


On 12-Feb-09, at 3:02 PM, Tim wrote:




On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet   
wrote:


On Thu, February 12, 2009 10:10, Ross wrote:

> Of course, that does assume that devices are being truthful when they say
> that data has been committed, but a little data loss from badly designed
> hardware is I feel acceptable, so long as ZFS can have a go at recovering
> corrupted pools when it does happen, instead of giving up completely like
> it does now.

Well; not "acceptable" as such.  But I'd agree it's outside ZFS's  
purview.
 The blame for data lost due to hardware actively lying and not  
working to

spec goes to the hardware vendor, not to ZFS.

If ZFS could easily and reliably warn about such hardware I'd want  
it to,
but the consensus seems to be that we don't have a reliable  
qualification
procedure.  In terms of upselling people to a Sun storage solution,  
having
ZFS diagnose problems with their cheap hardware early is clearly  
desirable

:-).



Right, well I can't imagine it's impossible to write a small app  
that can test whether or not drives are honoring correctly by  
issuing a commit and immediately reading back to see if it was  
indeed committed or not.


You do realise that this is not as easy as it looks? :) For one  
thing, the drive will simply serve the read from cache.


It's hard to imagine a test that doesn't involve literally pulling  
plugs; even better, a purpose built hardware test harness.


Nonetheless I hope that someone comes up with a brilliant test. But  
if the ZFS team hasn't found one yet... it looks grim :)


--Toby

Like a "zfs test cXtX".  Of course, then you can't just blame the  
hardware everytime something in zfs breaks ;)


--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Bill Sommerfeld

On Thu, 2009-02-12 at 17:35 -0500, Blake wrote:
> That does look like the issue being discussed.
> 
> It's a little alarming that the bug was reported against snv54 and is
> still not fixed :(

bugs.opensolaris.org's information about this bug is out of date.

It was fixed in snv_54:

changeset:   3169:1dea14abfe17
user:phitran
date:Sat Nov 25 11:05:17 2006 -0800
files:   usr/src/uts/common/io/scsi/targets/sd.c

6424510 usb ignores DKIOCFLUSHWRITECACHE

- Bill

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread bdebel...@intelesyscorp.com
I just tried putting a pool on a USB flash drive, writing a file to it, and 
then yanking it.  I did not lose any data or the pool, but I had to reboot 
before I could get any zpool command to complete without freezing.  I also had 
the OS reboot once on its own, when I tried to issue a zpool command to the pool.

The OS did not notice the disk was yanked until I tried to status it.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread David Dyer-Bennet

On Thu, February 12, 2009 14:02, Tim wrote:

>
> Right, well I can't imagine it's impossible to write a small app that can
> test whether or not drives are honoring correctly by issuing a commit and
> immediately reading back to see if it was indeed committed or not.  Like a
> "zfs test cXtX".  Of course, then you can't just blame the hardware
> everytime something in zfs breaks ;)
>

I can imagine it fairly easily.  All you've got to work with is what the
drive says about itself, and how fast, and what we're trying to test
is whether it lies.  It's often very hard to catch it out on this sort of
thing.

We need somebody who really understands the command sets available to send
to modern drives (which is not me) to provide a test they think would
work, and people can argue or try it.  My impression, though, is that the
people with the expertise are so far consistently saying it's not
possible.   I think at this point somebody who thinks it's possible needs
to do the work to at least propose a specific test, or else we have to
give up on the idea.

I'm still hoping for at least some kind of qualification procedure
involving manual intervention (hence not something that could be embodied
in a simple command you just typed), but we're not seeing even this so
far.

Of course, the other side of this is that, if people "know" that drives
have these problems, there must in fact be some way to demonstrate it, or
they wouldn't know.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Blake
That does look like the issue being discussed.

It's a little alarming that the bug was reported against snv54 and is
still not fixed :(

Does anyone know how to push for resolution on this?  USB is pretty
common, like it or not for storage purposes - especially amongst the
laptop-using dev crowd that OpenSolaris apparently targets.



On Thu, Feb 12, 2009 at 4:44 PM, bdebel...@intelesyscorp.com
 wrote:
> Is this the crux of the problem?
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510
>
> 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE.
> This can cause catastrophic data corruption in the event of power loss,
> even for filesystems like ZFS that are designed to survive it.
> Dropping a flush-cache command is just as bad as dropping a write.
> It violates the interface that software relies on to use the device.'
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread bdebel...@intelesyscorp.com
Is this the crux of the problem?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510

'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE.
This can cause catastrophic data corruption in the event of power loss,
even for filesystems like ZFS that are designed to survive it.
Dropping a flush-cache command is just as bad as dropping a write.
It violates the interface that software relies on to use the device.'
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Ross Smith
That would be the ideal, but really I'd settle for just improved error
handling and recovery for now.  In the longer term, disabling write
caching by default for USB or Firewire drives might be nice.
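Until then it can only be done by hand, per disk, and only where the driver
passes the request through; a sketch of the usual procedure on Solaris -- the
menu names here are from memory, so treat them as an assumption, and it may
simply not work through a USB bridge:

  format -e              # expert mode; pick the disk from the menu
  format> cache
  cache> write_cache
  write_cache> display   # show the current setting
  write_cache> disable   # turn the volatile write cache off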


On Thu, Feb 12, 2009 at 8:35 PM, Gary Mills  wrote:
> On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote:
>> Ross wrote:
>> >I can also state with confidence that very, very few of the 100 staff
>> >working here will even be aware that it's possible to unmount a USB volume
>> >in windows.  They will all just pull the plug when their work is saved,
>> >and since they all come to me when they have problems, I think I can
>> >safely say that pulling USB devices really doesn't tend to corrupt
>> >filesystems in Windows.  Everybody I know just waits for the light on the
>> >device to go out.
>> >
>> The key here is that Windows does not cache writes to the USB drive
>> unless you go in and specifically enable them. It caches reads but not
>> writes. If you enable them you will lose data if you pull the stick out
>> before all the data is written. This is the type of safety measure that
>> needs to be implemented in ZFS if it is to support the average user
>> instead of just the IT professionals.
>
> That implies that ZFS will have to detect removable devices and treat
> them differently than fixed devices.  It might have to be an option
> that can be enabled for higher performance with reduced data security.
>
> --
> -Gary Mills--Unix Support--U of M Academic Computing and Networking-
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Mattias Pantzare
>
> Right, well I can't imagine it's impossible to write a small app that can
> test whether or not drives are honoring correctly by issuing a commit and
> immediately reading back to see if it was indeed committed or not.  Like a
> "zfs test cXtX".  Of course, then you can't just blame the hardware
> everytime something in zfs breaks ;)

A read of data in the disk cache will be read from the disk cache. You
can't tell the disk to ignore its cache and read directly from the
platter.

 The only way to test this is to write and then remove the power from
the disk. Not easy in software.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Gary Mills
On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote:
> Ross wrote:
> >I can also state with confidence that very, very few of the 100 staff 
> >working here will even be aware that it's possible to unmount a USB volume 
> >in windows.  They will all just pull the plug when their work is saved, 
> >and since they all come to me when they have problems, I think I can 
> >safely say that pulling USB devices really doesn't tend to corrupt 
> >filesystems in Windows.  Everybody I know just waits for the light on the 
> >device to go out.
> >  
> The key here is that Windows does not cache writes to the USB drive 
> unless you go in and specifically enable them. It caches reads but not 
> writes. If you enable them you will lose data if you pull the stick out 
> before all the data is written. This is the type of safety measure that 
> needs to be implemented in ZFS if it is to support the average user 
> instead of just the IT professionals.

That implies that ZFS will have to detect removable devices and treat
them differently than fixed devices.  It might have to be an option
that can be enabled for higher performance with reduced data security.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Tim
On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet  wrote:

>
> On Thu, February 12, 2009 10:10, Ross wrote:
>
> > Of course, that does assume that devices are being truthful when they say
> > that data has been committed, but a little data loss from badly designed
> > hardware is I feel acceptable, so long as ZFS can have a go at recovering
> > corrupted pools when it does happen, instead of giving up completely like
> > it does now.
>
> Well; not "acceptable" as such.  But I'd agree it's outside ZFS's purview.
>  The blame for data lost due to hardware actively lying and not working to
> spec goes to the hardware vendor, not to ZFS.
>
> If ZFS could easily and reliably warn about such hardware I'd want it to,
> but the consensus seems to be that we don't have a reliable qualification
> procedure.  In terms of upselling people to a Sun storage solution, having
> ZFS diagnose problems with their cheap hardware early is clearly desirable
> :-).
>
>

Right, well I can't imagine it's impossible to write a small app that can
test whether or not drives are honoring correctly by issuing a commit and
immediately reading back to see if it was indeed committed or not.  Like a
"zfs test cXtX".  Of course, then you can't just blame the hardware
everytime something in zfs breaks ;)

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread David Dyer-Bennet

On Thu, February 12, 2009 10:10, Ross wrote:

> Of course, that does assume that devices are being truthful when they say
> that data has been committed, but a little data loss from badly designed
> hardware is I feel acceptable, so long as ZFS can have a go at recovering
> corrupted pools when it does happen, instead of giving up completely like
> it does now.

Well; not "acceptable" as such.  But I'd agree it's outside ZFS's purview.
 The blame for data lost due to hardware actively lying and not working to
spec goes to the hardware vendor, not to ZFS.

If ZFS could easily and reliably warn about such hardware I'd want it to,
but the consensus seems to be that we don't have a reliable qualification
procedure.  In terms of upselling people to a Sun storage solution, having
ZFS diagnose problems with their cheap hardware early is clearly desirable
:-).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Greg Palmer

Ross wrote:

I can also state with confidence that very, very few of the 100 staff working 
here will even be aware that it's possible to unmount a USB volume in windows.  
They will all just pull the plug when their work is saved, and since they all 
come to me when they have problems, I think I can safely say that pulling USB 
devices really doesn't tend to corrupt filesystems in Windows.  Everybody I 
know just waits for the light on the device to go out.
  
The key here is that Windows does not cache writes to the USB drive 
unless you go in and specifically enable them. It caches reads but not 
writes. If you enable them you will lose data if you pull the stick out 
before all the data is written. This is the type of safety measure that 
needs to be implemented in ZFS if it is to support the average user 
instead of just the IT professionals.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Robert Milkowski
Hello Bob,

Wednesday, February 11, 2009, 11:25:12 PM, you wrote:

BF> I agree.  ZFS apparently syncs uncommitted writes every 5 seconds. 
BF> If there has been no filesystem I/O (including read I/O due to atime) 
BF> for at least 10 seconds, and there has not been more data 
BF> burst-written into RAM than can be written to disk in 10 seconds, then
BF> there should be nothing remaining to write.

That's not entirely true. After recent changes writes could be delayed
even up to 30s by default.
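The current interval can be checked on a live system with mdb; a sketch,
assuming the tunable on this build is named zfs_txg_timeout (earlier builds
used txg_time):

  echo 'zfs_txg_timeout/D' | mdb -k    # print the txg sync interval, in seconds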


-- 
Best regards,
 Robert Milkowski
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Ross
> All that and yet the fact
> remains: I've never "ejected" a USB
> drive from OS X or Windows, I simply pull it and go,
> and I've never once lost data, or had it become
> unrecoverable or even corrupted.
> And yes, I do keep checksums of all the data
> sitting on them and periodically check it.  So,
> for all of your ranting and raving, the fact remains
> even a *crappy* filesystem like fat32 manages to
> handle a hot unplug without any prior notice without
> going belly up.
> --Tim

Just wanted to chime in with my 2c here.  I've also *never* unmounted a USB 
drive from windows, and have been using them regularly since memory sticks 
became available.  So that's 2-3 years of experience and I've never lost work 
on a memory stick, nor had a file corrupted.

I can also state with confidence that very, very few of the 100 staff working 
here will even be aware that it's possible to unmount a USB volume in windows.  
They will all just pull the plug when their work is saved, and since they all 
come to me when they have problems, I think I can safely say that pulling USB 
devices really doesn't tend to corrupt filesystems in Windows.  Everybody I 
know just waits for the light on the device to go out.

And while this isn't really what ZFS is designed to do, I do think it should be 
able to cope.  First of all, some kind of ZFS recovery tools are needed.  
There's going to be an awful lot of good data on that disk, making all of that 
inaccessible just because the last write failed isn't really on.  It's a copy 
on write filesystem, "zpool import" really should be able to take advantage of 
that for recovering pools!

I don't know the technicalities of how it works on disk, but my feeling is that 
the last successful mount point should be saved, and the last few uberblocks 
should also be available, so barring complete hardware failure, some kind of 
pool should be available for mounting.

Also, if a drive is removed while writes are pending, some kind of error or 
warning is needed, either in the console, or the GUI.  It should be possible to 
prompt the user to re-insert the device so that the remaining writes can be 
completed.  Recovering the pool in that situation should be easy - you can keep 
the location of the uberblock you're using in memory, and just re-write 
everything.

Of course, that does assume that devices are being truthful when they say that 
data has been committed, but a little data loss from badly designed hardware is 
I feel acceptable, so long as ZFS can have a go at recovering corrupted pools 
when it does happen, instead of giving up completely like it does now.

Yes, these problems happen more often with consumer level hardware, but 
recovery tools like this are going to be very much appreciated by anybody who 
encounters problems like this on a server!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread D. Eckert
After all the statements read here, I just want to highlight another issue regarding 
ZFS.

It was here many times recommended to set copies=2.

Installing Solaris 10 10/2008 or snv_107 you can choose either to use UFS or 
ZFS.

If you choose ZFS by default, the rpool will be created by default with 
'copies=1'.
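If you want two copies, it has to be set by hand, and it only affects blocks
written after the property is changed; a sketch, with rpool/export/home as a
placeholder dataset:

  zfs get copies rpool                  # the default is 1
  zfs set copies=2 rpool/export/home    # keep two copies of each block from now on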

If someone does not mention this, and you have a hanging system with no way 
to access it or shut it down properly, and no other choice than to press 
the power button of your notebook through the desk plate, couldn't the same 
thing happen with my external USB drive?

This is the same sudden power off event what seems to damage my pool.

And it would be a nice to have that ZFS could handle this.

Another issue I miss in this thread is that ZFS is a layer on top of an EFI 
label. What about that in the case of a sudden power-off event?

Regards,

Dave.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread David Dyer-Bennet

On Wed, February 11, 2009 18:16, Uwe Dippel wrote:
> I need to disappoint you here, LED inactive for a few seconds is a very
> bad indicator of pending writes. Used to experience this on a stick on
> Ubuntu, which was silent until the 'umount' and then it started to write
> for some 10 seconds.

Yikes, that's bizarre.

> On the other hand, you are spot-on w.r.t. 'umount'. Once the command is
> through, there is no more write to be expected. And if there was, it would
> be a serious bug. So this 'umount'ed system needs to be in perfectly
> consistent states. (Which is why I wrote further up that the structure
> above the file system, that is the pool, is probably the culprit for all
> this misery.)

Yeah, once it's unmounted it really REALLY should be in a consistent state.

> [i]Conversely, anybody who is pulling disks / memory sticks off while IO
> is
> visibly incomplete really SHOULD expect to lose everything on them[/i]
> I hope you don't mean this. Not in a filesystem much hyped and much
> advanced. Of course, we expect corruption of all files whose 'write' has
> been boldly interrupted. But I for one, expect the metadata of all other
> files to be readily available. Kind of, at the next use, telling me:"You
> idiot removed the plug last, while files were still in the process of
> writing. Don't expect them to be available now. Here is the list of all
> other files: [list of all files not being written then]"

It's good to have hopes, certainly.  I'm just kinda cynical.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Uwe Dippel
May I doubt that there are drives that don't 'sync'? That means you have a good 
chance of corrupted data at a normal 'reboot'; or just at a 'umount' (without 
considering ZFS here). 
May I doubt the marketing drab that you need to buy a USCSI or whatnot to have 
functional 'sync' at a shutdown or umount? There are millions if not billions 
of drives out there that come up with consistent data structures after a clean 
shutdown. 
This means that a proper 'umount' flushes everything on those drives, and we 
need not expect corrupted data, and no further writes. And that was the topic 
further up to which I tried to answer. As well as to the notion that a file 
system that encounters interrupted writes may well and legally be completely 
unreadable. That is what I refuted, nothing else. 

Uwe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Toby Thain


On 11-Feb-09, at 9:30 PM, Uwe Dippel wrote:


Toby,

sad that you fall for the last resort of the marketing droids here.  
All manufacturers (and there are only a few left) will sue the hell  
out of you if you state that their drives don't 'sync'. And each  
and every drive I have ever used did. So the talk about a distinct  
borderline between 'enterprise' and 'home' is just cheap and not  
sustainable.


They have existed. This thread has shown a motive to verify COTS  
drives for this property, if the data is valuable.




Also, if you were correct, and ZFS allowed for compromising the  
metadata of dormant files (folders) by writing metadata for other  
files (folders), we would not have advanced beyond FAT, and ZFS  
would be but a short episode in the history of file systems. Or am  
I the last to notice that atomic writes have been dropped?  
Especially with atomic writes you either have the last consistent  
state of the file structure, or the updated one. So what would be  
the meaning of 'always consistent on the drive' if metadata were  
allowed to hang in between; in an inconsistent state? You write  
"What is known, is the last checkpoint." Exactly, and here a  
contradiction shows: the last checkpoint of all untouched files  
(plus those read only) does contain exactly all untouched files.  
How could one allow to compromise the last checkpoint by writing a  
new one?


ZFS claims that the last checkpoint (my term, sorry, not an official  
one) is fully consistent (metadata *and* data! Unlike other  
filesystems). Since consistency is achievable by thousands of other  
transactional systems I have no reason to doubt that it is achieved  
by ZFS.


You are correct with "the feasible recovery mode is a partial".  
Though here we have heard some stories of total loss. Nobody has  
questioned that the recovery of an interrupted 'write' must  
necessarily be partial. What is questioned is the complete loss of  
semantics.


Only an incomplete transaction would be lost, AIUI. That is the  
'atomic' property of all journaled and transactional systems. (All of  
it, or none of it.)


--Toby




Uwe
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 18:25, Toby Thain wrote:

>
> Absolutely. You should never get "actual corruption" (inconsistency)
> at any time *except* in the case Jeff Bonwick explained: i.e. faulty/
> misbehaving hardware! (That's one meaning of "always consistent on
> disk".)
>
> I think this is well understood, is it not?

Perhaps.  I think the consensus seems to be settling down this direction
(as I filter for reliability of people posting, not by raw count :-)).

The shocker is how much hardware that doesn't behave to spec in this area
seems to be out there -- or so people claim; the other problem is that we
can't sort out which is which.

> Write barriers are not a new concept, and nor is the necessity. For
> example, they are a clearly described feature of DEC's MSCP
> protocol*, long before ATA or SCSI - presumably so that transactional
> systems could actually be built at all. Devices were held to a high
> standard of conformance since DEC's customers (like Sun's) were
> traditionally those whose data was of very high value. Storage
> engineers across the industry were certainly implementing them long
> before MSCP.
>
> --Toby
>
>
> * - The related patent that I am looking at is #4,449,182, filed 5
> Oct, 1981.
> "Interface between a pair of processors, such as host and peripheral-
> controlling processors in data processing systems."

While I was working for LCG in Marlboro, in fact.  (Not on hardware,
nowhere near that work.)
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 17:25, Bob Friesenhahn wrote:

> Regardless, it seems that the ZFS problems with crummy hardware are
> primarily due to the crummy hardware writting the data to the disk in
> a different order than expected.  ZFS expects that after a sync that
> all pending writes are committed.

Which is something Unix has been claiming (or pretending) to provide for
some time now, yes.

> The lesson is that unprofessional hardware may prove to be unreliable
> for professional usage.

Or any other usage.  And the question is how can we tell them apart?

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Uwe Dippel
Toby,

sad that you fall for the last resort of the marketing droids here. All 
manufacturers (and there are only a few left) will sue the hell out of you if 
you state that their drives don't 'sync'. And each and every drive I have ever 
used did. So the talk about a distinct borderline between 'enterprise' and 
'home' is just cheap and not sustainable. 

Also, if you were correct, and ZFS allowed for compromising the metadata of 
dormant files (folders) by writing metadata for other files (folders), we would 
not have advanced beyond FAT, and ZFS would be but a short episode in the 
history of file systems. Or am I the last to notice that atomic writes have 
been dropped? Especially with atomic writes you either have the last consistent 
state of the file structure, or the updated one. So what would be the meaning 
of 'always consistent on the drive' if metadata were allowed to hang in 
between; in an inconsistent state? You write "What is known, is the last 
checkpoint." Exactly, and here a contradiction shows: the last checkpoint of 
all untouched files (plus those read only) does contain exactly all untouched 
files. How could one allow to compromise the last checkpoint by writing a new 
one?
You are correct with "the feasible recovery mode is a partial". Though here we 
have heard some stories of total loss. Nobody has questioned that the recovery 
of an interrupted 'write' must necessarily be partial. What is questioned is 
the complete loss of semantics.

Uwe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Toby Thain


On 11-Feb-09, at 7:16 PM, Uwe Dippel wrote:

I need to disappoint you here, LED inactive for a few seconds is a  
very bad indicator of pending writes. Used to experience this on a  
stick on Ubuntu, which was silent until the 'umount' and then it  
started to write for some 10 seconds.


On the other hand, you are spot-on w.r.t. 'umount'. Once the  
command is through, there is no more write to be expected. And if  
there was, it would be a serious bug.


Yes; though at the risk of repetition - the bug here can be in the  
drive...


So this 'umount'ed system needs to be in perfectly consistent  
states. (Which is why I wrote further up that the structure above  
the file system, that is the pool, is probably the culprit for all  
this misery.)


[i]Conversely, anybody who is pulling disks / memory sticks off  
while IO is

visibly incomplete really SHOULD expect to lose everything on them[/i]
I hope you don't mean this. Not in a filesystem much hyped and much  
advanced. Of course, we expect corruption of all files whose  
'write' has been boldly interrupted. But I for one, expect the  
metadata of all other files to be readily available. Kind of, at  
the next use, telling me:"You idiot removed the plug last, while  
files were still in the process of writing. Don't expect them to be  
available now. Here is the list of all other files: [list of all  
files not being written then]"


That hope is a little naive. AIUI, it cannot be known, thanks to the  
many indeterminacies of the I/O path, which 'files' were partially  
written (since a whole slew of copy-on-writes to many objects could  
have been in flight, and absent a barrier it cannot be known post  
facto which succeeded). What is known, is the last checkpoint. Hence  
the feasible recovery mode is a partial, automatic rollback to a past  
consistent state.


Somebody correct me if I am wrong.

--Toby



Uwe
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Toby Thain


On 11-Feb-09, at 5:52 PM, David Dyer-Bennet wrote:



On Wed, February 11, 2009 15:52, Bob Friesenhahn wrote:

On Wed, 11 Feb 2009, Tim wrote:


Right, except the OP stated he unmounted the filesystem in  
question, and

it
was the *ONLY* one on the drive, meaning there is absolutely 0  
chance of

there being pending writes.  There's nothing to write to.


This is an interesting assumption leading to a wrong conclusion.  If
the file is updated and the filesystem is "unmounted", it is still
possible for there to be uncommitted data in the pool. ...


As a practical matter, it seems unreasonable to me that there would be
uncommitted data in the pool after some quite short period of time ...

That is, if I plug in a memory stick with ZFS on it, read and write  
for a
while, then when I'm done and IO appears to have quiesced, observe  
that
the IO light on the drive is inactive for several seconds, I'd be  
kinda

disappointed if I got actual corruption if I pulled it.


Absolutely. You should never get "actual corruption" (inconsistency)  
at any time *except* in the case Jeff Bonwick explained: i.e. faulty/ 
misbehaving hardware! (That's one meaning of "always consistent on  
disk".)


I think this is well understood, is it not?

Write barriers are not a new concept, and nor is the necessity. For  
example, they are a clearly described feature of DEC's MSCP  
protocol*, long before ATA or SCSI - presumably so that transactional  
systems could actually be built at all. Devices were held to a high  
standard of conformance since DEC's customers (like Sun's) were  
traditionally those whose data was of very high value. Storage  
engineers across the industry were certainly implementing them long  
before MSCP.


--Toby


* - The related patent that I am looking at is #4,449,182, filed 5  
Oct, 1981.
"Interface between a pair of processors, such as host and peripheral- 
controlling processors in data processing systems."


Also the MSCP document released with the UDA50 mass storage  
subsystem, dated April 1982:


"4.5 Command Categories and Execution Order
...
Sequential commands are those commands that, for the same unit, must  
be executed in precise order. ... All sequential commands for a  
particular unit that are received on the same connection must be  
executed in the exact order that the MSCP server receives them. The  
execution of a sequential command may not be interleaved with the  
execution of any other sequential or non-sequential commands for the  
same unit. Furthermore, any non-sequential commands received before  
and on the same connection as a particular sequential command must be  
completed before execution of that sequential command begins, and any  
non-sequential commands received after and on the same connection as a  
particular sequential command must not begin execution until after  
that sequential command is completed. Sequential commands are, in  
effect, a barrier that non-sequential commands cannot pass or penetrate.
   Non-sequential commands are those commands that controllers may  
re-order so as to optimize performance. Controllers may furthermore  
interleave the execution of several non-sequential commands among  
themselves, ..."





Complaints about
not being exported next time I tried to import it, sure.  Maybe other
complaints.  I wouldn't do this deliberately (other than for testing).
...

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Uwe Dippel
I need to disappoint you here: an LED inactive for a few seconds is a very bad 
indicator of pending writes. I used to experience this with a stick on Ubuntu, 
which was silent until the 'umount', and then it started to write for some 10 
seconds.
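
For what it's worth, on a Linux box you can actually watch those buffered
writes still pending; this is only a rough illustration using the standard
procfs counters (exact field names depend on the kernel):

  grep -i dirty /proc/meminfo   # pages modified in RAM but not yet on disk
  sync                          # ask the kernel to flush everything out
  grep -i dirty /proc/meminfo   # the Dirty figure should now be (near) zero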

On the other hand, you are spot-on w.r.t. 'umount'. Once the command is 
through, there is no more write to be expected. And if there was, it would be a 
serious bug. So this 'umount'ed system needs to be in perfectly consistent 
states. (Which is why I wrote further up that the structure above the file 
system, that is the pool, is probably the culprit for all this misery.) 

[i]Conversely, anybody who is pulling disks / memory sticks off while IO is
visibly incomplete really SHOULD expect to lose everything on them[/i]
I hope you don't mean this. Not in a filesystem much hyped and much advanced. 
Of course, we expect corruption of all files whose 'write' has been boldly 
interrupted. But I for one, expect the metadata of all other files to be 
readily available. Kind of, at the next use, telling me:"You idiot removed the 
plug last, while files were still in the process of writing. Don't expect them 
to be available now. Here is the list of all other files: [list of all files 
not being written then]"

Uwe
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Bob Friesenhahn

On Wed, 11 Feb 2009, David Dyer-Bennet wrote:


As a practical matter, it seems unreasonable to me that there would be
uncommitted data in the pool after some quite short period of time when
there's no new IO activity to the pool (not just the filesystem).  5 or 10
seconds, maybe?  (Possibly excepting if there was a HUGE spike of IO for a
while just before this; there could be considerable stuff in the ZIL not
yet committed then, I would think.)


I agree.  ZFS apparently syncs uncommitted writes every 5 seconds. 
If there has been no filesystem I/O (including read I/O due to atime) 
for at least 10 seconds, and there has not been more data 
burst-written into RAM than can be written to disk in 10 seconds, then 
there should be nothing remaining to write.


Regardless, it seems that the ZFS problems with crummy hardware are 
primarily due to the crummy hardware writing the data to the disk in 
a different order than expected.  ZFS expects that, after a sync, 
all pending writes are committed.
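
As a sanity check before pulling a removable pool -- this is only a sketch,
the pool name is an example, and it is no substitute for a proper 'zpool
export' -- one can ask for a flush and then watch the pool go idle:

  sync                        # request that dirty buffers be pushed out
  zpool iostat usbhdd1 1 10   # ten 1-second samples; writes should fall to 0
  zpool status usbhdd1        # the pool should report ONLINE with no errors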


The lesson is that unprofessional hardware may prove to be unreliable 
for professional usage.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 15:52, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, Tim wrote:
>>
>> Right, except the OP stated he unmounted the filesystem in question, and
>> it
>> was the *ONLY* one on the drive, meaning there is absolutely 0 chance of
>> there being pending writes.  There's nothing to write to.
>
> This is an interesting assumption leading to a wrong conclusion.  If
> the file is updated and the filesystem is "unmounted", it is still
> possible for there to be uncommitted data in the pool.  If you pay
> closer attention you will see that "mounting" the filesystem basically
> just adds a logical path mapping since the filesystem is already
> available under /poolname/filesystemname regardless.  So doing the
> mount makes /poolname/filesystemname available as /filesystemname, or
> whatever mount path you specify.

As a practical matter, it seems unreasonable to me that there would be
uncommitted data in the pool after some quite short period of time when
there's no new IO activity to the pool (not just the filesystem).  5 or 10
seconds, maybe?  (Possibly excepting if there was a HUGE spike of IO for a
while just before this; there could be considerable stuff in the ZIL not
yet committed then, I would think.)

That is, if I plug in a memory stick with ZFS on it, read and write for a
while, then when I'm done and IO appears to have quiesced, observe that
the IO light on the drive is inactive for several seconds, I'd be kinda
disappointed if I got actual corruption if I pulled it.  Complaints about
not being exported next time I tried to import it, sure.  Maybe other
complaints.  I wouldn't do this deliberately (other than for testing). 
But it seems wrong to leave things uncommitted significantly longer than
necessary (seconds are huge time units to a computer, after all), and if
the device is sitting there not doing IO, there's no reason it shouldn't
have been writing anything uncommitted instead.
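
In the ordinary case, pulling a stick that was merely idle (not exported)
should only lead to those import-time complaints; roughly, and with the pool
name as an example:

  zpool import              # scan attached devices for importable pools
  zpool import -f usbhdd1   # -f forces the import of a pool that was never exported
  zpool status -v usbhdd1   # verify the pool and surface any errors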

Conversely, anybody who is pulling disks / memory sticks off while IO is
visibly incomplete really SHOULD expect to lose everything on them, even
if sometimes they'll be luckier than that.  I suppose we're dealing with
people who didn't work with floppies here, where that lesson got pretty
solidly beaten into people :-).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 15:51, Frank Cusack wrote:
> On February 11, 2009 3:02:48 PM -0600 Tim  wrote:

>> It's hardly uncommon for an entire datacenter to go down, redundant
>> power
>> or not.  When it does, if it means I have to restore hundreds of
>> terabytes if not petabytes from tape instead of just restoring the files
>> that were corrupted or running an fsck, we've got issues.
>
> Isn't this easily worked around by having UPS power in addition to
> whatever the data center supplies?

Well, that covers some of the cases (it does take a fairly hefty UPS to
deal with 100TB levels of redundant disk).

> I've been there with entire data center shutdown (or partial, but entire
> as far as my gear is concerned), but for really critical stuff we've had
> our own UPS.

I knew people once who had pretty careful power support; UPS where needed,
then backup generator that would cut in automatically, and cut back when
power was restored.

Unfortunately, the cut back failed to happen automatically.  On a weekend.
 So things sailed along fine until the generator ran out of fuel, and then
shut down MOST uncleanly.

Best laid plans of mice and men gang aft agley, or some such (from memory,
and the spelling seems unlikely).  Sure, human error was a factor.  But
human error is a MAJOR factor in the real world, and one of the things
we're trying to protect our data from.

Certainly, if a short power glitch on the normal mains feed (to lapse into
Brit for a second) brings down your data server in an uncontrolled
fashion, you didn't do a very good job of protecting it.  My home NAS is
protected to the point of one UPS, anyway.  But real-world problems a few
steps more severe can produce the same power cut, practically anywhere,
just not as often.

> I don't know if that really works for 100TB and up though.  That's a lot
> of disk == a lot of UPS capacity.  And again, I'm not trying to take away
> from the fact that this is a significant zfs problem.

We've got this UPS in our server room that's about, oh, 4 washing machines
in size.  It's wired into building power, and powers the outlets the
servers are plugged into, and the floor outlets out here the development
PCs are plugged into also.

I never got the tour, but I heard about the battery backup system at the
old data center Northwest Airlines had back when they ran their own
reservations system.  Enough lead-acid batteries to keep an IBM mainframe
running for three hours.

One can certainly do it if one wants to badly enough, which one should if
the data is important.  I can't imagine anybody investing in 100TB of
enterprise-grade storage if the data WASN'T important!

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Bob Friesenhahn

On Wed, 11 Feb 2009, Tim wrote:


Right, except the OP stated he unmounted the filesystem in question, and it
was the *ONLY* one on the drive, meaning there is absolutely 0 chance of
there being pending writes.  There's nothing to write to.


This is an interesting assumption leading to a wrong conclusion.  If 
the file is updated and the filesystem is "unmounted", it is still 
possible for there to be uncommitted data in the pool.  If you pay 
closer attention you will see that "mounting" the filesystem basically 
just adds a logical path mapping since the filesystem is already 
available under /poolname/filesystemname regardless.  So doing the 
mount makes /poolname/filesystemname available as /filesystemname, or 
whatever mount path you specify.
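
A minimal illustration of that mapping, with made-up pool and dataset names:

  zfs create poolname/filesystemname           # shows up under /poolname/filesystemname
  zfs get mountpoint poolname/filesystemname   # default mountpoint follows the dataset path
  zfs set mountpoint=/filesystemname poolname/filesystemname  # remap it to /filesystemname
  zfs umount poolname/filesystemname           # drops the path mapping, not the pool itself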


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Frank Cusack

On February 11, 2009 3:02:48 PM -0600 Tim  wrote:

On Wed, Feb 11, 2009 at 1:36 PM, Frank Cusack  wrote:



if you have 100TB of data, wouldn't you have a completely redundant
storage network -- dual FC switches on different electrical supplies,
etc.  i've never designed or implemented a storage network before but
such designs seem common in the literature and well supported by
Solaris.  i have done such designs with data networks and such
redundancy is quite common.

i mean, that's a lot of data to go missing due to a single device
failing -- which it will.

not to say it's not a problem with zfs, just that in the real world,
it should be mitigated since your storage network design would overcome
a single failure *anyway* -- regardless of zfs.



It's hardly uncommon for an entire datacenter to go down, redundant power
or not.  When it does, if it means I have to restore hundreds of
terabytes if not petabytes from tape instead of just restoring the files
that were corrupted or running an fsck, we've got issues.


Isn't this easily worked around by having UPS power in addition to whatever
the data center supplies?

I've been there with entire data center shutdown (or partial, but entire
as far as my gear is concerned), but for really critical stuff we've had
our own UPS.

I don't know if that really works for 100TB and up though.  That's a lot
of disk == a lot of UPS capacity.  And again, I'm not trying to take away
from the fact that this is a significant zfs problem.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Tim
On Wed, Feb 11, 2009 at 1:36 PM, Frank Cusack  wrote:

>
> if you have 100TB of data, wouldn't you have a completely redundant
> storage network -- dual FC switches on different electrical supplies,
> etc.  i've never designed or implemented a storage network before but
> such designs seem common in the literature and well supported by
> Solaris.  i have done such designs with data networks and such
> redundancy is quite common.
>
> i mean, that's a lot of data to go missing due to a single device
> failing -- which it will.
>
> not to say it's not a problem with zfs, just that in the real world,
> it should be mitigated since your storage network design would overcome
> a single failure *anyway* -- regardless of zfs.
>

It's hardly uncommon for an entire datacenter to go down, redundant power or
not.  When it does, if it means I have to restore hundreds of terabytes if
not petabytes from tape instead of just restoring the files that were
corrupted or running an fsck, we've got issues.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Tim
On Wed, Feb 11, 2009 at 11:46 AM, Kyle McDonald wrote:

>
> Yep. I've never unplugged a USB drive on purpose, but I have left a drive
> plugged into the docking station, Hibernated windows XP professional,
> undocked the laptop, and then woken it up later undocked. It routinely would
> pop up windows saying that a 'delayed write' was not successful on the now
> missing drive.
>
> I've always counted myself lucky that any new data written to that drive
> was written long long before I hibernated, because I have yet to find any
> problems with that data, (but I don't read it very often if at all.) But it
> is luck only!
>
>  -Kyle
>

Right, except the OP stated he unmounted the filesystem in question, and it
was the *ONLY* one on the drive, meaning there is absolutely 0 chance of
there being pending writes.  There's nothing to write to.

 I don't know what exactly it is you put on your USB drives, but I'm
certainly aware of whether or not things on mine are in use before pulling
the drive out.  If a picture is open and in an editor, I'm obviously not
going to save it then pull the drive mid-save.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 13:45, Ian Collins wrote:
> David Dyer-Bennet wrote:
>> I've spent $2000 on hardware and, by now, hundreds of hours of my time
>> trying to get and keep a ZFS-based home NAS working.
>
> Hundreds of hours doing what?  I just plugged in the drives, built the
> pool and left the box in a corner for the past couple of years.  It's
> been upgraded twice, from build 62 to 72 to get the SATA framework and
> then to b101 for CIFS.

Well, good for you.  It took me a lot of work to get it working in the
first place (and even then with only 4 of my 8 hot-swap bays and 4 of my 6
eSATA connections on the motherboard working).  Before that, I'd spent quite a
lot of time trying to get VMWare to run Solaris, which it wouldn't back
then.  I did manage to get Parallels, I think it was, to let me create a
Solaris system and then a ZFS pool to play with (this was back before
OpenSolaris and before any sort of LiveCD I could find).  Then I had a
series of events starting in December of last year that, in hindsight, I
think were mainly or entirely one memory SIMM going bad, which caused me
to upgrade to 2008.11 and also have to restore my main pool from backup. 
Oh, and converted from using Samba to using CIFS.   I'm just now getting
close to having things up working again usably and stably, still working
on backup.  I do still have some problems with file access permissions I
know, due to the new different handling of ACLs I guess.

And I wasn't a Solaris admin to begin with.  I guess SunOS back when was
the first Unix I had root on, but since then I've mostly worked with Linux
(including my time as news admin for a local ISP, and my years as an
engineer with Sun, where I was in the streaming video server group).  In
some ways a completely UNfamiliar system might have been easier :-).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Thommy M . Malmström
> after working for 1 month with ZFS on 2 external USB
> drives I have experienced, that the all new zfs
> filesystem is the most unreliable FS I have ever
> seen.

Troll.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Ian Collins

David Dyer-Bennet wrote:

I've spent $2000 on hardware and, by now, hundreds of hours of my time
trying to get and keep a ZFS-based home NAS working.  


Hundreds of hours doing what?  I just plugged in the drives, built the 
pool and left the box in a corner for the past couple of years.  It's 
been upgraded twice, from build 62 to 72 to get the SATA framework and 
then to b101 for CIFS.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Frank Cusack

On February 11, 2009 2:07:47 AM -0800 Gino  wrote:

I agree but I'd like to point out that the MAIN problem with ZFS is that
because of a corruption you'll lose ALL your data and there is no way to
recover it. Consider an example where you have 100TB of data and a fc
switch fails or other hw problem happens during I/O on a single file.
With UFS you'll probably get corruption on that single file. With ZFS
you'll lose all your data.  I totally agree that ZFS is theoretically
much much much much much better than UFS but in a real-world application
having the risk of losing access to an entire pool is not acceptable.


if you have 100TB of data, wouldn't you have a completely redundant
storage network -- dual FC switches on different electrical supplies,
etc.  i've never designed or implemented a storage network before but
such designs seem common in the literature and well supported by
Solaris.  i have done such designs with data networks and such
redundancy is quite common.

i mean, that's a lot of data to go missing due to a single device
failing -- which it will.

not to say it's not a problem with zfs, just that in the real world,
it should be mitigated since your storage network design would overcome
a single failure *anyway* -- regardless of zfs.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Fredrich Maney
On Wed, Feb 11, 2009 at 11:19 AM, Tim  wrote:
> On Tue, Feb 10, 2009 at 11:44 PM, Fredrich Maney 
> wrote:

>> Ah... an illiterate AND idiotic bigot. Have you even read the manual
>> or *ANY* of the replies to your posts? *YOU* caused the situation that
>> resulted in your data being corrupted. Not Sun, not OpenSolaris, not
>> ZFS and not anyone on this list. Yet you feel the need to blame ZFS
>> and insult the people that have been trying to help you understand
>> what happened and why you shouldn't do what you did.

> #1 English is clearly not his native tongue.  Calling someone idiotic and
> illiterate when they're doing as well as he is in a second language is not
> only inaccurate, it's "idiotic".

I have a great deal of respect for his command of more than one
language. What I don't have any respect for is his complete
unwillingness to actually read the dozens of responses that have all
said the same thing, namely that his problems are self-inflicted due to
his refusal to read the documentation. I refrained from calling him an
idiot until after he proved himself one by spewing his blind bigotry
against the US. All in all, I'd say he got far better treatment than
he gave and infinitely better than he deserved.

>> ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like
>> SVM or VxVM. It is both a filesystem and a logical volume manager. As
>> such, like all LVM solutions, there are two steps that you must
>> perform to safely remove a disk: unmount the filesystem and quiesce
>> the volume. That means you *MUST*, in the case of ZFS, issue 'umount
>> filesystem' *AND* 'zpool export' before you yank the USB stick out of
>> the machine.
>>
>> Effectively what you did was create a one-sided mirrored volume with
>> one filesystem on it, then put your very important (but not important
>> enough to bother mirroring or backing up) data on it. Then you
>> unmounted the filesystem and ripped the active volume out of the
>> machine. You got away with it a couple of times because of just how good
>> a job the ZFS developers did at idiot-proofing it, but when it
>> finally got to the point where you lost your data, you came here to
>> bitch and point fingers at everyone but the responsible party (hint,
>> it's you). When your ignorance (and fault) was pointed out to you, you
>> then resorted to personal attacks and slurs. Nice. Very professional.
>> Welcome to the bit-bucket.
>
> All that and yet the fact remains: I've never "ejected" a USB drive from OS
> X or Windows, I simply pull it and go, and I've never once lost data, or had
> it become unrecoverable or even corrupted.

You've been lucky then. I've lost data and had corrupted filesystems
on USB sticks on both of those OSes, as well as several Linux and BSD
variants, from doing just that.

[...]

fpsm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Frank Cusack
On February 11, 2009 12:21:03 PM -0600 David Dyer-Bennet  
wrote:

I've spent $2000 on hardware and, by now, hundreds of hours of my time
trying to get and keep a ZFS-based home NAS working.  Because it's the
only affordable modern practice, my backups are on external drives (USB
drives because that's "the" standard for consumer external drives, they
were much cheaper when I bought them than any that supported Firewire at
the 1TB size).  So hearing how easy it is to muck up a ZFS pool on USB is
leading me, again, to doubt this entire enterprise.


Same here, except I have no doubts.  As I only use the USB for backup,
I'm quite happy with it.  I have a 4-disk enclosure that accepts SATA
drives.

My main storage is a 12-bay SAS/SATA enclosure.

After my own experience with USB (I still have the problem that I cannot
create new pools while another USB drive is present with a zpool on it,
whether or not that zpool is active ... no response on that thread yet
and I expect never), I'm not thrilled with it and suspect some of the
problem lies in the way that USB is handled differently than other
physical connections (can't use 'format', e.g.).  Anyway, to get back to
the point, I wouldn't want to use it for primary storage, even if it
were only 2 drives.  That's unfortunate, but in line with Solaris'
hardware support, historically.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 12:23, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
>>
>> Then again, I've never lost data during the learning period, nor on the
>> rare occasions where I just get it wrong.  This is good; not quite
>> remembering to eject a USB memory stick is *so* easy.
>
> With Windows and OS-X, it is up to the *user* to determine if they
> have lost data.  This is because they are designed to be user-friendly
> operating systems.  If the disk can be loaded at all, Windows and OS-X
> will just go with what is left.  If Windows and OS-X started to tell
> users that they lost some data, then those users would be in a panic
> (just like we see here).

I don't carry much on my memory stick -- mostly stuff in transit from one
place to another.   Two things that live there constantly are my encrypted
password database, and some private keys (encrypted under passphrases).

So the stuff on the memory stick tends to get looked at, and the stuff
that lives there is in a format where corruption is very likely to get
noticed.

So while I can't absolutely swear that I never lost data I didn't notice
losing, I'm fairly confident that no data was lost.  And I'm absolutely
sure no data THAT I CARED ABOUT was lost, which is all that really
matters.

> The whole notion of "journaling" is to intentionally lose data by
> rolling back to a known good point.  More data might be lost than if
> the task was left to a tool like 'fsck' but the journaling approach is
> much faster.  Windows and OS-X are highly unlikely to inform you that
> some data was lost due to the filesystem being rolled back.

True about journaling.

This applies to NTFS disks for Windows, but not to FAT systems (which
aren't journaled); and memory sticks for me are always FAT systems.

Databases have something of an all-or-nothing problem as well, for that
matter, and for something of the same reasons.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Bob Friesenhahn

On Wed, 11 Feb 2009, David Dyer-Bennet wrote:


Then again, I've never lost data during the learning period, nor on the
rare occasions where I just get it wrong.  This is good; not quite
remembering to eject a USB memory stick is *so* easy.


With Windows and OS-X, it is up to the *user* to determine if they 
have lost data.  This is because they are designed to be user-friendly 
operating systems.  If the disk can be loaded at all, Windows and OS-X 
will just go with what is left.  If Windows and OS-X started to tell 
users that they lost some data, then those users would be in a panic 
(just like we see here).


The whole notion of "journaling" is to intentionally lose data by 
rolling back to a known good point.  More data might be lost than if 
the task was left to a tool like 'fsck' but the journaling approach is 
much faster.  Windows and OS-X are highly unlikely to inform you that 
some data was lost due to the filesystem being rolled back.


Your comments about write caching being a factor seem reasonable.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 10:49, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
>> This all-or-nothing behavior of ZFS pools is kinda scary.  Turns out I'd
>> rather have 99% of my data than 0% -- who knew?  :-)  I'd much rather
>> have
>> 100.00% than either of course, and I'm running ZFS with mirroring, and
>> doing regular backups, because of that.
>
> It seems to me that this level of terror is getting out of hand.  I am
> glad to see that you made it to work today since statistics show that
> you might have gotten into a deadly automobile accident on the way to
> the office and would no longer care about your data.  In fact, quite a
> lot of people get in serious automobile accidents yet we rarely hear
> such levels of terror regarding taking a drive in an automobile.
>
> Most people are far more afraid of taking a plane flight than taking a
> drive in their car, even though taking a drive in their car is far
> more risky.
>
> It is best to put risks in perspective.  People are notoriously poor
> at evaluating risks and paranoia is often the result.

All true (and I'm certainly glad I made it to work myself; I did drive,
which is one of the most dangerous things most people do).

I think you're overstating my terror level, though; I'd say I'm at yellow;
not even orange.

I've spent $2000 on hardware and, by now, hundreds of hours of my time
trying to get and keep a ZFS-based home NAS working.  Because it's the
only affordable modern practice, my backups are on external drives (USB
drives because that's "the" standard for consumer external drives, they
were much cheaper when I bought them than any that supported Firewire at
the 1TB size).  So hearing how easy it is to muck up a ZFS pool on USB is
leading me, again, to doubt this entire enterprise.  Am I really better
off than I would be with an Infrant Ready NAS, or a Drobo?  I'm certainly
far behind financially and with my time.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 11:35, Toby Thain wrote:
>
> On 11-Feb-09, at 11:19 AM, Tim wrote:
>
>> ...
>> And yes, I do keep checksums of all the data sitting on them and
>> periodically check it.  So, for all of your ranting and raving, the
>> fact remains even a *crappy* filesystem like fat32 manages to
>> handle a hot unplug without any prior notice without going belly up.
>
> By chance, certainly not design.

No, I do think it's by design -- it's because the design isn't
aggressively exploiting possible performance.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread David Dyer-Bennet

On Wed, February 11, 2009 11:21, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, Tim wrote:
>>
>> All that and yet the fact remains: I've never "ejected" a USB drive from
>> OS
>> X or Windows, I simply pull it and go, and I've never once lost data, or
>> had
>> it become unrecoverable or even corrupted.
>>
>> And yes, I do keep checksums of all the data sitting on them and
>> periodically check it.  So, for all of your ranting and raving, the fact
>> remains even a *crappy* filesystem like fat32 manages to handle a hot
>> unplug
>> without any prior notice without going belly up.
>
> This seems like another one of your trolls.  Any one of us who have
> used USB drives under OS-X or Windows knows that the OS complains
> quite a lot if you just unplug the drive so we all learn how to do
> things properly.

Then again, I've never lost data during the learning period, nor on the
rare occasions where I just get it wrong.  This is good; not quite
remembering to eject a USB memory stick is *so* easy.

We do all know why violating protocols here works so much of the time,
right?  It's because Windows is using very simple, old-fashioned
strategies to write to the USB devices.  Write caching is nonexistent, or
of very short duration, for example.  So if IO has quiesced to the device
and it's been several seconds since the last IO, it's nearly certain to be
safe to just pull it.  Nearly.

ZFS is applying much more modern, much more aggressive, optimizing
strategies.  This is entirely good; ZFS is intended for a space where
that's important a lot of the time.  But one tradeoff is that those rules
become more important.

> You must have very special data if you compute independent checksums
> for each one of your files, and it leaves me wondering why you think
> that data is correct due to being checksummed.  Checksumming incorrect
> data does not make that data correct.

Can't speak for him, but I have par2 checksums and redundant data for lots
of my old photos on disk.  I created them before writing archival optical
disks of the data, to give me some additional hope of recovering the data
in the long run.
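
For anyone curious, the par2 side of that is roughly the following -- the file
names are examples and the redundancy level is a matter of taste:

  par2 create -r10 photos.par2 *.jpg   # build recovery data with ~10% redundancy
  par2 verify photos.par2              # later: detect corruption in the originals
  par2 repair photos.par2              # attempt repair from the recovery blocks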

I don't, in fact, know that most of those photos are actually valid data;
only the ones I've viewed after creating the par2 checksums (and I can't
rule out weird errors that don't result in corrupting the whole rest of
the image even then).  Still, once I've got the checksum on file, I can at
least determine that I've had a disk error in many cases (not quite
identical to determining that the data is still valid; after all, the data
and the checksum could have been corrupted in such a way that I get a
false positive on the checksum).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Kyle McDonald

On 2/11/2009 12:35 PM, Toby Thain wrote:


On 11-Feb-09, at 11:19 AM, Tim wrote:


...
And yes, I do keep checksums of all the data sitting on them and 
periodically check it.  So, for all of your ranting and raving, the 
fact remains even a *crappy* filesystem like fat32 manages to handle 
a hot unplug without any prior notice without going belly up.


By chance, certainly not design.
Yep. I've never unplugged a USB drive on purpose, but I have left a 
drive plugged into the docking station, hibernated Windows XP 
Professional, undocked the laptop, and then woken it up later undocked. 
It routinely would pop up windows saying that a 'delayed write' was not 
successful on the now-missing drive.


I've always counted myself lucky that any new data written to that drive 
was written long, long before I hibernated, because I have yet to find any 
problems with that data (but I don't read it very often, if at all). But 
it is luck only!


  -Kyle



--Toby




--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Toby Thain


On 11-Feb-09, at 11:19 AM, Tim wrote:


...
And yes, I do keep checksums of all the data sitting on them and  
periodically check it.  So, for all of your ranting and raving, the  
fact remains even a *crappy* filesystem like fat32 manages to  
handle a hot unplug without any prior notice without going belly up.


By chance, certainly not design.

--Toby




--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Bob Friesenhahn

On Wed, 11 Feb 2009, Tim wrote:


All that and yet the fact remains: I've never "ejected" a USB drive from OS
X or Windows, I simply pull it and go, and I've never once lost data, or had
it become unrecoverable or even corrupted.

And yes, I do keep checksums of all the data sitting on them and
periodically check it.  So, for all of your ranting and raving, the fact
remains even a *crappy* filesystem like fat32 manages to handle a hot unplug
without any prior notice without going belly up.


This seems like another one of your trolls.  Any one of us who has 
used USB drives under OS-X or Windows knows that the OS complains 
quite a lot if you just unplug the drive, so we all learn how to do 
things properly.


You must have very special data if you compute independent checksums 
for each one of your files, and it leaves me wondering why you think 
that data is correct due to being checksummed.  Checksumming incorrect 
data does not make that data correct.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread D. Eckert
(...)
Ah... an illiterate AND idiotic bigot.
(...)

I apologize for my poor English. Yes, it's not my mother tongue, but I have no 
doubt at all that this discussion could be continued in German as well.

But just to make it clear:

In the end I did understand very well where I went wrong. But it wasn't 
something I expected.

Since I was using a single zpool with no other filesystems inside, I thought 
that unmounting it with the command 'zfs umount usbhdd1' and checking whether 
usbhdd1 was still shown in the output of 'mount' (it wasn't) meant that the 
pool was cleanly unmounted and there was no risk in yanking the USB wire.

Even from a logical point of view: if 'zpool export usbhdd1' releases the 
entire pool from the system, then 'zfs umount usbhdd1' should do the same when 
no other filesystem exists inside this particular pool.

If the output of the mount command doesn't show your ZFS pool anymore, what 
else should be left that could be unmounted?

This is just what caused confusion on my side, and that's human, but I learned 
for the future.
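
For anyone else tripping over the same thing, the sequence that would have 
been safe in my case boils down to (pool name as in my setup):

  zfs umount usbhdd1     # only removes the mount point mapping
  zpool export usbhdd1   # flushes and releases the whole pool -- do this before pulling
  # ... unplug the drive ...
  zpool import usbhdd1   # later, bring the pool back cleanly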

Regards.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Bob Friesenhahn

On Wed, 11 Feb 2009, David Dyer-Bennet wrote:

This all-or-nothing behavior of ZFS pools is kinda scary.  Turns out I'd
rather have 99% of my data than 0% -- who knew?  :-)  I'd much rather have
100.00% than either of course, and I'm running ZFS with mirroring, and
doing regular backups, because of that.


It seems to me that this level of terror is getting out of hand.  I am 
glad to see that you made it to work today since statistics show that 
you might have gotten into a deadly automobile accident on the way to 
the office and would no longer care about your data.  In fact, quite a 
lot of people get in serious automobile accidents yet we rarely hear 
such levels of terror regarding taking a drive in an automobile.


Most people are far more afraid of taking a plane flight than taking a 
drive in their car, even though taking a drive in their car is far 
more risky.


It is best to put risks in perspective.  People are notoriously poor 
at evaluating risks and paranoia is often the result.


Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

