Re: [zfs-discuss] ZFS + DB + default blocksize

2007-11-12 Thread Roch - PAE
Louwtjie Burger writes:
  Hi
  
  What is the impact of not aligning the DB blocksize (16K) with ZFS,
  especially when it comes to random reads on single HW RAID LUN.
  
  How would one go about measuring the impact (if any) on the workload?
  

The DB will have a bigger in-memory footprint, since ZFS must keep the
whole ZFS record cached for the lifespan of the DB block.

This probably means you want to partition memory between the
DB cache and the ZFS ARC according to the ratio of DB blocksize to ZFS recordsize.

Then I imagine you have multiple spindles associated with
the LUN. If your LUN is capable of 2000 IOPS over a
200MB/sec data channel, then during 1 second at full speed:


2000 IOPS * 16K = 32MB of data transfer,

and this fits within the channel's capability.
But using, say, ZFS blocks of 128K:

2000 IOPS * 128K = 256MB,

which overloads the channel. So in this example the data
channel would saturate first, preventing you from reaching
those 2000 IOPS. But with enough memory and data channel
throughput it's a good idea to keep the ZFS recordsize
large.
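
As a rough sketch (the pool and dataset names below are made up, and the
numbers just restate the example above):

  # match the ZFS recordsize to the 16K DB block before loading the datafiles;
  # existing files keep whatever record size they were written with
  zfs create -o recordsize=16k tank/db

  # back-of-the-envelope channel math from the example:
  echo "16K blocks:   $((2000 * 16 / 1024)) MB/s"    # prints 31 (~32MB as above); fits a 200MB/sec channel
  echo "128K records: $((2000 * 128 / 1024)) MB/s"   # prints 250; saturates the channel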


-r


  Thank you


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-12 Thread Martin
In this PC, I'm using the PCI card 
http://www.intel.com/network/connectivity/products/pro1000gt_desktop_adapter.htm
 , but, more recently, I'm using the PCI Express card 
http://www.intel.com/network/connectivity/products/pro1000pt_desktop_adapter.htm

Note that the latter didn't have PXE and the boot ROM enabled (for JumpStart), 
contrary to the documentation, and I had to download the DOS program from the 
Intel site to enable it.  (please ask if anyone needs the URL) 

...so, for an easy life, I recommend the Intel PRO/1000 GT Desktop
 
 


[zfs-discuss] Intent Log removal

2007-11-12 Thread Cyril Plisko
Hi !

I played recently with Gigabyte i-RAM card (which is basically an SSD)
as a log device for a ZFS pool. However, when I tried to remove it - I need
to give the card back - it refused to do so. It looks like I am hitting

6574286 removing a slog doesn't work [1]

Is there any workaround? I really need this card removed and I
cannot afford to lose the data on that pool.
Any hints?

[1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286

-- 
Regards,
Cyril


Re: [zfs-discuss] Intent Log removal

2007-11-12 Thread Neelakanth Nadgir
You could always replace this device with another one of the same or
bigger size using zpool replace.
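
A minimal sketch (pool and device names are invented):

  # swap the i-RAM slog for any device of equal or larger size
  zpool replace tank c2t0d0 c3t0d0
  zpool status tank      # confirm the replacement completes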
-neel

Cyril Plisko wrote:
 Hi !
 
 I played recently with Gigabyte i-RAM card (which is basically an SSD)
 as a log device for a ZFS pool. However, when I tried to remove it - I need
 to give the card back - it refused to do so. It looks like I am hitting
 
 6574286 removing a slog doesn't work [1]
 
 Is there any workaround ? I really need to this card removed and I
 cannot afford losing the data on that pool.
 Any hints ?
 
 [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286
 



Re: [zfs-discuss] Modify fsid/guid of dataset for NFS failover

2007-11-12 Thread Jonathan Edwards

On Nov 10, 2007, at 23:16, Carson Gaspar wrote:

 Mattias Pantzare wrote:

 As the fsid is created when the file system is created it will be the
 same when you mount it on a different NFS server. Why change it?

 Or are you trying to match two different file systems? Then you also
 have to match all inode-numbers on your files. That is not  
 possible at
 all.

 It is, if you do block replication between the servers (drbd on Linux,
 or the Sun product whose name I'm blanking on at the moment).

AVS (or Availability Suite) ..

http://www.opensolaris.org/os/project/avs/

Jim Dunham does a nice demo here for block replication on zfs (see  
sidebar)

 What isn't clear is if zfs send/recv retains inode numbers... if it
 doesn't that's a really sad thing, as we won't be able to use ZFS to
 replace NetApp snapmirrors.

zfs send/recv comes out of the DSL, which I believe will generate a  
unique fsid_guid .. for mirroring you'd really want to use AVS.

btw - you can also look at the Cluster SUNWnfs agent in the ohac  
community:
http://opensolaris.org/os/community/ha-clusters/ohac/downloads/

hth
---
.je


Re: [zfs-discuss] Intent Log removal

2007-11-12 Thread Cyril Plisko
On Nov 12, 2007 5:51 PM, Neelakanth Nadgir [EMAIL PROTECTED] wrote:
 You could always replace this device by another one of same, or
 bigger size using zpool replace.

Indeed. Provided that I always have an unused device of the same or
bigger size, which is seldom the case.

:(


 -neel


 Cyril Plisko wrote:
  Hi !
 
  I played recently with Gigabyte i-RAM card (which is basically an SSD)
  as a log device for a ZFS pool. However, when I tried to remove it - I need
  to give the card back - it refused to do so. It looks like I am hitting
 
  6574286 removing a slog doesn't work [1]
 
  Is there any workaround ? I really need to this card removed and I
  cannot afford losing the data on that pool.
  Any hints ?
 
  [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286
 





-- 
Regards,
Cyril


Re: [zfs-discuss] Intent Log removal

2007-11-12 Thread Tim Spriggs

Cyril Plisko wrote:
 On Nov 12, 2007 5:51 PM, Neelakanth Nadgir [EMAIL PROTECTED] wrote:
   
 You could always replace this device by another one of same, or
 bigger size using zpool replace.
 

 Indeed. Provided that I always have an unused device of same or
 bigger size, which is seldom the case.

 :(
   

In a pinch you could use an iSCSI target :)
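
Something along these lines might work as a stopgap (untested sketch; the
hosts, addresses and device names are invented, and the shareiscsi/iscsiadm
details may vary by build):

  # on some other box, export a small zvol as an iSCSI target
  zfs create -V 1g otherpool/slog
  zfs set shareiscsi=on otherpool/slog

  # on the pool's host, discover it and swap it in for the i-RAM
  iscsiadm modify discovery --sendtargets enable
  iscsiadm add discovery-address 192.168.1.10
  devfsadm -i iscsi
  zpool replace tank <iram-device> <iscsi-device>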

   
 -neel


 Cyril Plisko wrote:
 
 Hi !

 I played recently with Gigabyte i-RAM card (which is basically an SSD)
 as a log device for a ZFS pool. However, when I tried to remove it - I need
 to give the card back - it refused to do so. It looks like I am hitting

 6574286 removing a slog doesn't work [1]

 Is there any workaround ? I really need to this card removed and I
 cannot afford losing the data on that pool.
 Any hints ?

 [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286

   
 



   



Re: [zfs-discuss] Response to phantom dd-b post

2007-11-12 Thread can you guess?
 
 In the previous and current responses, you seem quite
 determined of 
 others misconceptions.

I'm afraid that your sentence above cannot be parsed grammatically.  If you 
meant that I *have* determined that some people here are suffering from various 
misconceptions, that's correct.

 Given that fact and the first
 paragraph of your 
 response below, I think you can figure out why nobody
 on this list will 
 reply to you again.

Predicting the future (especially the actions of others) is usually a feat 
reserved for psychics:  are you claiming to be one (perhaps like the poster who 
found it 'clear' that I was a paid NetApp troll - one of the aforementioned 
misconceptions)?

Oh, well - what can one expect from someone who not only top-posts but 
completely fails to trim quotations?  I see that you appear to be posting from 
a .edu domain, so perhaps next year you will at least mature to the point of 
becoming sophomoric.

Whether people here find it sufficiently uncomfortable to have their beliefs 
(I'm almost tempted to say 'faith', in some cases) challenged that they'll 
indeed just shut up I really wouldn't presume to guess.  As for my own 
attitude, if you actually examine my responses rather than just go with your 
gut (which doesn't seem to be a very reliable guide in your case) you'll find 
that I tend to treat people pretty much as they deserve.  If they don't pay 
attention to what they're purportedly responding to or misrepresent what I've 
said, I do chide them a bit (since I invariably *do* pay attention to what 
*they* say and make sincere efforts to respond to exactly that), and if they're 
confrontational and/or derogatory then they'll find me very much right back in 
their face.

Perhaps it's some kind of territorial thing - that people bridle when they find 
a seriously divergent viewpoint popping up in a cozy little community where 
most of them have congregated because they already share the beliefs of 
the group.  Such in-bred communities do provide a kind of sanctuary and feeling 
of belonging:  perhaps it's unrealistic to expect most people to be able to 
rise above that and deal rationally with the wider world's entry into their 
little one.

Or not:  we'll see.

- bill
 
 


Re: [zfs-discuss] Yager on ZFS

2007-11-12 Thread A Darren Dunham
On Sat, Nov 10, 2007 at 02:05:04PM -0200, Toby Thain wrote:
  Yup - that's exactly the kind of error that ZFS and WAFL do a  
  perhaps uniquely good job of catching.
 
 WAFL can't catch all: It's distantly isolated from the CPU end.

How so?  The checksumming method is different from ZFS, but as far as I
understand rather similar in capability. 

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Error: Volume size exceeds limit for this system

2007-11-12 Thread Chris Murray
Thanks for the help guys - unfortunately the only hardware at my disposal just 
at the minute is all 32-bit, so I'll just have to wait a while and fork out on 
some 64-bit kit before I get the drives. I'm a home user, so I'm glad I didn't 
buy the drives and discover I couldn't use them without spending even more!

Chris
 
 


Re: [zfs-discuss] mdb ::memstat including zfs buffer details?

2007-11-12 Thread Jonathan Adams
On Nov 8, 2007 4:21 PM, Nathan Kroenert [EMAIL PROTECTED] wrote:

 Hey all -

 Just a quick one...

 Is there any plan to update the mdb ::memstat dcmd to present ZFS
 buffers as part of the summary?

 At present, we get something like:
   ::memstat
 Page Summary                Pages                MB  %Tot
 ------------     ----------------  ----------------  ----
 Kernel                      28859               112   13%
 Anon                        34230               133   15%
 Exec and libs               10305                40    5%
 Page cache                  16876                65    8%
 Free (cachelist)            26145               102   12%
 Free (freelist)            105176               410   47%
 Balloon                         0                 0    0%

 Total                      221591               865

 Which just (as far as I can tell) includes the zfs buffers in Kernel
 memory.

 And what I'd really like is:

   ::memstat
 Page Summary                Pages                MB  %Tot
 ------------     ----------------  ----------------  ----
 Kernel                      28859               112   13%
 Anon                        34230               133   15%
 Exec and libs               10305                40    5%
 Page cache                  16876                65    8%
 Free (cachelist)            26145               102   12%
 Free (zfscachelist)       1827346              1700   xx%
 Free (freelist)            105176               410   47%
 Balloon                         0                 0    0%

 Total                      221591               865

 Which then represents the pages that *could* be freed up by ZFS in the
 event that they are needed for other purposes...

 Any thoughts on this? Is there a great reason why we cannot do this?

 Also - Other utilities like vmstat, etc that print out memory...


File an RFE.

I don't think it should be too bad (for ::memstat), given that (at least in
Nevada), all of the ZFS caching data belongs to the zvp vnode, instead of
kvp. The work that made that change was:

4894692 caching data in heap inflates crash dump

Of course, this so-called free memory does act a bit differently than the
cachelist, etc., so maybe it should be named slightly differently.
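
In the meantime, a rough view of how much memory the ARC is holding (and
could in principle give back) is available from the arcstats kstat -- just
a stopgap sketch, not the proposed ::memstat line:

  # current and target ARC size, in bytes
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c
  # and the existing summary for comparison
  echo ::memstat | mdb -k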

Cheers,
- jonathan


Re: [zfs-discuss] Response to phantom dd-b post

2007-11-12 Thread Peter Schuller
  You have to detect the problem first.   ZFS is in a
  much better position
  to detect the problem due to block checksums.

 Bulls***, to quote another poster here who has since been strangely quiet. 
 The vast majority of what ZFS can detect (save for *extremely* rare
 undetectable bit-rot and for real hardware (path-related) errors that
 studies like CERN's have found to be very rare - and you have yet to
 provide even anecdotal evidence to the contrary) 

You wanted anecdotal evidence: during my personal experience with only two 
home machines, ZFS has helped me detect corruption at least three times in a 
period of a few months.

One due to silent corruption due to a controller bug (and a driver that did 
not work around it).

Another time corruption during hotswapping (though this does not necessarily 
count since I did it on hardware that I did not know was supposed to support 
it, and I would not have attempted it to begin with otherwise).

Third time I don't remember now. You may disregard it if you wish.

In my professional life I have seen bitflips a few times in the middle of real 
live data running on real servers that are used for important data. As a 
result I have become pretty paranoid about it all, making heavy use of par2.

(I have also seen various file system corruption / system instability issues 
that may very well be consistent with bit flips / other forms of corruption, 
but where there has been no proof of the underlying cause of the problems.)

 can also be detected by 
 scrubbing, and it's arguably a lot easier to apply brute-force scrubbing
 (e.g., by scheduling a job that periodically copies your data to the null
 device if your system does not otherwise support the mechanism) than to
 switch your file system.

How would your magic scrubbing detect arbitrary data corruption without 
checksumming or redundancy?

A lot of the data people save does not have checksumming. Even if it does, the 
file system metadata typically does not. Nor does the various ancillary 
information related to the data (say, the metadata associated with your backup 
of your other data, even if that data has some internal checksumming).

I think one needs to stop making excuses by observing properties of specific 
file types and similar. You can always use FEC to do error correction on 
arbitrary files if you really feel they are important. But the point is that 
with ZFS you get detection of *ANY* bit error for free (essentially), and 
optionally correction if you have redundancy. It doesn't matter if it's 
internal file system metadata, that file you didn't consider 
important from a corruption perspective, or the middle of some larger file 
that you may or may not have applied FEC on otherwise.
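
For instance, with the par2 tool mentioned above (file names invented):

  par2 create -r10 important.tar.par2 important.tar   # ~10% recovery data
  par2 verify important.tar.par2                      # detect corruption
  par2 repair important.tar.par2                      # fix it, if possible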

Even without fancy high-end requirements, it is nice to have some good 
statistical reason to believe that random corruption does not occur. Even if 
only to run your web browser or e-mail client; at least you can be sure 
that random bitflips (unless they are undetected due to an 
implementation bug, or occur in memory, etc.) are not the cause of your random 
application misbehavior.

It's like choosing RAM. You can make excuses all you want about doing proper 
testing, buying good RAM, or having redundancy at other levels etc - but you 
will still sleep better knowing you have ECC RAM than some random junk.

Or let's do the seat belt analogy. You can try to convince yourself/other 
people all you want that you are a safe driver, that you should not drive in 
a way that allows crashes or whatever else - but you are still going to be 
safer with a seat belt than without it.

This is also why we care about fsync(). It doesn't matter that you spent 
$10 on that expensive server with redundant PSUs hooked up to redundant 
UPS systems. *SHIT HAPPENS*, and when it does, you want to be maximally 
protected.

Yes, ZFS is not perfect. But to me, both in the context of personal use and 
more serious use, ZFS is, barring some implementation details, more or less 
exactly what I have always wanted and solves pretty much all of the major 
problems with storage.

And let me be clear: That is not hype. It's ZFS actually providing what I have 
wanted, and what I knew I wanted even before ZFS (or WAFL or whatever else) 
was ever on my radar.

For some reason some people seem to disagree. That's your business. But the 
next time you have a power outage, you'll be sorry if you had a database 
that didn't do fsync()[1], a filesystem that had no correction checking 
whatsoever[2], a RAID5 system that didn't care about parity correctness in 
the face of a crash[3], and a filesystem or application whose data is not 
structured such that you can ascertain *what* is broken after the crash and 
what is not[4].

You will be even more sorry two years later when something really important 
malfunctioned as a result of undetected corruption two years earlier...

[1] Because of course all serious players use 

Re: [zfs-discuss] mdb ::memstat including zfs buffer details?

2007-11-12 Thread johansen
I don't think it should be too bad (for ::memstat), given that (at
least in Nevada), all of the ZFS caching data belongs to the zvp
vnode, instead of kvp.

ZFS data buffers are attached to zvp; however, we still keep metadata in
the crashdump.  At least right now, this means that cached ZFS metadata
has kvp as its vnode.

-j


Re: [zfs-discuss] Modify fsid/guid of dataset for NFS failover

2007-11-12 Thread Darren J Moffat
asa wrote:
 I would like for all my NFS clients to hang during the failover, then  
 pick up trucking on this new filesystem, perhaps obviously failing  
 their writes back to the apps which are doing the writing.  Naive?

The OpenSolaris NFS client does this already - has done since IIRC 
around Solaris 2.6.  The knowledge is in the NFS client code.

For NFSv4 this functionality is part of the standard.

-- 
Darren J Moffat


Re: [zfs-discuss] mdb ::memstat including zfs buffer details?

2007-11-12 Thread Jonathan Adams
On Nov 12, 2007 4:16 PM, [EMAIL PROTECTED] wrote:

 I don't think it should be too bad (for ::memstat), given that (at
 least in Nevada), all of the ZFS caching data belongs to the zvp
 vnode, instead of kvp.

 ZFS data buffers are attached to zvp; however, we still keep metadata in
 the crashdump.  At least right now, this means that cached ZFS metadata
 has kvp as its vnode.


Still, it's better than what you get currently.

Cheers,
- jonathan


Re: [zfs-discuss] Best option for my home file server?

2007-11-12 Thread Christopher
I went ahead and bought an M9N-Sli motherboard with 6 SATA controllers and also 
a Promise TX4 (4x SATA300, non-RAID) PCI controller. Anyone know if the TX4 is 
supported in OpenSolaris? If it's as badly supported as the (crappy) Sil 
chipsets, I'm better off with OpenFiler (Linux), I think.
 
 


Re: [zfs-discuss] Yager on ZFS

2007-11-12 Thread can you guess?
Thanks for taking the time to flesh these points out.  Comments below:

...

 The compression I see varies from something like 30%
 to 50%, very 
 roughly (files reduced *by* 30%, not files reduced
 *to* 30%).   This is 
 with the Nikon D200, compressed NEF option.  On some
 of the lower-level 
 bodies, I believe the compression can't be turned
 off.  Smaller files 
 will of course get hit less often -- or it'll take
 longer to accumulate 
 the terrabyte, is how I'd prefer to think of it.

Either viewpoint works.  And since the compression is not that great, you still 
wind up consuming a lot of space.  Effectively, you're trading (at least if 
compression is an option rather than something that you're stuck with) the 
possibility that a picture will become completely useless should a bit get 
flipped for a storage space reduction of 30% - 50% - and that's a good trade, 
since it effectively allows you to maintain a complete backup copy on disk (for 
archiving, preferably off line) almost for free compared with the uncompressed 
option.

 
 Damage that's fixable is still damage; I think of
 this in archivist 
 mindset, with the disadvantage of not having an
 external budget to be my 
 own archivist. 

There will *always* be the potential for damage, so the key is to make sure 
that any damage is easily fixable.  The best way to do this is to a) keep 
multiple copies, b) keep them isolated from each other (that's why RAID is not 
a suitable approach to archiving), and c) check (scrub) them periodically to 
ensure that if you lose a piece (whether a bit or a sector) you can restore the 
affected data from another copy and thus return your redundancy to full 
strength.

For serious archiving, you probably want to maintain at least 3 such copies 
(possibly more if some are on media of questionable longevity).  For normal 
use, there's probably negligible risk of losing any data if you maintain only 
two on reasonably reliable media:  'MAID' experience suggests that scrubbing as 
little as every few months reduces the likelihood of encountering detectable 
errors while restoring redundancy by several orders of magnitude (i.e., down to 
something like once in a PB at worst for disks - becoming comparable to the 
levels of bit-flip errors that the disk fails to detect at all).

Which is what I've been getting at w.r.t. ZFS in this particular application 
(leaving aside whether it can reasonably be termed a 'consumer' application - 
because bulk video storage is becoming one and it not only uses a similar 
amount of storage space but should probably be protected using similar 
strategies):  unless you're seriously worried about errors in the once-per-PB 
range, ZFS primarily just gives you automated (rather than manually-scheduled) 
scrubbing (and only for your on-line copy).  Yes, it will help detect hardware 
faults as well if they happen to occur between RAM and the disk (and aren't 
otherwise detected - I'd still like to know whether the 'bad cable' experiences 
reported here occurred before ATA started CRCing its transfers), but while 
there's anecdotal evidence of such problems presented here it doesn't seem to 
be corroborated by the few actual studies that I'm familiar with, so that risk 
is difficult to quantify.

Getting back to 'consumer' use for a moment, though, given that something like 
90% of consumers entrust their PC data to the tender mercies of Windows, and a 
large percentage of those neither back up their data, nor use RAID to guard 
against media failures, nor protect it effectively from the perils of Internet 
infection, it would seem difficult to assert that whatever additional 
protection ZFS may provide would make any noticeable difference in the consumer 
space - and that was the kind of reasoning behind my comment that began this 
sub-discussion.

By George, we've managed to get around to having a substantive discussion after 
all:  thanks for persisting until that occurred.

- bill
 
 


Re: [zfs-discuss] Response to phantom dd-b post

2007-11-12 Thread can you guess?
Well, I guess we're going to remain stuck in this sub-topic for a bit longer:

  The vast majority of what ZFS can detect (save for
 *extremely* rare
  undetectable bit-rot and for real hardware
 (path-related) errors that
  studies like CERN's have found to be very rare -
 and you have yet to
  provide even anecdotal evidence to the contrary) 
 
 You wanted anectodal evidence:

To be accurate, the above was not a solicitation for just any kind of anecdotal 
evidence but for anecdotal evidence that specifically contradicted the notion 
that otherwise undetected path-related hardware errors are 'very rare'.

 During my personal
 experience with only two 
 home machines, ZFS has helped me detect corruption at
 least three times in a 
 period of a few months.
 
 One due to silent corruption due to a controller bug
 (and a driver that did 
 not work around it).

If that experience occurred using what could be considered normal consumer 
hardware and software, that's relevant (and disturbing).  As I noted earlier, 
the only path-related problem that the CERN study unearthed involved their 
(hardly consumer-typical) use of RAID cards, the unusual demands that those 
cards placed on the WD disk firmware (to the point where it produced on-disk 
errors), and the cards' failure to report accompanying disk time-outs.

 
 Another time corruption during hotswapping (though
 this does not necessarily 
 count since I did it on hardware that I did not know
 was supposed to support 
 it, and I would not have attempted it to begin with
 otherwise).

Using ZFS as a test platform to see whether you could get away with using 
hardware in a manner that it may not have been intended to be used may not 
really qualify as 'consumer' use.  As I've noted before, consumer relevance 
remains the point in question here (since that's the point that fired off this 
lengthy sub-discussion).

...
 
 In my professional life I have seen bitflips a few
 times in the middle of real 
 live data running on real servers that are used for
 important data. As a 
 result I have become pretty paranoid about it all,
 making heavy use of par2.

And well you should - but, again, that's hardly 'consumer' use.

...

  can also be detected by 
  scrubbing, and it's arguably a lot easier to apply
 brute-force scrubbing
  (e.g., by scheduling a job that periodically copies
 your data to the null
  device if your system does not otherwise support
 the mechanism) than to
  switch your file system.
 
 How would your magic scrubbing detect arbitrary data
 corruption without 
 checksumming

The assertion is that it would catch the large majority of errors that ZFS 
would catch (i.e., all the otherwise detectable errors, most of them detected 
by the disk when it attempts to read a sector), leaving a residue of no 
noticeable consequence to consumers (especially as one could make a reasonable 
case that most consumers would not experience any noticeable problem even if 
*none* of these errors were noticed).
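
For concreteness, the brute-force variant might look like either of these
cron entries (pool name and path are invented):

  # weekly ZFS scrub
  0 3 * * 0  /usr/sbin/zpool scrub tank
  # on any filesystem: force every block to be read so the disks' own
  # sector checksums flag anything unreadable
  0 3 * * 0  find /data -type f -exec cat {} + > /dev/null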

 or redundancy?

Redundancy is necessary if you want to fix (not just catch) errors, but 
conventional mechanisms provide redundancy just as effective as ZFS's.  (With 
the minor exception of ZFS's added metadata redundancy, but the likelihood that 
an error will happen to hit the relatively minuscule amount of metadata on a 
disk rather than the sea of data on it is, for consumers, certainly negligible, 
especially considering all the far more likely potential risks in the use of 
their PCs.)

 
 A lot of the data people save does not have
 checksumming.

*All* disk data is checksummed, right at the disk - and according to the 
studies I'm familiar with this detects most errors (certainly enough of those 
that ZFS also catches to satisfy most consumers).  If you've got any 
quantitative evidence to the contrary, by all means present it.

...
 
 I think one needs to stop making excuses by observing
 properties of specific 
 file types and simlar.

I'm afraid that's incorrect:  given the statistical incidence of the errors in 
question here, in normal consumer use only humongous files will ever experience 
them with non-negligible probability.  So those are the kinds of files at issue.

When such a file experiences one of these errors, then either it will be one 
that ZFS is uniquely (save for WAFL) capable of detecting, or it will be one 
that more conventional mechanisms can detect.  The latter are, according to the 
studies I keep mentioning, far more frequent (only relatively, of course:  
we're still only talking about one in every 10 TB or so, on average and 
according to manufacturers' specs, which seem to be if anything pessimistic in 
this area), and comprise primarily unreadable disk sectors which (as long as 
they're detected in a timely manner by scrubbing, whether ZFS's or some 
manually-scheduled mechanism) simply require that the bad sector (or file) be 
replaced by a good copy to restore the desired level of redundancy.

When we get into the 

[zfs-discuss] zdb internals?

2007-11-12 Thread Mark Ashley
I don't have time to RTFS, so I was curious whether there is a guide on using 
zdb, and does it do any writing of the ZFS information? The binary has a 
lot of options, and it isn't clear what they do.

I'm looking for any tools that let you do low-level fiddling with things 
such as broken zpools.
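
Not a guide, but the commonly used read-only invocations look roughly like
this (pool, dataset and device names are made up; as far as I know zdb only
reads the pool, it doesn't write anything):

  zdb -l /dev/dsk/c0t0d0s0   # dump the vdev labels on a device
  zdb -uuu tank              # show the pool's uberblocks
  zdb -dddd tank/fs          # walk datasets and objects; more d's = more detail
  zdb -bb tank               # traverse the pool and tally block usage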

ta,
Mark.


Re: [zfs-discuss] ZFS + DB + default blocksize

2007-11-12 Thread Anton B. Rang
Yes.  Blocks are compressed individually, so a smaller block size will (on 
average) lead to less compression.  (Assuming that your data is compressible at 
all, that is.)
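
A quick way to see the effect on a given dataset (the dataset name is just an
example):

  zfs set compression=on tank/db
  zfs set recordsize=16k tank/db
  # after (re)writing the data, compare the achieved ratio
  zfs get compressratio,recordsize tank/db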
 
 


Re: [zfs-discuss] Yager on ZFS

2007-11-12 Thread Selim Daoud
some businesses do not accept any kind of risk and hence will try hard
(i.e. spend a lot of money) to eliminate it (create 2, 3, 4 copies,
read-verify, cksum...)

at the moment only ZFS can give this assurance, plus the ability to
self-correct detected errors.

It's a good thing that ZFS can help people safely store and manage
their jpgs on their USB disk .. the real target customers here are
companies that rely a lot on their data: their goal is to create value
out of it. That is the case for CERN (a corrupted file might imply a
missed Higgs particle) and any mature company as a matter of fact
(finance, government). These are the businesses to which ZFS gives real
data storage assurance.

Selim
-- 
--
Blog: http://fakoli.blogspot.com/

On Nov 13, 2007 12:53 AM, can you guess? [EMAIL PROTECTED] wrote:
 Thanks for taking the time to flesh these points out.  Comments below:

 ...

  The compression I see varies from something like 30%
  to 50%, very
  roughly (files reduced *by* 30%, not files reduced
  *to* 30%).   This is
  with the Nikon D200, compressed NEF option.  On some
  of the lower-level
  bodies, I believe the compression can't be turned
  off.  Smaller files
  will of course get hit less often -- or it'll take
  longer to accumulate
  the terrabyte, is how I'd prefer to think of it.

 Either viewpoint works.  And since the compression is not that great, you 
 still wind up consuming a lot of space.  Effectively, you're trading (at 
 least if compression is an option rather than something that you're stuck 
 with) the possibility that a picture will become completely useless should a 
 bit get flipped for a storage space reduction of 30% - 50% - and that's a 
 good trade, since it effectively allows you to maintain a complete backup 
 copy on disk (for archiving, preferably off line) almost for free compared 
 with the uncompressed option.

 
  Damage that's fixable is still damage; I think of
  this in archivist
  mindset, with the disadvantage of not having an
  external budget to be my
  own archivist.

 There will *always* be the potential for damage, so the key is to make sure 
 that any damage is easily fixable.  The best way to do this is to a) keep 
 multiple copies, b) keep them isolated from each other (that's why RAID is 
 not a suitable approach to archiving), and c) check (scrub) them periodically 
 to ensure that if you lose a piece (whether a bit or a sector) you can 
 restore the affected data from another copy and thus return your redundancy 
 to full strength.

 For serious archiving, you probably want to maintain at least 3 such copies 
 (possibly more if some are on media of questionable longevity).  For normal 
 use, there's probably negligible risk of losing any data if you maintain only 
 two on reasonably reliable media:  'MAID' experience suggests that scrubbing 
 as little as every few months reduces the likelihood of encountering 
 detectable errors while restoring redundancy by several orders of magnitude 
 (i.e., down to something like once in a PB at worst for disks - becoming 
 comparable to the levels of bit-flip errors that the disk fails to detect at 
 all).

 Which is what I've been getting at w.r.t. ZFS in this particular application 
 (leaving aside whether it can reasonably be termed a 'consumer' application - 
 because bulk video storage is becoming one and it not only uses a similar 
 amount of storage space but should probably be protected using similar 
 strategies):  unless you're seriously worried about errors in the once-per-PB 
 range, ZFS primarily just gives you automated (rather than 
 manually-scheduled) scrubbing (and only for your on-line copy).  Yes, it will 
 help detect hardware faults as well if they happen to occur between RAM and 
 the disk (and aren't otherwise detected - I'd still like to know whether the 
 'bad cable' experiences reported here occurred before ATA started CRCing its 
 transfers), but while there's anecdotal evidence of such problems presented 
 here it doesn't seem to be corroborated by the few actual studies that I'm 
 familiar with, so that risk is difficult to quantify.

 Getting back to 'consumer' use for a moment, though, given that something 
 like 90% of consumers entrust their PC data to the tender mercies of Windows, 
 and a large percentage of those neither back up their data, nor use RAID to 
 guard against media failures, nor protect it effectively from the perils of 
 Internet infection, it would seem difficult to assert that whatever 
 additional protection ZFS may provide would make any noticeable difference in 
 the consumer space - and that was the kind of reasoning behind my comment 
 that began this sub-discussion.

 By George, we've managed to get around to having a substantive discussion 
 after all:  thanks for persisting until that occurred.

 - bill



[zfs-discuss] ZFS + DB + fragments

2007-11-12 Thread Louwtjie Burger
Hi

After a clean database load a database would (should?) look like this,
if a random stab at the data is taken...

[8KB-m][8KB-n][8KB-o][8KB-p]...

The data should be fairly (100%) sequential in layout ... after some
days, though, that same spot (using ZFS) would probably look like:

[8KB-m][   ][8KB-o][   ]

Is this pseudo logical-physical view correct (if blocks n and p were
updated and, with COW, relocated somewhere else)?

Could a utility be constructed to show the level of fragmentation ?
(50% in above example)

IF the above theory is flawed... how would fragmentation look/be
observed/calculated under ZFS with large Oracle tablespaces?

Does it even matter what the fragmentation is from a performance perspective?
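
One crude way to look at the layout (not a proper fragmentation metric) is
to dump the block pointers of a datafile with zdb and eyeball how contiguous
the DVA offsets are -- a sketch with invented names, and the number of -d
flags needed to show the indirect blocks may vary:

  zdb -dddd tank/oradata            # list objects and their paths
  zdb -ddddd tank/oradata 1234      # per-block DVAs (vdev:offset:size) for object 1234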