Re: [zfs-discuss] ZFS + DB + default blocksize
Louwtjie Burger writes: Hi. What is the impact of not aligning the DB blocksize (16K) with ZFS, especially when it comes to random reads on a single HW RAID LUN? How would one go about measuring the impact (if any) on the workload?

The DB will have a bigger in-memory footprint, as you will need to keep the whole ZFS record around for the lifespan of the DB block. This probably means you want to partition memory between the DB cache and the ZFS ARC according to the ratio of DB blocksize to ZFS recordsize. Then I imagine you have multiple spindles associated with the LUN. If your LUN is capable of 2000 IOPS over a 200MB/sec data channel, then during 1 second at full speed: 2000 IOPS * 16K = 32MB of data transfer, which fits within the channel's capability. But using, say, a ZFS record size of 128K, 2000 IOPS * 128K = 256MB, which overloads the channel. So in this example the data channel would saturate first, preventing you from reaching those 2000 IOPS. But with enough memory and data channel throughput it's a good idea to keep the ZFS recordsize large. -r

Thank you ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
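For reference, matching the ZFS recordsize to the DB block size is a one-line property change; a minimal sketch, assuming a hypothetical dataset tank/oracle and a 16K DB block size (note the property only affects files written after it is set):

  # zfs set recordsize=16k tank/oracle     # match the 16K DB block size
  # zfs get recordsize tank/oracle         # verify; existing files keep their old block size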
Re: [zfs-discuss] ZFS very slow under xVM
In this PC I'm using the PCI card http://www.intel.com/network/connectivity/products/pro1000gt_desktop_adapter.htm , but more recently I'm using the PCI Express card http://www.intel.com/network/connectivity/products/pro1000pt_desktop_adapter.htm Note that the latter didn't have PXE and the boot ROM enabled (for JumpStart), contrary to the documentation, and I had to download the DOS program from the Intel site to enable it (please ask if anyone needs the URL). ...so, for an easy life, I recommend the Intel PRO/1000 GT Desktop. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Intent Log removal
Hi! I played recently with a Gigabyte i-RAM card (which is basically an SSD) as a log device for a ZFS pool. However, when I tried to remove it - I need to give the card back - it refused to do so. It looks like I am hitting 6574286 "removing a slog doesn't work" [1]. Is there any workaround? I really need this card removed, and I cannot afford to lose the data on that pool. Any hints? [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286 -- Regards, Cyril ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Intent Log removal
You could always replace this device by another one of the same or bigger size using zpool replace. -neel Cyril Plisko wrote: Hi! I played recently with a Gigabyte i-RAM card (which is basically an SSD) as a log device for a ZFS pool. However, when I tried to remove it - I need to give the card back - it refused to do so. It looks like I am hitting 6574286 "removing a slog doesn't work" [1]. Is there any workaround? I really need this card removed, and I cannot afford to lose the data on that pool. Any hints? [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
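A minimal sketch of that approach, assuming hypothetical device names (the i-RAM is c2t0d0 and the stand-in device of equal or larger size is c3t0d0) in a pool called tank:

  # zpool replace tank c2t0d0 c3t0d0
  # zpool status tank     # wait for the resilver to finish before pulling the i-RAM

This only sidesteps the bug, of course - the pool still has a log device afterwards, just a different one.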
Re: [zfs-discuss] Modify fsid/guid of dataset for NFS failover
On Nov 10, 2007, at 23:16, Carson Gaspar wrote: Mattias Pantzare wrote: As the fsid is created when the file system is created it will be the same when you mount it on a different NFS server. Why change it? Or are you trying to match two different file systems? Then you also have to match all inode numbers on your files. That is not possible at all. It is, if you do block replication between the servers (drbd on Linux, or the Sun product whose name I'm blanking on at the moment). AVS (or Availability Suite) .. http://www.opensolaris.org/os/project/avs/ Jim Dunham does a nice demo here for block replication on ZFS (see sidebar). What isn't clear is whether zfs send/recv retains inode numbers... if it doesn't, that's a really sad thing, as we won't be able to use ZFS to replace NetApp snapmirrors. zfs send/recv comes out of the DSL, which I believe will generate a unique fsid_guid .. for mirroring you'd really want to use AVS. BTW - you can also look at the Cluster SUNWnfs agent in the ohac community: http://opensolaris.org/os/community/ha-clusters/ohac/downloads/ hth --- .je ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
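For anyone who wants to see what zfs send/recv does here, a minimal sketch with hypothetical pool, dataset, and host names; per the above, the received dataset is a new dataset with its own identity, so this is replication of contents rather than a block-identical mirror:

  # zfs snapshot tank/home@rep1
  # zfs send tank/home@rep1 | ssh backuphost zfs recv backup/home
  # later, send only the changes since the first snapshot:
  # zfs snapshot tank/home@rep2
  # zfs send -i rep1 tank/home@rep2 | ssh backuphost zfs recv backup/home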
Re: [zfs-discuss] Intent Log removal
On Nov 12, 2007 5:51 PM, Neelakanth Nadgir [EMAIL PROTECTED] wrote: You could always replace this device by another one of the same or bigger size using zpool replace. Indeed. Provided that I always have an unused device of the same or bigger size, which is seldom the case. :( -neel Cyril Plisko wrote: Hi! I played recently with a Gigabyte i-RAM card (which is basically an SSD) as a log device for a ZFS pool. However, when I tried to remove it - I need to give the card back - it refused to do so. It looks like I am hitting 6574286 "removing a slog doesn't work" [1]. Is there any workaround? I really need this card removed, and I cannot afford to lose the data on that pool. Any hints? [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286 -- Regards, Cyril ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Intent Log removal
Cyril Plisko wrote: On Nov 12, 2007 5:51 PM, Neelakanth Nadgir [EMAIL PROTECTED] wrote: You could always replace this device by another one of the same or bigger size using zpool replace. Indeed. Provided that I always have an unused device of the same or bigger size, which is seldom the case. :( In a pinch you could use an iSCSI target :) -neel Cyril Plisko wrote: Hi! I played recently with a Gigabyte i-RAM card (which is basically an SSD) as a log device for a ZFS pool. However, when I tried to remove it - I need to give the card back - it refused to do so. It looks like I am hitting 6574286 "removing a slog doesn't work" [1]. Is there any workaround? I really need this card removed, and I cannot afford to lose the data on that pool. Any hints? [1] http://bugs.opensolaris.org/view_bug.do?bug_id=6574286 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
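A rough sketch of the iSCSI idea, assuming a second OpenSolaris box with space to lend, a build where the shareiscsi property is still available, and hypothetical names throughout (tank is the pool with the stuck slog, c2t0d0 the i-RAM, 192.168.1.20 the lending box):

  On the lending box:
  # zfs create -V 4g spares/loaner
  # zfs set shareiscsi=on spares/loaner

  On the box with the pool:
  # iscsiadm add discovery-address 192.168.1.20
  # iscsiadm modify discovery --sendtargets enable
  # devfsadm -i iscsi
  # format                         # note the new device name
  # zpool replace tank c2t0d0 <new-iscsi-device>

Whether you want your intent log living on the far side of a network link long-term is another question, but it does get the card out of the machine.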
Re: [zfs-discuss] Response to phantom dd-b post
In the previous and current responses, you seem quite determined of others misconceptions. I'm afraid that your sentence above cannot be parsed grammatically. If you meant that I *have* determined that some people here are suffering from various misconceptions, that's correct. Given that fact and the first paragraph of your response below, I think you can figure out why nobody on this list will reply to you again. Predicting the future (especially the actions of others) is usually a feat reserved for psychics: are you claiming to be one (perhaps like the poster who found it 'clear' that I was a paid NetApp troll - one of the aforementioned misconceptions)? Oh, well - what can one expect from someone who not only top-posts but completely fails to trim quotations? I see that you appear to be posting from a .edu domain, so perhaps next year you will at least mature to the point of becoming sophomoric. Whether people here find it sufficiently uncomfortable to have their beliefs (I'm almost tempted to say 'faith', in some cases) challenged that they'll indeed just shut up I really wouldn't presume to guess. As for my own attitude, if you actually examine my responses rather than just go with your gut (which doesn't seem to be a very reliable guide in your case) you'll find that I tend to treat people pretty much as they deserve. If they don't pay attention to what they're purportedly responding to or misrepresent what I've said, I do chide them a bit (since I invariably *do* pay attention to what *they* say and make sincere efforts to respond to exactly that), and if they're confrontational and/or derogatory then they'll find me very much right back in their face. Perhaps it's some kind of territorial thing - that people bridle when they find a seriously divergent viewpoint popping up in a cozy little community where most of them have congregated because they already share the beliefs of the group. Such in-bred communities do provide a kind of sanctuary and feeling of belonging: perhaps it's unrealistic to expect most people to be able to rise above that and deal rationally with the wider world's entry into their little one. Or not: we'll see. - bill This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Yager on ZFS
On Sat, Nov 10, 2007 at 02:05:04PM -0200, Toby Thain wrote: Yup - that's exactly the kind of error that ZFS and WAFL do a perhaps uniquely good job of catching. WAFL can't catch all: It's distantly isolated from the CPU end. How so? The checksumming method is different from ZFS, but as far as I understand rather similar in capability. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Error: Volume size exceeds limit for this system
Thanks for the help guys - unfortunately the only hardware at my disposal just at the minute is all 32-bit, so I'll just have to wait a while and fork out on some 64-bit kit before I get the drives. I'm a home user, so I'm glad I didn't buy the drives and discover I couldn't use them without spending even more! Chris This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mdb ::memstat including zfs buffer details?
On Nov 8, 2007 4:21 PM, Nathan Kroenert [EMAIL PROTECTED] wrote: Hey all - Just a quick one... Is there any plan to update the mdb ::memstat dcmd to present ZFS buffers as part of the summary? At present, we get something like:

  ::memstat
  Page Summary            Pages      MB   %Tot
  Kernel                  28859     112    13%
  Anon                    34230     133    15%
  Exec and libs           10305      40     5%
  Page cache              16876      65     8%
  Free (cachelist)        26145     102    12%
  Free (freelist)        105176     410    47%
  Balloon                     0       0     0%
  Total                  221591     865

Which just (as far as I can tell) includes the ZFS buffers in Kernel memory. And what I'd really like is:

  ::memstat
  Page Summary            Pages      MB   %Tot
  Kernel                  28859     112    13%
  Anon                    34230     133    15%
  Exec and libs           10305      40     5%
  Page cache              16876      65     8%
  Free (cachelist)        26145     102    12%
  Free (zfscachelist)   1827346    1700    xx%
  Free (freelist)        105176     410    47%
  Balloon                     0       0     0%
  Total                  221591     865

Which then represents the pages that *could* be freed up by ZFS in the event that they are needed for other purposes... Any thoughts on this? Is there a great reason why we cannot do this? Also - other utilities like vmstat, etc. that print out memory...

File an RFE. I don't think it should be too bad (for ::memstat), given that (at least in Nevada), all of the ZFS caching data belongs to the zvp vnode, instead of kvp. The work that made that change was: 4894692 caching data in heap inflates crash dump Of course, this so-called free memory does act a bit differently than the cachelist, etc., so maybe it should be named slightly differently. Cheers, - jonathan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
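For anyone wanting to look at this on a live system, the summary under discussion comes from running the dcmd against the running kernel (as root):

  # echo ::memstat | mdb -k

The exact rows shown vary by release, and the zfscachelist line above is the poster's wished-for addition, not something current builds print.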
Re: [zfs-discuss] Response to phantom dd-b post
You have to detect the problem first. ZFS is in a much better position to detect the problem due to block checksums. Bulls***, to quote another poster here who has since been strangely quiet. The vast majority of what ZFS can detect (save for *extremely* rare undetectable bit-rot and for real hardware (path-related) errors that studies like CERN's have found to be very rare - and you have yet to provide even anecdotal evidence to the contrary) You wanted anecdotal evidence: During my personal experience with only two home machines, ZFS has helped me detect corruption at least three times in a period of a few months. One due to silent corruption caused by a controller bug (and a driver that did not work around it). Another time corruption during hotswapping (though this does not necessarily count, since I did it on hardware that I did not know was supposed to support it, and I would not have attempted it to begin with otherwise). The third time I don't remember now. You may disregard it if you wish. In my professional life I have seen bit flips a few times in the middle of real live data running on real servers that are used for important data. As a result I have become pretty paranoid about it all, making heavy use of par2. (I have also seen various file system corruption / system instability issues that may very well be consistent with bit flips / other forms of corruption, but where there has been no proof of the underlying cause of the problems.) can also be detected by scrubbing, and it's arguably a lot easier to apply brute-force scrubbing (e.g., by scheduling a job that periodically copies your data to the null device if your system does not otherwise support the mechanism) than to switch your file system. How would your magic scrubbing detect arbitrary data corruption without checksumming or redundancy? A lot of the data people save does not have checksumming. Even if it does, the file system metadata typically does not. Nor does various minor information related to the data (let's say the metadata associated with your backup of your other data, even if that data has some internal checksumming). I think one needs to stop making excuses by observing properties of specific file types and similar. You can always use FEC to do error correction on arbitrary files if you really feel they are important. But the point is that with ZFS you get detection of *ANY* bit error for free (essentially), and optionally correction if you have redundancy. It doesn't matter if it's internal file system metadata, that important file you didn't consider important from a corruption perspective, or the middle of some larger file that you may or may not have applied FEC to otherwise. Even without fancy high-end requirements, it is nice to have some good statistical reason to believe that random corruption does not occur. Even if only to drive your web browser or e-mail client; at least you can be sure that random bit flips (unless they either are undetected due to an implementation bug, or occur in memory, etc.) are not the cause of your random application misbehavior. It's like choosing RAM. You can make excuses all you want about doing proper testing, buying good RAM, or having redundancy at other levels, etc. - but you will still sleep better knowing you have ECC RAM than some random junk. Or let's do the seat belt analogy. 
You can try to convince yourself/other people all you want that you are a safe driver, that you should not drive in a way that allows crashes, or whatever else - but you are still going to be safer with a seat belt than without it. This is also why we care about fsync(). It doesn't matter that you spent $10 on that expensive server with redundant PSUs hooked up to redundant UPS systems. *SHIT HAPPENS*, and when it does, you want to be maximally protected. Yes, ZFS is not perfect. But to me, both in the context of personal use and more serious use, ZFS is, barring some implementation details, more or less exactly what I have always wanted, and it solves pretty much all of the major problems with storage. And let me be clear: that is not hype. It's ZFS actually providing what I have wanted, and what I knew I wanted even before ZFS (or WAFL or whatever else) was ever on my radar. For some reason some people seem to disagree. That's your business. But the next time you have a power outage, you'll be sorry if you had a database that didn't do fsync()[1], a filesystem that had no corruption checking whatsoever[2], a RAID5 system that didn't care about parity correctness in the face of a crash[3], and a filesystem or application whose data is not structured such that you can ascertain *what* is broken after the crash and what is not[4]. You will be even more sorry two years later when something really important malfunctioned as a result of undetected corruption two years earlier... [1] Because of course all serious players use
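On the par2 point above: for individual files that matter, adding forward error correction is only a couple of commands; a minimal sketch assuming the par2cmdline tool is installed, with a hypothetical archive file and an arbitrary 10% redundancy figure:

  $ par2 create -r10 important.tar        # create ~10% worth of recovery blocks
  $ par2 verify important.tar.par2        # later: detect corruption
  $ par2 repair important.tar.par2        # and repair it from the recovery data

That covers files you remembered to protect; the filesystem-metadata argument above is exactly where this approach stops helping and built-in checksumming takes over.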
Re: [zfs-discuss] mdb ::memstat including zfs buffer details?
I don't think it should be too bad (for ::memstat), given that (at least in Nevada), all of the ZFS caching data belongs to the zvp vnode, instead of kvp. ZFS data buffers are attached to zvp; however, we still keep metadata in the crashdump. At least right now, this means that cached ZFS metadata has kvp as its vnode. -j ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Modify fsid/guid of dataset for NFS failover
asa wrote: I would like for all my NFS clients to hang during the failover, then pick up trucking on this new filesystem, perhaps obviously failing their writes back to the apps which are doing the writing. Naive? The OpenSolaris NFS client does this already - has done since IIRC around Solaris 2.6. The knowledge is in the NFS client code. For NFSv4 this functionality is part of the standard. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mdb ::memstat including zfs buffer details?
On Nov 12, 2007 4:16 PM, [EMAIL PROTECTED] wrote: I don't think it should be too bad (for ::memstat), given that (at least in Nevada), all of the ZFS caching data belongs to the zvp vnode, instead of kvp. ZFS data buffers are attached to zvp; however, we still keep metadata in the crashdump. At least right now, this means that cached ZFS metadata has kvp as its vnode. Still, it's better than what you get currently. Cheers, - jonathan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best option for my home file server?
I went ahead and bought a M9N-Sli motherboard with 6 SATA controllers and also a Promise TX4 (4x SATA300, non-RAID) PCI controller. Anyone know if the TX4 is supported in OpenSolaris? If it's as badly supported as the (crappy) Sil chipsets, I'm better off with OpenFiler (Linux), I think. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Yager on ZFS
Thanks for taking the time to flesh these points out. Comments below: ... The compression I see varies from something like 30% to 50%, very roughly (files reduced *by* 30%, not files reduced *to* 30%). This is with the Nikon D200, compressed NEF option. On some of the lower-level bodies, I believe the compression can't be turned off. Smaller files will of course get hit less often -- or it'll take longer to accumulate the terabyte, is how I'd prefer to think of it. Either viewpoint works. And since the compression is not that great, you still wind up consuming a lot of space. Effectively, you're trading (at least if compression is an option rather than something that you're stuck with) the possibility that a picture will become completely useless should a bit get flipped for a storage space reduction of 30% - 50% - and that's a good trade, since it effectively allows you to maintain a complete backup copy on disk (for archiving, preferably off line) almost for free compared with the uncompressed option. Damage that's fixable is still damage; I think of this in archivist mindset, with the disadvantage of not having an external budget to be my own archivist. There will *always* be the potential for damage, so the key is to make sure that any damage is easily fixable. The best way to do this is to a) keep multiple copies, b) keep them isolated from each other (that's why RAID is not a suitable approach to archiving), and c) check (scrub) them periodically to ensure that if you lose a piece (whether a bit or a sector) you can restore the affected data from another copy and thus return your redundancy to full strength. For serious archiving, you probably want to maintain at least 3 such copies (possibly more if some are on media of questionable longevity). For normal use, there's probably negligible risk of losing any data if you maintain only two on reasonably reliable media: 'MAID' experience suggests that scrubbing as little as every few months reduces the likelihood of encountering detectable errors while restoring redundancy by several orders of magnitude (i.e., down to something like once in a PB at worst for disks - becoming comparable to the levels of bit-flip errors that the disk fails to detect at all). Which is what I've been getting at w.r.t. ZFS in this particular application (leaving aside whether it can reasonably be termed a 'consumer' application - because bulk video storage is becoming one and it not only uses a similar amount of storage space but should probably be protected using similar strategies): unless you're seriously worried about errors in the once-per-PB range, ZFS primarily just gives you automated (rather than manually-scheduled) scrubbing (and only for your on-line copy). Yes, it will help detect hardware faults as well if they happen to occur between RAM and the disk (and aren't otherwise detected - I'd still like to know whether the 'bad cable' experiences reported here occurred before ATA started CRCing its transfers), but while there's anecdotal evidence of such problems presented here it doesn't seem to be corroborated by the few actual studies that I'm familiar with, so that risk is difficult to quantify. 
Getting back to 'consumer' use for a moment, though, given that something like 90% of consumers entrust their PC data to the tender mercies of Windows, and a large percentage of those neither back up their data, nor use RAID to guard against media failures, nor protect it effectively from the perils of Internet infection, it would seem difficult to assert that whatever additional protection ZFS may provide would make any noticeable difference in the consumer space - and that was the kind of reasoning behind my comment that began this sub-discussion. By George, we've managed to get around to having a substantive discussion after all: thanks for persisting until that occurred. - bill This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
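Since periodic scrubbing keeps coming up: for the on-line copy this is easy to automate; a minimal sketch assuming a hypothetical pool named tank and root's crontab, with a monthly frequency in line with the MAID-style numbers above:

  # zpool scrub tank             # kick one off by hand
  # zpool status -v tank         # check for checksum errors once it completes

  # in root's crontab, e.g. 3am on the 1st of each month:
  0 3 1 * * /usr/sbin/zpool scrub tank

For copies sitting on filesystems without checksums, the brute-force equivalent is simply reading everything back periodically and comparing digests against a stored manifest.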
Re: [zfs-discuss] Response to phantom dd-b post
Well, I guess we're going to remain stuck in this sub-topic for a bit longer: The vast majority of what ZFS can detect (save for *extremely* rare undetectable bit-rot and for real hardware (path-related) errors that studies like CERN's have found to be very rare - and you have yet to provide even anecdotal evidence to the contrary) You wanted anecdotal evidence: To be accurate, the above was not a solicitation for just any kind of anecdotal evidence but for anecdotal evidence that specifically contradicted the notion that otherwise undetected path-related hardware errors are 'very rare'. During my personal experience with only two home machines, ZFS has helped me detect corruption at least three times in a period of a few months. One due to silent corruption due to a controller bug (and a driver that did not work around it). If that experience occurred using what could be considered normal consumer hardware and software, that's relevant (and disturbing). As I noted earlier, the only path-related problem that the CERN study unearthed involved their (hardly consumer-typical) use of RAID cards, the unusual demands that those cards placed on the WD disk firmware (to the point where it produced on-disk errors), and the cards' failure to report accompanying disk time-outs. Another time corruption during hotswapping (though this does not necessarily count since I did it on hardware that I did not know was supposed to support it, and I would not have attempted it to begin with otherwise). Using ZFS as a test platform to see whether you could get away with using hardware in a manner that it may not have been intended to be used may not really qualify as 'consumer' use. As I've noted before, consumer relevance remains the point in question here (since that's the point that fired off this lengthy sub-discussion). ... In my professional life I have seen bit flips a few times in the middle of real live data running on real servers that are used for important data. As a result I have become pretty paranoid about it all, making heavy use of par2. And well you should - but, again, that's hardly 'consumer' use. ... can also be detected by scrubbing, and it's arguably a lot easier to apply brute-force scrubbing (e.g., by scheduling a job that periodically copies your data to the null device if your system does not otherwise support the mechanism) than to switch your file system. How would your magic scrubbing detect arbitrary data corruption without checksumming The assertion is that it would catch the large majority of errors that ZFS would catch (i.e., all the otherwise detectable errors, most of them detected by the disk when it attempts to read a sector), leaving a residue of no noticeable consequence to consumers (especially as one could make a reasonable case that most consumers would not experience any noticeable problem even if *none* of these errors were noticed). or redundancy? Redundancy is necessary if you want to fix (not just catch) errors, but conventional mechanisms provide redundancy just as effective as ZFS's. (With the minor exception of ZFS's added metadata redundancy, but the likelihood that an error will happen to hit the relatively minuscule amount of metadata on a disk rather than the sea of data on it is, for consumers, certainly negligible, especially considering all the far more likely potential risks in the use of their PCs.) A lot of the data people save does not have checksumming. 
*All* disk data is checksummed, right at the disk - and according to the studies I'm familiar with this detects most errors (certainly enough of those that ZFS also catches to satisfy most consumers). If you've got any quantitative evidence to the contrary, by all means present it. ... I think one needs to stop making excuses by observing properties of specific file types and similar. I'm afraid that's incorrect: given the statistical incidence of the errors in question here, in normal consumer use only humongous files will ever experience them with non-negligible probability. So those are the kinds of files at issue. When such a file experiences one of these errors, then either it will be one that ZFS is uniquely (save for WAFL) capable of detecting, or it will be one that more conventional mechanisms can detect. The latter are, according to the studies I keep mentioning, far more frequent (only relatively, of course: we're still only talking about one in every 10 TB or so, on average and according to manufacturers' specs, which seem to be if anything pessimistic in this area), and comprise primarily unreadable disk sectors which (as long as they're detected in a timely manner by scrubbing, whether ZFS's or some manually-scheduled mechanism) simply require that the bad sector (or file) be replaced by a good copy to restore the desired level of redundancy. When we get into the
[zfs-discuss] zdb internals?
I don't have time to RTFS, so I was curious whether there is a guide on using zdb, and whether it does any writing of the ZFS information. The binary has a lot of options, and it isn't clear what they do. I'm looking for any tools that let you do low-level fiddling with things such as broken zpools. ta, Mark. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
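Not a guide, but a few of the invocations people reach for most often; treat the details as illustrative, since zdb is deliberately undocumented and its options shift between builds (pool and device names here are hypothetical). As normally used it only reads: it examines on-disk state and the cached config rather than modifying the pool.

  # zdb -C tank                  # dump the cached pool configuration
  # zdb -d tank                  # list datasets and per-object summaries
  # zdb -u tank                  # show the active uberblock
  # zdb -l /dev/dsk/c0t0d0s0     # print the vdev labels on a device

For actually repairing a broken pool you are still largely in "read the source / ask the list" territory.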
Re: [zfs-discuss] ZFS + DB + default blocksize
Yes. Blocks are compressed individually, so a smaller block size will (on average) lead to less compression. (Assuming that your data is compressible at all, that is.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
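To see how this plays out on a given dataset, you can just compare the compression ratio at different record sizes; a minimal sketch with hypothetical dataset names (compressratio only reflects data written after compression is enabled):

  # zfs set compression=on tank/db
  # zfs set recordsize=16k tank/db
  # ...load some representative data...
  # zfs get recordsize,compression,compressratio tank/db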
Re: [zfs-discuss] Yager on ZFS
Some businesses do not accept any kind of risk and hence will try hard (i.e. spend a lot of money) to eliminate it (create 2, 3, 4 copies, read-verify, checksum...). At the moment only ZFS can give this assurance, plus the ability to self-correct detected errors. It's a good thing that ZFS can help people store and manage their JPEGs safely on their USB disks... but the real target customers here are companies that rely a lot on their data: their goal is to create value out of it. That is the case for CERN (a corrupted file might imply a missed Higgs particle) and for any mature company as a matter of fact (finance, government). These are the businesses to which ZFS gives real data storage assurance. Selim -- -- Blog: http://fakoli.blogspot.com/ On Nov 13, 2007 12:53 AM, can you guess? [EMAIL PROTECTED] wrote: Thanks for taking the time to flesh these points out. Comments below: [...] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS + DB + fragments
Hi. After a clean database load, a database would (should?) look like this, if a random stab at the data is taken... [8KB-m][8KB-n][8KB-o][8KB-p]... The data should be fairly (100%) sequential in layout... after some days, though, that same spot (using ZFS) would probably look like: [8KB-m][ ][8KB-o][ ] Is this pseudo logical-physical view correct (if blocks n and p were updated and, with COW, relocated somewhere else)? Could a utility be constructed to show the level of fragmentation? (50% in the above example.) IF the above theory is flawed... how would fragmentation look / be observed / calculated under ZFS with large Oracle tablespaces? Does it even matter what the fragmentation is, from a performance perspective? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
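One way to start poking at this, short of a real utility (heavily hedged: zdb output is undocumented and changes between builds, and the pool, dataset, and file names here are hypothetical): dump a file's block pointers with zdb and look at how contiguous the DVA offsets are.

  # ls -i /tank/oracle/system01.dbf          # note the inode/object number
  # zdb -ddddd tank/oracle <object-number>   # dump the object's block pointers (DVAs)

Consecutive file blocks whose on-disk offsets are no longer adjacent are the ones COW has relocated, and the fraction of such jumps gives a rough number like the 50% in the example above. Whether it matters will depend mostly on whether the workload does large sequential scans or small random reads.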