Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
On Mon, Sep 28, 2009 at 06:04:01PM -0400, Thomas Burgess wrote: > personally i like this case: > > > http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021 > > it's got 20 hot swap bays, and it's surprisingly well built. For the money, > it's an amazing deal. You don't like http://www.supermicro.com/products/nfo/chassis_storage.cfm ? I must admit I don't have a price list of these. When running that many hard drives I would insist on redundant power supplies, and server motherboards with ECC memory. Unless it's for home use, where a downtime of days or weeks is not critical. -- Eugen* Leitl http://leitl.org __ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Connamacher wrote: I was thinking of custom building a server, which I think I can do for around $10,000 of hardware (using 45 SATA drives and a custom enclosure), and putting OpenSolaris on it. It's a bit of a risk compared to buying a $30,000 server, but would be a fun experiment. Others have done similar experiments with considerable success. Bob -- Yes, but be careful of your workload on SATA disks. SATA can be very good for sequential read and write, but only under lower loads, even with a serious SSD cache. I'd want to benchmark things with your particular workload before using SATA instead of SAS. A couple of things to mention: Sun's 7110 lists for $11k in the 2TB (with SAS) disk configuration. If you have longer-term storage needs, look at an X4540 Thor (the replacement for the X4500 Thumpers). They're significantly more reliable and manageable than a custom-built solution. And reasonably cost-competitive. ( >> $1/GB after discount). Both the Thor and 7110 are available for Try-and-Buy. Get them and test them against your workload - it's the only way to be sure (to paraphrase Ripley). Not just for Sun kit, but I'd be very wary of using any no-service-contract hardware for something that is business critical, which I can't imagine your digital editing system isn't. Don't be penny-wise and pound-foolish. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
11:04pm, Paul Archer wrote: Cool. FWIW, there appears to be an issue with the LSI 150-6 card I was using. I grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, and I'm getting write speeds of about 60-70MB/sec, which is about 40x the write speed I was seeing with the old card. Paul Small correction: I was seeing writes in the 60-70MB range because I was writing to a single 2TB drive (in its own pool). When I tried writing back to the primary (4+1 raid-z) pool, I was getting between 100-120MB/sec. (That's for sequential writes, anyway.) paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
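A quick way to reproduce this kind of sequential-write measurement is a large dd run against the pool. This is only a sketch, assuming a pool mounted at /tank (a placeholder) with compression off, and a test file larger than RAM so the ARC does not mask the disk speed:

  # ptime dd if=/dev/zero of=/tank/ddtest bs=1024k count=8192
  # rm /tank/ddtest

Divide the 8GB written by the real time ptime reports to get MB/sec.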
Re: [zfs-discuss] extremely slow writes (with good reads)
Cool. FWIW, there appears to be an issue with the LSI 150-6 card I was using. I grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, and I'm getting write speeds of about 60-70MB/sec, which is about 40x the write speed I was seeing with the old card. Paul Robert Milkowski wrote: Paul Archer wrote: In light of all the trouble I've been having with this zpool, I bought a 2TB drive, and I'm going to move all my data over to it, then destroy the pool and start over. Before I do that, what is the best way on an x86 system to format/label the disks? If the entire disk is going to be dedicated to one zfs pool then don't bother with manual labeling - when creating a pool provide a disk name without a slice name (so for example c0d0 instead of c0d0s0) and zfs will automatically put an EFI label on it with s0 representing the entire disk (minus a small reserved area). -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, 28 Sep 2009, Richard Elling wrote: Many people here would profoundly disagree with the above. There is no substitute for good backups, but a periodic scrub helps validate that a later resilver would succeed. A periodic scrub also helps find system problems early when they are less likely to crater your business. It is much better to find an issue during a scrub rather than during resilver of a mirror or raidz. As I said, I am concerned that people would mistakenly expect that scrubbing offers data protection. It doesn't. I think you proved my point? ;-) It does not specifically offer data "protection" but if you have only duplex redundancy, it substantially helps find and correct a failure which would have caused data loss during a resilver. The value substantially diminishes if you have triple redundancy. I hope it does not offend that I scrub my mirrored pools once a week. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
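A weekly scrub like that is easy to automate from root's crontab; a minimal sketch, assuming a pool named tank and an early Sunday morning window (both are placeholders):

  0 3 * * 0 /usr/sbin/zpool scrub tank

zpool status tank then shows when the last scrub completed and whether it found or repaired any errors.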
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
On Mon, 28 Sep 2009, Richard Connamacher wrote: I was thinking of custom building a server, which I think I can do for around $10,000 of hardware (using 45 SATA drives and a custom enclosure), and putting OpenSolaris on it. It's a bit of a risk compared to buying a $30,000 server, but would be a fun experiment. Others have done similar experiments with considerable success. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
I was thinking of custom building a server, which I think I can do for around $10,000 of hardware (using 45 SATA drives and a custom enclosure), and putting OpenSolaris on it. It's a bit of a risk compared to buying a $30,000 server, but would be a fun experiment. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Paul Archer wrote: In light of all the trouble I've been having with this zpool, I bought a 2TB drive, and I'm going to move all my data over to it, then destroy the pool and start over. Before I do that, what is the best way on an x86 system to format/label the disks? If the entire disk is going to be dedicated to one zfs pool then don't bother with manual labeling - when creating a pool provide a disk name without a slice name (so for example c0d0 instead of c0d0s0) and zfs will automatically put an EFI label on it with s0 representing the entire disk (minus a small reserved area). -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
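Concretely, letting ZFS label the whole disk looks like this, assuming the new pool is to be named tank and the disk is c0d0 as in the example above:

  # zpool create tank c0d0
  # zpool status tank

zpool status shows the vdev as c0d0 with no slice, and the format utility will show the EFI label that ZFS wrote.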
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
Robert Milkowski wrote: Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Not only that - it also won't read all the copies of the data if zfs has redundancy configured at the pool level. Scrubbing the pool will. And that's the main reason behind the scrub - to be able to detect and repair checksum errors (if any) while a redundant copy is still fine. Also, doing tar means reading from the ARC and/or L2ARC if the data is cached, which won't verify that the data is actually fine on disk. Scrub won't use the cache and will always go to the physical disks. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
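In command form, the scrub-based check described above is simply (the pool name is a placeholder):

  # zpool scrub tank
  # zpool status -v tank

zpool status -v reports scrub progress, checksum errors that were repaired from redundant copies, and, for damage that could not be repaired, the names of the affected files.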
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Not only that - it also won't read all the copies of the data if zfs has redundancy configured at the pool level. Scrubbing the pool will. And that's the main reason behind the scrub - to be able to detect and repair checksum errors (if any) while a redundant copy is still fine. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
On Mon, 28 Sep 2009, Richard Connamacher wrote: I'm planning on using RAIDZ2 if it can keep up with my bandwidth requirements. So maybe ZFS could be an option after all? ZFS certainly can be an option. If you are willing to buy Sun hardware, they have a "try and buy" program which would allow you to set up a system to evaluate if it will work for you. Otherwise you can use a high-grade Brand-X server and decent-grade Brand-X JBOD array to test on. Sun's Sun Storage 7000 series has OpenSolaris and ZFS inside but is configured and sold as a closed-box NAS. The X4540 server is fitted with 48 disk drives and is verified to be able to deliver 2.0GB/second to a network. By MB do you mean mega*byte*? If so, 550 MB is more than enough for uncompressed 1080p. If you mean mega*bit*, then that's not enough. But as you said, you're using a mirrored setup, and RAID-Z should be faster. Yes. I mean megabyte. This is a 12-drive StorageTek 2540 with two 4gbit FC links. I am getting a peak of more than one FC link (550MB/second with a huge file). A JBOD SAS array would be a much better choice now but these products had not yet come to market when I ordered my hardware. This might work for Final Cut editing using QuickTime files. But FX and color grading using TIFF frames at 130 MB/s would slow your setup to a crawl. Do you think RAID-Z would help here? There is no reason why RAID-Z is necessarily faster at sequential reads than mirrors and in fact mirrors can be faster due to fewer disk seeks. With mirrors, it is theoretically possible to schedule reads from all 12 of my disks at once. It is just a matter of the tunings/options that the ZFS implementors decide to provide. Here are some iozone measurements (taken June 28th) with different record sizes running up to a 64GB file size:

      KB  reclen   write  rewrite     read   reread
 8388608      64  482097   595557  1851378  1879145
 8388608     128  429126   621319  1937128  1944177
 8388608     256  428197   646922  1954065  1965570
 8388608     512  489692   585971  1593610  1584573
16777216      64  439880    41304   822968   841246
16777216     128  443119   435886   815705   844789
16777216     256  446006   475347   814529   687915
16777216     512  436627   462599   787369   803182
33554432      64  401110    41096   547065   553262
33554432     128  404420   394838   549944   552664
33554432     256  406367   400859   544950   553516
33554432     512  401254   410153   554100   558650
67108864      64  378158    40794   552623   555655
67108864     128  379809   385453   549364   553948
67108864     256  380286   377397   551060   550414
67108864     512  378225   385588   550131   557150

It seems like every time I run the benchmark, the numbers have improved. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
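For reference, measurements of this shape can be produced with iozone; the invocation below is only a sketch (the file size, record size and target path are assumptions, not Bob's exact command line):

  iozone -i 0 -i 1 -s 64g -r 128k -f /pool/iozone.tmp

Here -i 0 selects the write/rewrite tests, -i 1 the read/reread tests, -s the file size and -r the record size.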
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Sep 28, 2009, at 19:39, Richard Elling wrote: Finally, there are two basic types of scrubs: read-only and rewrite. ZFS does read-only. Other scrubbers can do rewrite. There is evidence that rewrites are better for attacking superparamagnetic decay issues. Something that may be possible when *bp rewrite is eventually committed. Educating post. Thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
> For me, aggressive prefetch is most important in order to schedule > reads from enough disks in advance to produce a high data rate. This > is because I am using mirrors. When using raidz or raidz2 the > situation should be a bit different because raidz is striped. The > prefetch bug which is specifically fixed is when using thousands of > files in the 5MB-8MB range which is typical for film postproduction. > The bug is that prefetch becomes disabled if the file had been > accessed before but its data is no longer in cache. I'm planning on using RAIDZ2 if it can keep up with my bandwidth requirements. So maybe ZFS could be an option after all? > That is not clear to me yet. With my setup, I can read up to > 550MB/second from a large file. That is likely the hardware limit for > me. But when reading one-at-a-time from individual 5 or 8MB files, > the data rate is much less (around 130MB/second). By MB do you mean mega*byte*? If so, 550 MB is more than enough for uncompressed 1080p. If you mean mega*bit*, then that's not enough. But as you said, you're using a mirrored setup, and RAID-Z should be faster. This might work for Final Cut editing using QuickTime files. But FX and color grading using TIFF frames at 130 MB/s would slow your setup to a crawl. Do you think RAID-Z would help here? Thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
On Mon, 28 Sep 2009, Richard Connamacher wrote: Thanks for the detailed information. When you get the patch, I'd love to hear if it fixes the problems you're having. From my understanding, a working prefetch would keep video playback from stuttering whenever the drive head moves — is this right? For me, aggressive prefetch is most important in order to schedule reads from enough disks in advance to produce a high data rate. This is because I am using mirrors. When using raidz or raidz2 the situation should be a bit different because raidz is striped. The prefetch bug which is specifically fixed is when using thousands of files in the 5MB-8MB range which is typical for film postproduction. The bug is that prefetch becomes disabled if the file had been accessed before but its data is no longer in cache. When doing video playback, it is typical to be reading from several files at once in order to avoid the potential for read "stutter". The inability to read and write simultaneously (within reason) would be frustrating for a shared video editing server. I wonder if ZFS needs more parallelism? If any software RAID ends up having a similar problem, then we might have to go with the hardware RAID setups I'm trying to avoid. I wonder if there's any way to work around that. Would a bigger write cache help? Or adding an SSD for the cache (ZFS Intent Log)? would Linux software RAID be any better? ZFS has a lot of parallelism since it is optimized for large data servers. The problem seems to be that ZFS uses a huge write cache by default and it delays flushing it (up to 30 seconds) so that when the write cache is flushed, it maximally engages the write channel for up to 5 seconds. Decreasing the size of the write cache diminishes the size of the problem. Assuming they fix the prefetch performance issues you talked about, do you think ZFS would be able to keep up with uncompressed 1080p HD or 2K? That is not clear to me yet. With my setup, I can read up to 550MB/second from a large file. That is likely the hardware limit for me. But when reading one-at-a-time from individual 5 or 8MB files, the data rate is much less (around 130MB/second). I am using Solaris 10. OpenSolaris performance seems to be better than Solaris 10. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
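The write-cache behaviour Bob describes was tunable via /etc/system in Solaris and OpenSolaris builds of that era; the fragment below is only an illustration, since the tunable names and sensible values varied by build and should be verified against the release in use:

  * /etc/system - illustrative only, values are not recommendations
  * cap the dirty data buffered per transaction group (here 512MB)
  set zfs:zfs_write_limit_override=0x20000000
  * sync transaction groups every 5 seconds instead of the old 30 second default
  set zfs:zfs_txg_timeout=5

A reboot is required for /etc/system changes to take effect.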
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
Thanks for the detailed information. When you get the patch, I'd love to hear if it fixes the problems you're having. From my understanding, a working prefetch would keep video playback from stuttering whenever the drive head moves — is this right? The inability to read and write simultaneously (within reason) would be frustrating for a shared video editing server. I wonder if ZFS needs more parallelism? If any software RAID ends up having a similar problem, then we might have to go with the hardware RAID setups I'm trying to avoid. I wonder if there's any way to work around that. Would a bigger write cache help? Or adding an SSD for the cache (ZFS Intent Log)? would Linux software RAID be any better? Assuming they fix the prefetch performance issues you talked about, do you think ZFS would be able to keep up with uncompressed 1080p HD or 2K? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Sep 28, 2009, at 11:41 AM, Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: In other words, I am concerned that people replace good data protection practices with scrubs and expecting scrub to deliver better data protection (it won't). Many people here would profoundly disagree with the above. There is no substitute for good backups, but a periodic scrub helps validate that a later resilver would succeed. A periodic scrub also helps find system problems early when they are less likely to crater your business. It is much better to find an issue during a scrub rather than during resilver of a mirror or raidz. As I said, I am concerned that people would mistakenly expect that scrubbing offers data protection. It doesn't. I think you proved my point? ;-) Scrubs are also useful for detecting broken hardware. However, normal activity will also detect broken hardware, so it is better to think of scrubs as finding degradation of old data rather than being a hardware checking service. Do you have a scientific reference for this notion that "old data" is more likely to be corrupt than "new data" or is it just a gut-feeling? This hypothesis does not sound very supportable to me. Magnetic hysteresis lasts quite a lot longer than the recommended service life for a hard drive. Studio audio tapes from the '60s are still being used to produce modern "remasters" of old audio recordings which sound better than they ever did before (other than the master tape). Those are analog tapes... they just fade away... For data, it depends on the ECC methods, quality of the media, environment, etc. You will find considerable attention spent on verification of data on tapes in archiving products. In the tape world, there are slightly different conditions than the magnetic disk world, but I can't think of a single study which shows that magnetic disks get more reliable over time, while there are dozens which show that they get less reliable and that latent sector errors dominate, as much as 5x, over full disk failures. My studies of Sun disk failure rates have shown similar results. Some forms of magnetic hysteresis are known to last millions of years. Media failure is more often than not mechanical or chemical and not related to loss of magnetic hysteresis. Head failures may be construed to be media failures. Here is a good study from the University of Wisconsin-Madison which clearly shows the relationship between disk age and latent sector errors. It also shows how the increase in areal density also increases the latent sector error (LSE) rate. Additionally, this gets back to the ECC method, which we observe to be different on consumer-grade and enterprise-class disks. The study shows a clear win for enterprise-class drives wrt latent errors. The paper suggests a 2-week scrub cycle and recognizes that many RAID arrays have such policies. There are indeed many studies which show latent sector errors are a bigger problem as the disk ages. An Analysis of Latent Sector Errors in Disk Drives www.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps See http://en.wikipedia.org/wiki/Ferromagnetic for information on ferromagnetic materials. For disks we worry about the superparamagnetic effect. http://en.wikipedia.org/wiki/Superparamagnetism Quoting US Patent 6987630, ... the superparamagnetic effect is a thermal relaxation of information stored on the disk surface. Because the superparamagnetic effect may occur at room temperature, over time, information stored on the disk surface will begin to decay. 
Once the stored information decays beyond a threshold level, it will be unable to be properly read by the read head and the information will be lost. The superparamagnetic effect manifests itself by a loss in amplitude in the readback signal over time or an increase in the mean square error (MSE) of the readback signal over time. In other words, the readback signal quality metrics are mean square error and amplitude as measured by the read channel integrated circuit. Decreases in the quality of the readback signal cause bit error rate (BER) increases. As is well known, the BER is the ultimate measure of drive performance in a disk drive. This effect is based on the time since written. Hence, older data can have higher MSE and subsequent BER leading to a UER. To be fair, newer disk technology is constantly improving. But what is consistent with the physics is that increase in bit densities leads to more space and rebalancing the BER. IMHO, this is why we see densities increase, but UER does not increase (hint: marketing always wins these sorts of battles). FWIW, flash memories are not affected by superparamagnetic decay. It would be most useful if zfs incorporated a slow-scan scrub which validates data at a low rate of speed which does not hinder active I/O.
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
well when i start looking into rack configurations i will consider it. :) here's my configuration - enjoy! http://michaelshadle.com/2009/09/28/my-recipe-for-zfs-at-home/ On Mon, Sep 28, 2009 at 3:10 PM, Thomas Burgess wrote: > i own this case, it's really not that bad. It's got 4 fans but they are > really big and don't make nearly as much noise as you'd think. honestly, > it's not bad at all. I know someone who sits it vertically as well, > honestly, it's a good case for the money > > > On Mon, Sep 28, 2009 at 6:06 PM, Michael Shadle wrote: >> >> rackmount chassis aren't usually designed with acoustics in mind :) >> >> however i might be getting my closet fitted so i can put half a rack >> in. might switch up my configuration to rack stuff soon. >> >> On Mon, Sep 28, 2009 at 3:04 PM, Thomas Burgess >> wrote: >> > personally i like this case: >> > >> > >> > http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021 >> > >> > it's got 20 hot swap bays, and it's surprisingly well built. For the >> > money, >> > it's an amazing deal. >> > >> > >> > >> > ___ >> > zfs-discuss mailing list >> > zfs-discuss@opensolaris.org >> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >> > >> > > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
On Mon, 28 Sep 2009, Richard Connamacher wrote: I'm looking at building a high bandwidth file server to store video for editing, as an alternative to buying a $30,000 hardware RAID and spending $2000 per seat on fibrechannel and specialized SAN drive software. Uncompressed HD runs around 1.2 to 4 gigabits per second, putting it in 10 gigabit Ethernet or FibreChannel territory. Any file server would have to be able to move that many bits in sustained read and sustained write, and doing both simultaneously would be a plus. Please see a white paper I wrote entitled "ZFS and Digital Intermediate" at http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-and-di.pdf which expounds on this topic and makes it sound like zfs is the perfect answer for this. Unfortunately, I have since learned that zfs file prefetch ramps up too slowly or becomes disabled for certain workloads. I reported a bug. Many here are eagerly awaiting the next OpenSolaris development release which is supposed to have fixes for the prefetch problem I encountered. I am told that a Solaris 10 IDR (customer-specific patch) will be provided to me within the next few days to resolve the performance issue. There is another performance issue in which writes to the server cause reads to briefly stop periodically. This means that the server could not be used simultaneously for video playback while files are being updated. To date there is no proposed solution for this problem. Linux XFS seems like the top contender for video playback and editing. From a description of XFS design and behavior, it would not surprise me if it stuttered during playback when files are updated as well. Linux XFS also buffers written data and writes it out in large batches at a time. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs receive should allow to keep received system
On Mon, Sep 28, 2009 at 03:16:17PM -0700, Igor Velkov wrote: > Not so good as I hope. > zfs send -R xxx/x...@daily_2009-09-26_23:51:00 |ssh -c blowfish r...@xxx.xx > zfs recv -vuFd xxx/xxx > > invalid option 'u' > usage: > receive [-vnF] <filesystem|volume|snapshot> > receive [-vnF] -d <filesystem> > > For the property list, run: zfs set|get > > For the delegated permission list, run: zfs allow|unallow > r...@xxx:~# uname -a > SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890 > > What's wrong? Looks like -u was a recent addition. -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs receive should allow to keep received system
On 09/28/09 16:16, Igor Velkov wrote: Not so good as I hope. zfs send -R xxx/x...@daily_2009-09-26_23:51:00 |ssh -c blowfish r...@xxx.xx zfs recv -vuFd xxx/xxx invalid option 'u' usage: receive [-vnF] <filesystem|volume|snapshot> receive [-vnF] -d <filesystem> For the property list, run: zfs set|get For the delegated permission list, run: zfs allow|unallow r...@xxx:~# uname -a SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890 What's wrong? The option was added in S10 Update 7. I'm not sure whether the patch-level shown above included U7 changes or not. Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs receive should allow to keep received system
Not so good as I hope. zfs send -R xxx/x...@daily_2009-09-26_23:51:00 |ssh -c blowfish r...@xxx.xx zfs recv -vuFd xxx/xxx invalid option 'u' usage: receive [-vnF] <filesystem|volume|snapshot> receive [-vnF] -d <filesystem> For the property list, run: zfs set|get For the delegated permission list, run: zfs allow|unallow r...@xxx:~# uname -a SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890 What's wrong? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
i own this case, it's really not that bad. It's got 4 fans but they are really big and don't make nearly as much noise as you'd think. honestly, it's not bad at all. I know someone who sits it vertically as well, honestly, it's a good case for the money On Mon, Sep 28, 2009 at 6:06 PM, Michael Shadle wrote: > rackmount chassis aren't usually designed with acoustics in mind :) > > however i might be getting my closet fitted so i can put half a rack > in. might switch up my configuration to rack stuff soon. > > On Mon, Sep 28, 2009 at 3:04 PM, Thomas Burgess > wrote: > > personally i like this case: > > > > > > http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021 > > > > it's got 20 hot swap bays, and it's surprisingly well built. For the > money, > > it's an amazing deal. > > > > > > > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs receive should allow to keep received system
Wah! Thank you, lalt! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Would ZFS work for a high-bandwidth video SAN?
I'm looking at building a high bandwidth file server to store video for editing, as an alternative to buying a $30,000 hardware RAID and spending $2000 per seat on fibrechannel and specialized SAN drive software. Uncompressed HD runs around 1.2 to 4 gigabits per second, putting it in 10 gigabit Ethernet or FibreChannel territory. Any file server would have to be able to move that many bits in sustained read and sustained write, and doing both simultaneously would be a plus. If the drives were plentiful enough and fast enough, could a RAID-Z (on currently available off-the-shelf hardware) keep up with that? Thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
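As a rough conversion, 1.2 to 4 gigabits per second is about 150 to 500 megabytes per second per stream (divide by 8). At the high end a single stream already exceeds the roughly 400 MB/s of usable payload on a 4Gb Fibre Channel link and consumes close to half of a 10 Gigabit Ethernet port, before any RAID-Z parity or protocol overhead is accounted for.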
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
rackmount chassis aren't usually designed with acoustics in mind :) however i might be getting my closet fitted so i can put half a rack in. might switch up my configuration to rack stuff soon. On Mon, Sep 28, 2009 at 3:04 PM, Thomas Burgess wrote: > personally i like this case: > > > http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021 > > it's got 20 hot swap bays, and it's surprisingly well built. For the money, > it's an amazing deal. > > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
personally i like this case: http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021 it's got 20 hot swap bays, and it's surprisingly well built. For the money, it's an amazing deal. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs receive should allow to keep received system unmounted
On 09/28/09 15:54, Igor Velkov wrote: zfs receive should allow an option to disable the immediate mount of the received filesystem. If the original filesystem's mountpoints have changed, it's hard to make a clone fs with send-receive, because the received filesystem immediately tries to mount at the old mountpoint, which is locked by the source fs. On a different host, the mountpoint can be locked by an unrelated filesystem. Can anybody recommend a way to avoid mountpoint conflicts in these cases? The -u option to zfs receive suppresses all mounts. lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
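A sketch of how that looks in practice, with hypothetical dataset and host names:

  # zfs send -R tank/data@snap1 | ssh backuphost zfs recv -u -d backup
  # ssh backuphost zfs set mountpoint=/backup/data backup/data
  # ssh backuphost zfs mount backup/data

With -u nothing is mounted on the receiving side, so the mountpoint inherited from the source never gets a chance to collide with a filesystem that is already mounted there; the mountpoint can then be adjusted before mounting, as in the second line.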
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
Yeah - give me a bit to rope together the parts list and double check it, and I will post it on my blog. On Mon, Sep 28, 2009 at 2:34 PM, Ware Adams wrote: > On Sep 28, 2009, at 4:20 PM, Michael Shadle wrote: > >> I agree - SOHO usage of ZFS is still a scary "will this work?" deal. I >> found a working setup and I cloned it. It gives me 16x SATA + 2x SATA >> for mirrored boot, 4GB ECC RAM and a quad core processor - total cost >> without disks was ~ $1k I believe. Not too shabby. Emphasis was also >> for acoustics - rack dense would be great but my current living >> situation doesn't warrant that > > This sounds interesting. Do you have any info on it (case you started with, > etc...). > > I'm concerned about noise too as this will be in a closet close to the room > where our television is. Currently there is a MacPro in there which isn't > terribly quiet, but the SuperMicro case is reported to be fairly quiet. > > Thanks, > Ware > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs receive should allow to keep received system unmounted
zfs receive should allow an option to disable the immediate mount of the received filesystem. If the original filesystem's mountpoints have changed, it's hard to make a clone fs with send-receive, because the received filesystem immediately tries to mount at the old mountpoint, which is locked by the source fs. On a different host, the mountpoint can be locked by an unrelated filesystem. Can anybody recommend a way to avoid mountpoint conflicts in these cases? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
On Sep 28, 2009, at 4:20 PM, Michael Shadle wrote: I agree - SOHO usage of ZFS is still a scary "will this work?" deal. I found a working setup and I cloned it. It gives me 16x SATA + 2x SATA for mirrored boot, 4GB ECC RAM and a quad core processor - total cost without disks was ~ $1k I believe. Not too shabby. Emphasis was also for acoustics - rack dense would be great but my current living situation doesn't warrant that This sounds interesting. Do you have any info on it (case you started with, etc...). I'm concerned about noise too as this will be in a closet close to the room where our television is. Currently there is a MacPro in there which isn't terribly quiet, but the SuperMicro case is reported to be fairly quiet. Thanks, Ware ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
In light of all the trouble I've been having with this zpool, I bought a 2TB drive, and I'm going to move all my data over to it, then destroy the pool and start over. Before I do that, what is the best way on an x86 system to format/label the disks? Thanks, Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
This seems like you're doing an awful lot of planning for only 8 SATA + 4 SAS bays? I agree - SOHO usage of ZFS is still a scary "will this work?" deal. I found a working setup and I cloned it. It gives me 16x SATA + 2x SATA for mirrored boot, 4GB ECC RAM and a quad core processor - total cost without disks was ~ $1k I believe. Not too shabby. Emphasis was also for acoustics - rack dense would be great but my current living situation doesn't warrant that. The noisiest components are the 5-in-3 chassis used in the front of the case. I have to keep the fans on high (I tried to swap out for larger, quieter fans, but could not get the fan alarm to shut up) or they go over Seagate's recommended <= 50 degrees. I really should post my parts list up on my blog. I had to choose everything to the best of my research online and hope for the best. On Mon, Sep 28, 2009 at 1:12 PM, Ware Adams wrote: > Hello, > > I have been researching building a home storage server based on OpenSolaris > and ZFS, and I would appreciate any time people could take to comment on my > current leanings. > > I've tried to gather old information from this list as well as the HCL, but > I would welcome anyone's experience on both compatibility and > appropriateness for my goals. I'd love if that white box server wiki page > were set up now, but for now I'll have to just ask here. > > My priorities: > > 1) Data security. I'm hoping I can get this via ECC RAM and enterprise > drives that hopefully don't lie to ZFS about flushing to disk? I'll run > mirrored pools for redundancy (which leads me to want a case w/a lot of > bays). > 2) Compatibility. For me this translates into low upkeep cost (time). I'm > not looking to be the first person to get OpenSolaris running on some > particular piece of hardware. > 3) Scaleable. I'd like to not have to upgrade every year. I can always > use something like an external JBOD array, but there's some appeal to having > enough space in the case for reasonable growth. I'd also like to have > enough performance to keep up with scaling data volume and ZFS features. > 4) Ability to run some other (lightweight) services on the box. I'll be > using NFS (iTunes libraries for OS X clients) and iSCSI (Time Machine > backups) primarily, but my current home server also runs a few small > services (MySQL etc...) that are very lightweight but nevertheless might be > difficult to do on a ZFS (or "ZFS like") appliance > 5) Cost. All things being equal cheaper is better, but I'm willing to pay > more to accomplish particularly 1-3 above. > > My current thinking: > > SuperMicro 7046A-3 Workstation > http://supermicro.com/products/system/4U/7046/SYS-7046A-3.cfm > 8 hot swappable drive bays (SAS or SATA, I'd use SATA) > Network/Main board/SAS/SATA controllers seem well supported by OpenSolaris > Will take IPMI card for remote admin (with video and iso redirection) > 12 RAM slots so I can buy less dense chips > 2x 5.25" drive bays. I'd use a SuperMicro Mobile Rack M14T > (http://www.supermicro.com/products/accessories/mobilerack/CSE-M14.cfm) to > get 4 2.5" SAS drives in one of these. 2 would be used for a mirrored boot > pool leaving 2 for potential future use (like a ZIL on SSD). > > Nehalem E5520 CPU > These are clearly more than enough now, but I'm hoping to have decent CPU > performance for say 5 years (and I'm willing to pay for it up front vs. > upgrading every 2 years...I don't want this to be too time consuming of a > hobby). 
I'd like to have processor capacity for compression and (hopefully > reasonably soon) de-duplication as well as obviously support ECC RAM. > > Crucial RAM in 4 GB density (price scales linearly up through this point and > I've had good support from Crucial) > > Seagate Barracuda ES.2 1TB SATA (Model ST31000340NS) for storage pool. I > would like to use a larger drive, but I can't find anything rated to run > 24x7 larger than 1TB from Seagate. I'd like to have drives rated for 24x7 > use, and I've had good experience w/Seagate. Again, a larger case gives me > some flexibility here. > > Misc (mainly interested in compatibility b/c it will hardly be used): > Sun XVR-100 video card from eBay > Syba SY-PCI45004 > (http://www.newegg.com/Product/Product.aspx?Item=N82E16816124025) IDE card > for CD-ROM > Sony DDU1678A > (http://www.newegg.com/Product/Product.aspx?Item=N82E16827131061) CD-ROM > > Thanks a lot for any thoughts you might have. > > --Ware > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Comments on home OpenSolaris/ZFS server
Hello, I have been researching building a home storage server based on OpenSolaris and ZFS, and I would appreciate any time people could take to comment on my current leanings. I've tried to gather old information from this list as well as the HCL, but I would welcome anyone's experience on both compatibility and appropriateness for my goals. I'd love if that white box server wiki page were set up now, but for now I'll have to just ask here. My priorities: 1) Data security. I'm hoping I can get this via ECC RAM and enterprise drives that hopefully don't lie to ZFS about flushing to disk? I'll run mirrored pools for redundancy (which leads me to want a case w/a lot of bays). 2) Compatibility. For me this translates into low upkeep cost (time). I'm not looking to be the first person to get OpenSolaris running on some particular piece of hardware. 3) Scaleable. I'd like to not have to upgrade every year. I can always use something like an external JBOD array, but there's some appeal to having enough space in the case for reasonable growth. I'd also like to have enough performance to keep up with scaling data volume and ZFS features. 4) Ability to run some other (lightweight) services on the box. I'll be using NFS (iTunes libraries for OS X clients) and iSCSI (Time Machine backups) primarily, but my current home server also runs a few small services (MySQL etc...) that are very lightweight but nevertheless might be difficult to do on a ZFS (or "ZFS like") appliance 5) Cost. All things being equal cheaper is better, but I'm willing to pay more to accomplish particularly 1-3 above. My current thinking: SuperMicro 7046A-3 Workstation http://supermicro.com/products/system/4U/7046/SYS-7046A-3.cfm 8 hot swappable drive bays (SAS or SATA, I'd use SATA) Network/Main board/SAS/SATA controllers seem well supported by OpenSolaris Will take IPMI card for remote admin (with video and iso redirection) 12 RAM slots so I can buy less dense chips 2x 5.25" drive bays. I'd use a SuperMicro Mobile Rack M14T (http://www.supermicro.com/products/accessories/mobilerack/CSE-M14.cfm ) to get 4 2.5" SAS drives in one of these. 2 would be used for a mirrored boot pool leaving 2 for potential future use (like a ZIL on SSD). Nehalem E5520 CPU These are clearly more than enough now, but I'm hoping to have decent CPU performance for say 5 years (and I'm willing to pay for it up front vs. upgrading every 2 years...I don't want this to be too time consuming of a hobby). I'd like to have processor capacity for compression and (hopefully reasonably soon) de-duplication as well as obviously support ECC RAM. Crucial RAM in 4 GB density (price scales linearly up through this point and I've had good support from Crucial) Seagate Barracuda ES.2 1TB SATA (Model ST31000340NS) for storage pool. I would like to use a larger drive, but I can't find anything rated to run 24x7 larger than 1TB from Seagate. I'd like to have drives rated for 24x7 use, and I've had good experience w/Seagate. Again, a larger case gives me some flexibility here. Misc (mainly interested in compatibility b/c it will hardly be used): Sun XVR-100 video card from eBay Syba SY-PCI45004 (http://www.newegg.com/Product/Product.aspx?Item=N82E16816124025 ) IDE card for CD-ROM Sony DDU1678A (http://www.newegg.com/Product/Product.aspx?Item=N82E16827131061 ) CD-ROM Thanks a lot for any thoughts you might have. --Ware ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)
Frank Middleton wrote: > On 09/28/09 03:00 AM, Joerg Schilling wrote: > > > I am not sure whether my changes will be kept as wikipedia prefers to > > keep badly quoted wrong information before correct information supplied by > > people who have first hand information. > > They actually disallow "first hand information". Everything on Wikipedia > is supposed to be confirmed by secondary or tertiary sources. That's why I This is why wikipedia is wrong in many cases :-( > asked if there was any supporting documentation - papers, manuals, > proceedings, whatever, that describe the introduction of tmpfs before > 1990. If you were to write a personal page (in Wikipedia if you like) IIRC, there was a talk about tmpfs at the Sun User Group meeting around December 6th in San Jose in 1987. Maybe someone can find the proceedings. > http://en.wikipedia.org/wiki/Wikipedia:Reliable_sources > > Wikipedia also has a lofi page (http://en.wikipedia.org/wiki/Lofi) that > redirects to "loop mount". It has no historical section at all... There > is no fbk (file system) page. It is bad practice to advertise one's own projects on Wikipedia. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Should usedbydataset be the same after zfs send/recv for a volume?
On Mon, Sep 28, 2009 at 07:33:56PM -0500, Albert Chin wrote: > When transferring a volume between servers, is it expected that the > usedbydataset property should be the same on both? If not, is it cause > for concern? > > snv114# zfs list tww/opt/vms/images/vios/near.img > NAME USED AVAIL REFER MOUNTPOINT > tww/opt/vms/images/vios/near.img 70.5G 939G 15.5G - > snv114# zfs get usedbydataset tww/opt/vms/images/vios/near.img > NAME PROPERTY VALUE SOURCE > tww/opt/vms/images/vios/near.img usedbydataset 15.5G - > > snv119# zfs list t/opt/vms/images/vios/near.img > NAME USED AVAIL REFER MOUNTPOINT > t/opt/vms/images/vios/near.img 14.5G 2.42T 14.5G - > snv119# zfs get usedbydataset t/opt/vms/images/vios/near.img > NAMEPROPERTY VALUE SOURCE > t/opt/vms/images/vios/near.img usedbydataset 14.5G - Don't know if it matters but disks on both send/recv server are different, 300GB FCAL on the send, 750GB SATA on the recv. -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Should usedbydataset be the same after zfs send/recv for a volume?
When transferring a volume between servers, is it expected that the usedbydataset property should be the same on both? If not, is it cause for concern?

snv114# zfs list tww/opt/vms/images/vios/near.img
NAME                               USED  AVAIL  REFER  MOUNTPOINT
tww/opt/vms/images/vios/near.img  70.5G   939G  15.5G  -
snv114# zfs get usedbydataset tww/opt/vms/images/vios/near.img
NAME                              PROPERTY       VALUE  SOURCE
tww/opt/vms/images/vios/near.img  usedbydataset  15.5G  -

snv119# zfs list t/opt/vms/images/vios/near.img
NAME                             USED  AVAIL  REFER  MOUNTPOINT
t/opt/vms/images/vios/near.img  14.5G  2.42T  14.5G  -
snv119# zfs get usedbydataset t/opt/vms/images/vios/near.img
NAME                            PROPERTY       VALUE  SOURCE
t/opt/vms/images/vios/near.img  usedbydataset  14.5G  -

-- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Which directories must be part of rpool?
Chris Gerhard wrote: > TMPFS was not in the first release of 4.0. It was introduced to boost the > performance of diskless clients which no longer had the old network disk for > their root file systems and hence /tmp was now over NFS. I did receive the SunOS-4.0 sources for my master thesis (a copy on write WORM filesystem) and this source did contain tmpfs. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)
Darren J Moffat wrote: > Joerg Schilling wrote: > > > > Just to prove my information: I invented "fbk" (which Sun now calls "lofi") > > Sun does NOT call your fbk by the name lofi. Lofi is a completely > different implementation of the same concept. With this kind of driver the implementation coding is trivial; it is the idea that is important. So it does not matter that lofi is a reimplementation. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, 28 Sep 2009, Richard Elling wrote: In other words, I am concerned that people replace good data protection practices with scrubs and expecting scrub to deliver better data protection (it won't). Many people here would profoundly disagree with the above. There is no substitute for good backups, but a periodic scrub helps validate that a later resilver would succeed. A periodic scrub also helps find system problems early when they are less likely to crater your business. It is much better to find an issue during a scrub rather than during resilver of a mirror or raidz. Scrubs are also useful for detecting broken hardware. However, normal activity will also detect broken hardware, so it is better to think of scrubs as finding degradation of old data rather than being a hardware checking service. Do you have a scientific reference for this notion that "old data" is more likely to be corrupt than "new data" or is it just a gut-feeling? This hypothesis does not sound very supportable to me. Magnetic hysteresis lasts quite a lot longer than the recommended service life for a hard drive. Studio audio tapes from the '60s are still being used to produce modern "remasters" of old audio recordings which sound better than they ever did before (other than the master tape). Some forms of magnetic hysteresis are known to last millions of years. Media failure is more often than not mechanical or chemical and not related to loss of magnetic hysteresis. Head failures may be construed to be media failures. See http://en.wikipedia.org/wiki/Ferromagnetic for information on ferromagnetic materials. It would be most useful if zfs incorporated a slow-scan scrub which validates data at a low rate of speed which does not hinder active I/O. Of course this is not a "green" energy efficient solution. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] refreservation not transferred by zfs send when sending a volume?
On 29.09.09 03:58, Albert Chin wrote: snv114# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation tww/opt/vms/images/vios/mello-0.img NAME PROPERTY VALUE SOURCE tww/opt/vms/images/vios/mello-0.img used 30.6G - tww/opt/vms/images/vios/mello-0.img reservation none default tww/opt/vms/images/vios/mello-0.img volsize 25G- tww/opt/vms/images/vios/mello-0.img refreservation25Glocal tww/opt/vms/images/vios/mello-0.img usedbydataset 5.62G - tww/opt/vms/images/vios/mello-0.img usedbyrefreservation 25G- Sent tww/opt/vms/images/vios/mello-0.img from snv_114 server to snv_119 server. On snv_119 server: snv119# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation t/opt/vms/images/vios/mello-0.img NAME PROPERTY VALUE SOURCE t/opt/vms/images/vios/mello-0.img used 5.32G - t/opt/vms/images/vios/mello-0.img reservation none default t/opt/vms/images/vios/mello-0.img volsize 25G- t/opt/vms/images/vios/mello-0.img refreservationnone default t/opt/vms/images/vios/mello-0.img usedbydataset 5.32G - t/opt/vms/images/vios/mello-0.img usedbyrefreservation 0 - Any reason the refreservation and usedbyrefreservation properties are not sent? 6853862 refquota property not send over with zfs send -R also affects refreservation, fixed in snv_121 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
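Until a build with that fix is available, the property can simply be re-applied by hand on the receiving side; a sketch using the dataset name from the original post:

  snv119# zfs set refreservation=25G t/opt/vms/images/vios/mello-0.img
  snv119# zfs get refreservation,usedbyrefreservation t/opt/vms/images/vios/mello-0.img

That restores the guarantee that the full 25G volume can be overwritten without the pool running out of space.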
Re: [zfs-discuss] refreservation not transferred by zfs send when sending a volume?
On Sep 28, 2009, at 6:58 PM, Albert Chin wrote: Any reason the refreservation and usedbyrefreservation properties are not sent? I believe this was CR 6853862, fixed in snv_121. -Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On 28.09.09 22:01, Richard Elling wrote: On Sep 28, 2009, at 10:31 AM, Victor Latushkin wrote: Richard Elling wrote: On Sep 28, 2009, at 3:42 PM, Albert Chin wrote: On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Too bad we cannot scrub a dataset/object. Can you provide a use case? I don't see why scrub couldn't start and stop at specific txgs for instance. That won't necessarily get you to a specific file, though. With ever increasing disk and pool sizes it takes more and more time for scrub to complete its job. Let's imagine that you have a 100TB pool with 90TB of data in it, and there's a dataset with 10TB that is critical and another dataset with 80TB that is not that critical and you can afford losing some blocks/files there. Personally, I have three concerns here. 1. Gratuitous complexity, especially inside a pool -- aka creeping featurism There's the idea of priority-based resilvering (though not implemented yet, see http://blogs.sun.com/bonwick/en_US/entry/smokin_mirrors) that can be simply extended to scrubs as well. 2. Wouldn't a better practice be to use two pools with different protection policies? The only protection policy differences inside a pool are copies. In other words, I am concerned that people replace good data protection practices with scrubs and expecting scrub to deliver better data protection (it won't). It may be better, it may not be... With two pools you split your bandwidth and IOPS and space and have more entities to care about... 3. Since the pool contains the set of blocks, shared by datasets, it is not clear to me that scrubbing a dataset will detect all of the data corruption failures which can affect the dataset. I'm thinking along the lines of phantom writes, for example. That is why it may be useful to always scrub pool-wide metadata or have a way to specifically request it. 4. the time it takes to scrub lots of stuff ...there are four concerns... :-) For magnetic media, a yearly scrub interval should suffice for most folks. I know some folks who scrub monthly. More frequent scrubs won't buy much. It won't buy you much in terms of magnetic media decay discovery. Unfortunately, there are other sources of corruption as well (including the phantom writes you are thinking about), and being able to discover corruption and recover it as quickly as possible from the backup is a good thing. Scrubs are also useful for detecting broken hardware. However, normal activity will also detect broken hardware, so it is better to think of scrubs as finding degradation of old data rather than being a hardware checking service. So being able to scrub an individual dataset would help to run scrubs of critical data more frequently and faster and schedule scrubs for less frequently used and/or less important data to happen much less frequently. It may be useful to have a way to tell ZFS to scrub pool-wide metadata only (space maps etc), so that you can build your own schedule of scrubs. Another interesting idea is to be able to scrub only blocks modified since the last snapshot. This can be relatively easy to implement. 
But remember that scrubs are most useful for finding data which has degraded from the media. In other words, old data. New data is not likely to have degraded yet, and since ZFS is COW, all of the new data is, well, new. This is why having the ability to bound the start and end of a scrub by txg can be easy and perhaps useful. This requires exporting concept of the transaction group numbers to the user and i do not see how it is less complex from the user interface perspective than being able to request scrub of individual dataset, pool-wide metadata or newly-written data. regards, victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
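Until something like per-dataset scrub exists, the closest approximation to "scrub the critical data more often" is separate pools on separate scrub schedules, which is exactly the trade-off weighed above. A minimal sketch, assuming two hypothetical pools named critical and bulk (the pool names and intervals are illustrative, not from the thread):

# root crontab entries: monthly scrub of the critical pool,
# quarterly scrub of the bulk pool (min hour dom month dow)
0 3 1 * * /usr/sbin/zpool scrub critical
0 3 1 1,4,7,10 * /usr/sbin/zpool scrub bulk
# check progress and results later with: zpool status -v critical

Whether the second pool is worth the split in bandwidth, IOPS and space is precisely the concern raised in point 2 above.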
[zfs-discuss] refreservation not transferred by zfs send when sending a volume?
snv114# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation tww/opt/vms/images/vios/mello-0.img
NAME                                 PROPERTY              VALUE  SOURCE
tww/opt/vms/images/vios/mello-0.img  used                  30.6G  -
tww/opt/vms/images/vios/mello-0.img  reservation           none   default
tww/opt/vms/images/vios/mello-0.img  volsize               25G    -
tww/opt/vms/images/vios/mello-0.img  refreservation        25G    local
tww/opt/vms/images/vios/mello-0.img  usedbydataset         5.62G  -
tww/opt/vms/images/vios/mello-0.img  usedbyrefreservation  25G    -
Sent tww/opt/vms/images/vios/mello-0.img from snv_114 server to snv_119 server. On snv_119 server:
snv119# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation t/opt/vms/images/vios/mello-0.img
NAME                               PROPERTY              VALUE  SOURCE
t/opt/vms/images/vios/mello-0.img  used                  5.32G  -
t/opt/vms/images/vios/mello-0.img  reservation           none   default
t/opt/vms/images/vios/mello-0.img  volsize               25G    -
t/opt/vms/images/vios/mello-0.img  refreservation        none   default
t/opt/vms/images/vios/mello-0.img  usedbydataset         5.32G  -
t/opt/vms/images/vios/mello-0.img  usedbyrefreservation  0      -
Any reason the refreservation and usedbyrefreservation properties are not sent? -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
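A follow-up note rather than an answer: since a plain zfs send stream evidently does not carry this property, one workaround is simply to reapply it on the receiving side. A sketch, reusing the dataset names from the post above:

# on the snv_119 receiver, restore the refreservation by hand
zfs set refreservation=25G t/opt/vms/images/vios/mello-0.img
zfs get refreservation,usedbyrefreservation t/opt/vms/images/vios/mello-0.img

A recursive replication stream (zfs send -R) also carries locally set properties, which may be the simpler route when many volumes are involved.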
Re: [zfs-discuss] OS install question
On 09/28/09 01:22 PM, David Dyer-Bennet wrote: That seems truly bizarre. Virtualbox recommends 16GB, and after doing an install there's about 12GB free. There's no way Solaris will install in 4GB if I understand what you are saying. Maybe fresh off a CD when it doesn't have to download a copy first, but the reality is 16GB is not possible unless you don't ever want to do an image update. What version are you running? Have you ever tried pkg image-update? # uname -a SunOS host8 5.11 snv_111b i86pc i386 i86pc Solaris # df -h Filesystem Size Used Avail Use% Mounted on rpool/ROOT/opensolaris-2 34G 13G 22G 37% / # du -sh /var/pkg/download/ 762M /var/pkg/download/ this after deleting all old BEs and all snapshots but not emptying /var/pkg/download; swap/boot are on different slices. SPARC is similar; snv122 takes 11GB after deleting old BEs, all snapshots, *and* /var/pkg/downloads; *without* /opt, swap, /var/crash, /var/dump, /var/tmp, /var/run and /export... AFAIK it is absolutely impossible to do a pkg image-update (say) from snv111b to snv122 without at least 9GB free (it says 8GB in the documentation). If the baseline is 11GB, you need 20GB for an install, and that leaves you zip to spare. Obvious reasons include before and after snaps, download before install, and total rollback capability. This is all going to cost some space. I believe there is a CR about this, but IMO when you can get 2TB of disk for $200 it's hard to complain. 32GB of SSD is not unreasonable and 16GB simply won't hack it. All the above is based on actual and sometimes painful experience. You *really* don't want to run out of space during an update. You'll almost certainly end up restoring your boot disk if you do, and if you don't, you'll never get back all the space. Been there, done that... Cheers -- Frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
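For anyone hitting the same wall, the space audit described above boils down to a few commands. A sketch, assuming an OpenSolaris-style image with beadm and a root pool named rpool (the BE and snapshot names are placeholders):

# how much headroom does the root pool have?
zfs list rpool
# which boot environments and snapshots are holding space?
beadm list
zfs list -t snapshot -r rpool
# discard an old BE and a leftover snapshot (irreversible)
beadm destroy opensolaris-1
zfs destroy rpool/export@old-snap
# /var/pkg/download can also be emptied once an update has completed

Even after that, as Frank says, budget well beyond 16GB if you ever intend to run pkg image-update.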
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Sep 28, 2009, at 10:31 AM, Victor Latushkin wrote: Richard Elling wrote: On Sep 28, 2009, at 3:42 PM, Albert Chin wrote: On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Too bad we cannot scrub a dataset/object. Can you provide a use case? I don't see why scrub couldn't start and stop at specific txgs for instance. That won't necessarily get you to a specific file, though. With ever increasing disk and pool sizes it takes more and more time for scrub to complete its job. Let's imagine that you have 100TB pool with 90TB of data in it, and there's dataset with 10TB that is critical and another dataset with 80TB that is not that critical and you can afford loosing some blocks/files there. Personally, I have three concerns here. 1. Gratuitous complexity, especially inside a pool -- aka creeping featurism 2. Wouldn't a better practice be to use two pools with different protection policies? The only protection policy differences inside a pool are copies. In other words, I am concerned that people replace good data protection practices with scrubs and expecting scrub to deliver better data protection (it won't). 3. Since the pool contains the set of blocks, shared by datasets, it is not clear to me that scrubbing a dataset will detect all of the data corruption failures which can affect the dataset. I'm thinking along the lines of phantom writes, for example. 4. the time it takes to scrub lots of stuff ...there are four concerns... :-) For magnetic media, a yearly scrub interval should suffice for most folks. I know some folks who scrub monthly. More frequent scrubs won't buy much. Scrubs are also useful for detecting broken hardware. However, normal activity will also detect broken hardware, so it is better to think of scrubs as finding degradation of old data rather than being a hardware checking service. So being able to scrub individual dataset would help to run scrubs of critical data more frequently and faster and schedule scrubs for less frequently used and/or less important data to happen much less frequently. It may be useful to have a way to tell ZFS to scrub pool-wide metadata only (space maps etc), so that you can build your own schedule of scrubs. Another interesting idea is to be able to scrub only blocks modified since last snapshot. This can be relatively easy to implement. But remember that scrubs are most useful for finding data which has degraded from the media. In other words, old data. New data is not likely to have degraded yet, and since ZFS is COW, all of the new data is, well, new. This is why having the ability to bound the start and end of a scrub by txg can be easy and perhaps useful. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] raidz failure, trying to recover
Liam Slusser wrote: Long story short, my cat jumped on my server at my house crashing two drives at the same time. It was a 7 drive raidz (next time ill do raidz2). Long story short - we've been able to get access to data in the pool. This involved finding better old state with the help of 'zdb -t', then verifying metadata checksums with 'zdb -eubbcsL', then extracting configuration from the pool, making cache file from the extracted configuration and finally importing pool (readonly at the moment) to back up data. As soon as it is backed up, we'll try to do read-write import... victor The server crashed complaining about a drive failure, so i rebooted into single user mode not realizing that two drives failed. I put in a new 500g replacement and had zfs start a replace operation which failed at about 2% because there was two broken drives. From that point i turned off the computer and sent both drives to a data recovery place. They were able to recover the data on one of the two drives (the one that i started the replace operation on) - great - that should be enough to get my data back. I popped the newly recovered drive back in, it had an older tgx number then the other drives so i made a backup of each drive and then modified the tgx number to an earlier tgx number so they all match. However i am still unable to mount the array - im getting the following error: (doesnt matter if i use -f or -F) bash-3.2# zpool import data pool: data id: 6962146434836213226 state: UNAVAIL status: One or more devices are missing from the system. action: The pool cannot be imported. Attach the missing devices and try again. see: http://www.sun.com/msg/ZFS-8000-6X config: data UNAVAIL missing device raidz1 DEGRADED c0t0d0 ONLINE c0t1d0 ONLINE replacing ONLINE c0t2d0 ONLINE c0t7d0 ONLINE c0t3d0 UNAVAIL cannot open c0t4d0 ONLINE c0t5d0 ONLINE c0t6d0 ONLINE Additional devices are known to be part of this pool, though their exact configuration cannot be determined. Now i should have enough online devices to mount and get my data off however no luck. I'm not really sure where to go at this point. Do i have to fake a c0t3d0 drive so it thinks all drives are there? Can somebody point me in the right direction? thanks, liam p.s. To help me find which uberblocks to modify to reset the tgx i wrote a little perl program which finds and prints out information in order to revert to an earlier tgx value. Its a little messy since i wrote it super late at night quickly - but maybe it will help somebody else out. http://liam821.com/findUberBlock.txt (its just a perl script) Its easy to run. It pulls in 256k of data and sorts it (or skipping X kbyte if you use the -s ###) and then searches for uberblocks. (remember there is 4 labels, 0 256, and then two at the end of the disk. You need to manually figure out the end skip value...) Calculating the GUID seems to always fail because the number is to large for perl so it returns a negative number. meh wasnt important enough to try to figure out. 
(the info below has NOTHING to do with my disk problem above, its a happy and health server that i wrote the tool on) - find newest tgx number bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -n block=148 (0025000) transaction=15980419 - print verbose output bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -n -v block=148 (0025000) zfs_ver=3 (0003 ) transaction=15980419(d783 00f3 ) guid_sum=-14861410676147539 (7aad 2fc9 33a0 ffcb) timestamp=1253958103(e1d7 4abd ) (Sat Sep 26 02:41:43 2009) raw = 0025000 b10c 00ba 0003 0025010 d783 00f3 7aad 2fc9 33a0 ffcb 0025020 e1d7 4abd 0001 - list all uberblocks bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -l block=145 (0024400) transaction=15980288 block=146 (0024800) transaction=15980289 block=147 (0024c00) transaction=15980290 block=148 (0025000) transaction=15980291 block=149 (0025400) transaction=15980292 block=150 (0025800) transaction=15980293 block=151 (0025c00) transaction=15980294 block=152 (0026000) transaction=15980295 block=153 (0026400) transaction=15980296 block=154 (0026800) transaction=15980297 block=155 (0026c00) transaction=15980298 block=156 (0027000) transaction=15980299 block=157 (0027400) transaction=15980300 block=158 (0027800) transaction=15980301 . . . - skip to 256 into the disk and find the newest uberblock bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -n -s 256 block=507 (7ec00) transaction=15980522 Now lets say i want to go back in time on this, using the program can help me do that. If i wanted
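Stepping back from the script output: the recovery Victor describes at the top of this message ("finding better old state with the help of 'zdb -t', then verifying metadata checksums with 'zdb -eubbcsL'") amounts to probing candidate transaction groups until one traverses cleanly. A rough sketch only; the txg numbers below are placeholders, the pool name matches this thread, and -e is used because the pool is not imported:

# walk a few txgs back from the last synced one and check each candidate
for t in 1289 1288 1287 1286; do
  echo "=== txg $t ==="
  zdb -e -ubbcsL -t $t data
done

The txg whose traversal completes without checksum complaints is the candidate to roll back to, much as shown in the "zpool import hangs" thread later in this section.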
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
Richard Elling wrote: On Sep 28, 2009, at 3:42 PM, Albert Chin wrote: On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Too bad we cannot scrub a dataset/object. Can you provide a use case? I don't see why scrub couldn't start and stop at specific txgs for instance. That won't necessarily get you to a specific file, though. With ever increasing disk and pool sizes it takes more and more time for scrub to complete its job. Let's imagine that you have 100TB pool with 90TB of data in it, and there's dataset with 10TB that is critical and another dataset with 80TB that is not that critical and you can afford loosing some blocks/files there. So being able to scrub individual dataset would help to run scrubs of critical data more frequently and faster and schedule scrubs for less frequently used and/or less important data to happen much less frequently. It may be useful to have a way to tell ZFS to scrub pool-wide metadata only (space maps etc), so that you can build your own schedule of scrubs. Another interesting idea is to be able to scrub only blocks modified since last snapshot. victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, Sep 28, 2009 at 10:16:20AM -0700, Richard Elling wrote: > On Sep 28, 2009, at 3:42 PM, Albert Chin wrote: > >> On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: >>> On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. >>> >>> This should work but it does not verify the redundant metadata. For >>> example, the duplicate metadata copy might be corrupt but the problem >>> is not detected since it did not happen to be used. >> >> Too bad we cannot scrub a dataset/object. > > Can you provide a use case? I don't see why scrub couldn't start and > stop at specific txgs for instance. That won't necessarily get you to a > specific file, though. If your pool is borked but mostly readable, yet some file systems have cksum errors, you cannot "zfs send" that file system (err, snapshot of filesystem). So, you need to manually fix the file system by traversing it to read all files to determine which must be fixed. Once this is done, you can snapshot and "zfs send". If you have many file systems, this is time consuming. Of course, you could just rsync and be happy with what you were able to recover, but if you have clones branched from the same parent, with a few differences between snapshots, having to rsync *everything* rather than just the differences is painful. Hence the reason to try to get "zfs send" to work. But, this is an extreme example and I doubt pools are often in this state so the engineering time isn't worth it. In such cases though, a "zfs scrub" would be useful. -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
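As a practical aside, the manual traversal Albert describes pairs naturally with zpool status, which lists the files behind permanent errors once the bad blocks have actually been read. A sketch, assuming a hypothetical pool named tank mounted under /tank/fs:

# read every file so damaged blocks are exercised
cd /tank/fs && find . -type f -exec cp {} /dev/null \; 2>/dev/null
# then list the files that produced permanent errors
zpool status -v tank

It is still a full read of the data, so it does not remove the time problem that a per-dataset scrub would address.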
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, 28 Sep 2009, Bob Friesenhahn wrote: This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. I am finding that your tar incantation is reading hardly any data from disk when testing my home directory and the 'tar' happens to be GNU tar: # time tar cf - . > /dev/null tar cf - . > /dev/null 2.72s user 12.43s system 96% cpu 15.721 total # du -sh . 82G Looks like the GNU folks slipped in a small performance "enhancement" if the output is to /dev/null. Make sure to use /bin/tar, which seems to actually read the data. When actually reading the data via tar, read performance is very poor. Hopefully I will have a ZFS IDR to test with in the next few days which fixes the prefetch bug. Zpool scrub reads the data at 360MB/second but this tar method is only reading at an average of 6MB/second to 42MB/second (according to zpool iostat). Wups, I just saw a one-minute average of 105MB and then 131MB. Quite variable. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
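For anyone reproducing Bob's test: GNU tar special-cases an archive written to /dev/null and skips reading file contents, so the read load largely disappears. Two ways around it, as a sketch (the pool name in the iostat line is illustrative):

# force the data to actually be read by putting a pipe in the way
tar cf - . | cat > /dev/null
# or use the Solaris tar, which has no such shortcut
/bin/tar cf - . > /dev/null
# watch the read rate while it runs
zpool iostat tank 5

Either way the number you see then reflects real read throughput rather than the GNU optimization.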
Re: [zfs-discuss] OS install question
On Mon, September 28, 2009 07:56, Frank Middleton wrote: > On 09/28/09 12:40 AM, Ron Watkins wrote: >> >> Thus, im at a loss as to how to get the root pool setup as a 20Gb >> slice > > 20GB is too small. You'll be fighting for space every time > you use pkg. From my considerable experience installing to a > 20GB mirrored rpool, I would go for 32GB if you can. That seems truly bizarre. Virtualbox recommends 16GB, and after doing an install there's about 12GB free. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, Sep 28, 2009 at 12:16 PM, Richard Elling wrote: > On Sep 28, 2009, at 3:42 PM, Albert Chin wrote: > > On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: >> >>> On Mon, 28 Sep 2009, Richard Elling wrote: >>> Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. >>> >>> This should work but it does not verify the redundant metadata. For >>> example, the duplicate metadata copy might be corrupt but the problem >>> is not detected since it did not happen to be used. >>> >> >> Too bad we cannot scrub a dataset/object. >> > > Can you provide a use case? I don't see why scrub couldn't start and > stop at specific txgs for instance. That won't necessarily get you to a > specific file, though. > -- richard > I get the impression he just wants to check a single file in a pool without waiting for it to check the entire pool. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Sep 28, 2009, at 3:42 PM, Albert Chin wrote: On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Too bad we cannot scrub a dataset/object. Can you provide a use case? I don't see why scrub couldn't start and stop at specific txgs for instance. That won't necessarily get you to a specific file, though. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote: > On Mon, 28 Sep 2009, Richard Elling wrote: >> >> Scrub could be faster, but you can try >> tar cf - . > /dev/null >> >> If you think about it, validating checksums requires reading the data. >> So you simply need to read the data. > > This should work but it does not verify the redundant metadata. For > example, the duplicate metadata copy might be corrupt but the problem > is not detected since it did not happen to be used. Too bad we cannot scrub a dataset/object. -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Paul, Thanks for additional data, please see comments inline. Paul Archer wrote: 7:56pm, Victor Latushkin wrote: While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool replace' because the zpool isn't online. ZFS actually uses c7d0s0 and not c7d0 - it shortens output to c7d0 in case it controls entire disk. As before upgrade it looked like this: NAMESTATE READ WRITE CKSUM datapoolONLINE 0 0 0 raidz1ONLINE 0 0 0 c2d0s0 ONLINE 0 0 0 c3d0s0 ONLINE 0 0 0 c4d0s0 ONLINE 0 0 0 c6d0s0 ONLINE 0 0 0 c5d0s0 ONLINE 0 0 0 I guess something happened to the labeling of disk c7d0 (used to be c2d0) before, during or after upgrade. It would be nice to show what zdb -l shows for this disk and some other disk too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too. This is from c7d0: LABEL 0 version=13 name='datapool' state=0 txg=233478 pool_guid=3410059226836265661 hostid=519305 hostname='shebop' top_guid=7679950824008134671 guid=17458733222130700355 vdev_tree type='raidz' id=0 guid=7679950824008134671 nparity=1 metaslab_array=23 metaslab_shift=32 ashift=9 asize=7501485178880 is_log=0 children[0] type='disk' id=0 guid=17458733222130700355 path='/dev/dsk/c7d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a' whole_disk=1 This is why ZFS does not show s0 in the zpool output for c7d0 - it controls entire disk. I guess initially it was the other way - it is unlikely that you specified disks differently at creation time and earlier output suggests that it was other way. So somthing happened before last system reboot that most likely relabeled your c7d0 disk, and configuration in the labels was updated. DTL=588 children[1] type='disk' id=1 guid=4735756507338772729 path='/dev/dsk/c8d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a' whole_disk=0 All the other disks have whole_disk=0, so there's s0 in the zpool output for those disks. DTL=467 children[2] type='disk' id=2 guid=10113358996255761229 path='/dev/dsk/c9d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a' whole_disk=0 DTL=573 children[3] type='disk' id=3 guid=11460855531791764612 path='/dev/dsk/c11d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a' whole_disk=0 DTL=571 children[4] type='disk' id=4 guid=14986691153111294171 path='/dev/dsk/c10d0s0' devid='id1,c...@ast31500341as=9vs0ttwf/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a' whole_disk=0 DTL=473 Labels 1-3 are identical The other disks in the pool give identical results (except for the guid's, which match with what's above). Ok, then let's look at the vtoc - probably we can find something interesting there. c8d0 - c11d0 are identical, so I didn't include that output below: This is expected. 
So let's look for the differences: r...@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0 * /dev/rdsk/c7d0s0 partition map * * Unallocated space: * First SectorLast * Sector CountSector * 34 222 255 * * First SectorLast * Partition Tag FlagsSector CountSector Mount Directory 0 400256 2930247391 2930247646 8 1100 2930247647 16384 2930264030 r...@shebop:/tmp# r...@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0 * /dev/rdsk/c8d0s0 partition map * * First SectorLast * Partition Tag FlagsSector CountSector Mount Directory 0 1700 34 2930277101 2930277134 Now you can clearly see the difference between the two: 4 disks have only one
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, 28 Sep 2009, Richard Elling wrote: Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. This should work but it does not verify the redundant metadata. For example, the duplicate metadata copy might be corrupt but the problem is not detected since it did not happen to be used. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Sep 28, 2009, at 2:41 PM, Albert Chin wrote: Without doing a zpool scrub, what's the quickest way to find files in a filesystem with cksum errors? Iterating over all files with "find" takes quite a bit of time. Maybe there's some zdb fu that will perform the check for me? Scrub could be faster, but you can try tar cf - . > /dev/null If you think about it, validating checksums requires reading the data. So you simply need to read the data. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ARC vs Oracle cache
Been there, done that, got the tee shirt A larger SGA will *always* be more efficient at servicing Oracle requests for blocks. You avoid going through all the IO code of Oracle and it simply reduces to a hash. http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle al...@sun wrote: Hi all, There is no generic response for: Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? We can awser: Have a large enough SGA do get good cache hit ratio (higher than 90 % for OLTP). Have some GB ZFS arc (Not less than 500M, usually more than 16GB is not usefull). Then you have to tune. We know that ZFS cache help the database reads. The cache strategies of ZFS and Oracle are different, and usually they help each other. The is no reason to avoid to cache the same data twice. Exemple: Oracle query ask for a range scan on index. ZFS detect sequential reads and start to prefetch the data. ZFS try to cache the data that Oracle will probably ask next. When Oracle ask, the data is cache twice. All the cache are dynamics. The best knowned record size for an OLTP environment is : Dataset Recordsize Table Data 8K (db_block_size) Redo Logs 128K Index 8K (db_block_size) Undo 128K Temp 128K We still recommand a distinct zpool for redologs. Regards. Alain Chéreau Enda O'Connor a écrit : Richard Elling wrote: On Sep 24, 2009, at 10:30 AM, Javier Conde wrote: Hello, Given the following configuration: * Server with 12 SPARCVII CPUs and 96 GB of RAM * ZFS used as file system for Oracle data * Oracle 10.2.0.4 with 1.7TB of data and indexes * 1800 concurrents users with PeopleSoft Financial * 2 PeopleSoft transactions per day * HDS USP1100 with LUNs stripped on 6 parity groups (450xRAID7+1), total 48 disks * 2x 4Gbps FC with MPxIO Which is the best Oracle SGA size to avoid cache duplication between Oracle and ZFS? Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? Who does a better cache for overall performance? In general, it is better to cache closer to the consumer (application). You don't mention what version of Solaris or ZFS you are using. For later versions, the primarycache property allows you to control the ARC usage on a per-dataset basis. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Hi addign oracle-interest I would suggest some testing but standard recommendation to start with are keep zfs record size is db block size, keep oracle log writer to it's own pool ( 128k recordsize is recommended I believe for this one ), the log writer is a io limiting factor as such , use latest Ku's for solaris as they contain some critical fixes for zfs/oracle, ie 6775697 for instance. Small SGA is not usually recommended, but of course a lot depends on application layer as well, I can only say test with the recommendations above and then deviate from there, perhaps keeping zil on separate high latency device might help ( again only analysis can determine all that ). Then remember that even after that with a large SGA etc, sometimes perf can degrade, ie might need to instruct oracle to actually cache, via alter table cache command etc. getting familiar with statspack aws will be a must here :-) as only an analysis of Oracle from an oracle point of view can really tell what is workign as such. Enda ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
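To make the recommendations in this thread concrete, here is a minimal sketch of the dataset layout and an ARC cap. The pool and dataset names are placeholders; the 8K/128K record sizes simply restate the table in Alain's mail, and recordsize must be set before the data files are created:

# datasets tuned per the table above
zfs create -o recordsize=8k dbpool/oradata
zfs create -o recordsize=128k logpool/redo
# optionally cache only metadata for the data files if double-caching hurts
zfs set primarycache=metadata dbpool/oradata
# cap the ARC (example: 16GB) via /etc/system, then reboot
#   set zfs:zfs_arc_max = 17179869184

As the thread stresses, treat these as starting points and measure with your own workload (statspack/AWR on the Oracle side) before deviating further.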
Re: [zfs-discuss] ZFS ARC vs Oracle cache
Hi all, There is no generic response for: Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? We can answer: have a large enough SGA to get a good cache hit ratio (higher than 90% for OLTP). Have a few GB of ZFS ARC (not less than 500MB; usually more than 16GB is not useful). Then you have to tune. We know that the ZFS cache helps database reads. The cache strategies of ZFS and Oracle are different, and usually they help each other. There is no reason to avoid caching the same data twice. Example: an Oracle query asks for a range scan on an index. ZFS detects the sequential reads and starts to prefetch the data. ZFS tries to cache the data that Oracle will probably ask for next. When Oracle asks, the data is cached twice. All the caches are dynamic. The best known record sizes for an OLTP environment are:
Dataset     Recordsize
Table Data  8K (db_block_size)
Redo Logs   128K
Index       8K (db_block_size)
Undo        128K
Temp        128K
We still recommend a distinct zpool for the redo logs. Regards. Alain Chéreau Enda O'Connor wrote: Richard Elling wrote: On Sep 24, 2009, at 10:30 AM, Javier Conde wrote: Hello, Given the following configuration: * Server with 12 SPARCVII CPUs and 96 GB of RAM * ZFS used as file system for Oracle data * Oracle 10.2.0.4 with 1.7TB of data and indexes * 1800 concurrent users with PeopleSoft Financial * 2 PeopleSoft transactions per day * HDS USP1100 with LUNs striped on 6 parity groups (450xRAID7+1), total 48 disks * 2x 4Gbps FC with MPxIO Which is the best Oracle SGA size to avoid cache duplication between Oracle and ZFS? Is it better to have a "small SGA + big ZFS ARC" or "large SGA + small ZFS ARC"? Who does a better cache for overall performance? In general, it is better to cache closer to the consumer (application). You don't mention what version of Solaris or ZFS you are using. For later versions, the primarycache property allows you to control the ARC usage on a per-dataset basis. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Hi, adding oracle-interest. I would suggest some testing, but standard recommendations to start with are: keep the zfs recordsize equal to the db block size, keep the oracle log writer in its own pool (128k recordsize is recommended, I believe, for this one) since the log writer is an IO-limiting factor as such, and use the latest KUs for Solaris as they contain some critical fixes for zfs/oracle, ie 6775697 for instance. A small SGA is not usually recommended, but of course a lot depends on the application layer as well; I can only say test with the recommendations above and then deviate from there, and perhaps keeping the zil on a separate high latency device might help (again, only analysis can determine all that). Then remember that even after that, with a large SGA etc, sometimes perf can degrade, ie you might need to instruct oracle to actually cache, via the alter table cache command etc. Getting familiar with statspack/AWR will be a must here :-) as only an analysis of Oracle from an oracle point of view can really tell what is working as such. Enda ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
7:56pm, Victor Latushkin wrote: While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool replace' because the zpool isn't online. ZFS actually uses c7d0s0 and not c7d0 - it shortens output to c7d0 in case it controls entire disk. As before upgrade it looked like this: NAMESTATE READ WRITE CKSUM datapoolONLINE 0 0 0 raidz1ONLINE 0 0 0 c2d0s0 ONLINE 0 0 0 c3d0s0 ONLINE 0 0 0 c4d0s0 ONLINE 0 0 0 c6d0s0 ONLINE 0 0 0 c5d0s0 ONLINE 0 0 0 I guess something happened to the labeling of disk c7d0 (used to be c2d0) before, during or after upgrade. It would be nice to show what zdb -l shows for this disk and some other disk too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too. This is from c7d0: LABEL 0 version=13 name='datapool' state=0 txg=233478 pool_guid=3410059226836265661 hostid=519305 hostname='shebop' top_guid=7679950824008134671 guid=17458733222130700355 vdev_tree type='raidz' id=0 guid=7679950824008134671 nparity=1 metaslab_array=23 metaslab_shift=32 ashift=9 asize=7501485178880 is_log=0 children[0] type='disk' id=0 guid=17458733222130700355 path='/dev/dsk/c7d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a' whole_disk=1 DTL=588 children[1] type='disk' id=1 guid=4735756507338772729 path='/dev/dsk/c8d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a' whole_disk=0 DTL=467 children[2] type='disk' id=2 guid=10113358996255761229 path='/dev/dsk/c9d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a' whole_disk=0 DTL=573 children[3] type='disk' id=3 guid=11460855531791764612 path='/dev/dsk/c11d0s0' devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a' whole_disk=0 DTL=571 children[4] type='disk' id=4 guid=14986691153111294171 path='/dev/dsk/c10d0s0' devid='id1,c...@ast31500341as=9vs0ttwf/a' phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a' whole_disk=0 DTL=473 Labels 1-3 are identical The other disks in the pool give identical results (except for the guid's, which match with what's above). c8d0 - c11d0 are identical, so I didn't include that output below: r...@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0 * /dev/rdsk/c7d0s0 partition map * * Dimensions: * 512 bytes/sector * 2930264064 sectors * 2930263997 accessible sectors * * Flags: * 1: unmountable * 10: read-only * * Unallocated space: * First SectorLast * Sector CountSector * 34 222 255 * * First SectorLast * Partition Tag FlagsSector CountSector Mount Directory 0 400256 2930247391 2930247646 8 1100 2930247647 16384 2930264030 r...@shebop:/tmp# r...@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0 * /dev/rdsk/c8d0s0 partition map * * Dimensions: * 512 bytes/sector * 2930264064 sectors * 2930277101 accessible sectors * * Flags: * 1: unmountable * 10: read-only * * First SectorLast * Partition Tag FlagsSector CountSector Mount Directory 0 1700 34 2930277101 2930277134 Thanks for the help! Paul Archer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
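For anyone following along, the label and vtoc comparison worked through earlier in this thread can be scripted so the odd disk stands out immediately. A sketch using the device names from this pool:

# compare the whole_disk flag and the slice layout across pool members
for d in c7d0 c8d0 c9d0 c10d0 c11d0; do
  echo "== $d =="
  zdb -l /dev/dsk/${d}s0 | egrep 'path|whole_disk'
  prtvtoc /dev/rdsk/${d}s0 | tail -3
done

In this pool the relabeled disk is the one reporting whole_disk=1, with the extra reserved slice 8 that an EFI whole-disk label carries.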
[zfs-discuss] Quickest way to find files with cksum errors without doing scrub
Without doing a zpool scrub, what's the quickest way to find files in a filesystem with cksum errors? Iterating over all files with "find" takes quite a bit of time. Maybe there's some zdb fu that will perform the check for me? -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
On 28.09.09 18:09, Paul Archer wrote: 8:30am, Paul Archer wrote: And the hits just keep coming... The resilver finished last night, so rebooted the box as I had just upgraded to the latest Dev build. Not only did the upgrade fail (love that instant rollback!), but now the zpool won't come online: r...@shebop:~# zpool import pool: datapool id: 3410059226836265661 state: UNAVAIL status: The pool is formatted using an older on-disk version. action: The pool cannot be imported due to damaged devices or data. config: datapool UNAVAIL insufficient replicas raidz1 UNAVAIL corrupted data c7d0 ONLINE c8d0s0 ONLINE c9d0s0 ONLINE c11d0s0 ONLINE c10d0s0 ONLINE I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy. Is it OK to scream and tear my hair out now? A little more research came up with this: r...@shebop:~# zdb -l /dev/dsk/c7d0 LABEL 0 failed to unpack label 0 LABEL 1 failed to unpack label 1 LABEL 2 failed to unpack label 2 LABEL 3 failed to unpack label 3 While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool replace' because the zpool isn't online. ZFS actually uses c7d0s0 and not c7d0 - it shortens output to c7d0 in case it controls entire disk. As before upgrade it looked like this: NAMESTATE READ WRITE CKSUM datapoolONLINE 0 0 0 raidz1ONLINE 0 0 0 c2d0s0 ONLINE 0 0 0 c3d0s0 ONLINE 0 0 0 c4d0s0 ONLINE 0 0 0 c6d0s0 ONLINE 0 0 0 c5d0s0 ONLINE 0 0 0 I guess something happened to the labeling of disk c7d0 (used to be c2d0) before, during or after upgrade. It would be nice to show what zdb -l shows for this disk and some other disk too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too. victor ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OS install question
Hi Ron, Any reason why you want to use slices except for the root pool? I would recommend a 4-disk configuration like this: mirrored root pool on c1t0d0s0 and c2t0d0s0 mirrored app pool on c1t1d0 and c2t1d0 Let the install use one big slice for each disk in the mirrored root pool, which is required for booting and whole disks for the app pool. Other than for the root pool, slices are not required. In the future, you can attach/add more disks to the app pool and/or replace with larger disks in either pool. Any additional administration, such as trying to expand a slice (you can't expand an existing slice under a live pool) or reconfiguration is much easier without having to muck with slices. Cindy On 09/27/09 18:41, Ron Watkins wrote: I have a box with 4 disks. It was my intent to place a mirrored root partition on 2 disks on different controllers, then use the remaining space and the other 2 disks to create a raid-5 configuration from which to export iscsi luns for use by other hosts. The problem im having is that when I try to install OS, it either takes the entire disk or a partition the same size as the entire disk. I tried creating 2 slices, but the install won't allow it and if I make the solaris partition smaller, then the OS no longer sees the rest of the disk, only the small piece. I found references on how to mirror the root disk pool, but the grub piece doesn't seem to work as when I disconnect the first disk all I get at reboot is a grub prompt. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
8:30am, Paul Archer wrote: And the hits just keep coming... The resilver finished last night, so rebooted the box as I had just upgraded to the latest Dev build. Not only did the upgrade fail (love that instant rollback!), but now the zpool won't come online: r...@shebop:~# zpool import pool: datapool id: 3410059226836265661 state: UNAVAIL status: The pool is formatted using an older on-disk version. action: The pool cannot be imported due to damaged devices or data. config: datapool UNAVAIL insufficient replicas raidz1 UNAVAIL corrupted data c7d0 ONLINE c8d0s0 ONLINE c9d0s0 ONLINE c11d0s0 ONLINE c10d0s0 ONLINE I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy. Is it OK to scream and tear my hair out now? A little more research came up with this: r...@shebop:~# zdb -l /dev/dsk/c7d0 LABEL 0 failed to unpack label 0 LABEL 1 failed to unpack label 1 LABEL 2 failed to unpack label 2 LABEL 3 failed to unpack label 3 While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool replace' because the zpool isn't online. Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
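One diagnostic worth noting for situations like this, offered as a sketch rather than a guaranteed fix: zpool import -d only scans the directory you give it, so you can point it at symlinks to exactly the device nodes you want considered, in this case the s0 slices:

# build a directory of symlinks to the slice devices and import from it
mkdir /tmp/slices
for d in c7d0 c8d0 c9d0 c10d0 c11d0; do
  ln -s /dev/dsk/${d}s0 /tmp/slices/${d}s0
done
zpool import -d /tmp/slices datapool

If the labels on the slices are intact this at least shows whether the pool can be assembled from them; the underlying label mismatch still needs the kind of analysis Victor does elsewhere in the thread.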
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
On 2009/09/28, at 22:09, Jim Grisanzio wrote: Jorgen Lundman wrote: When I approach Sun-Japan directly I just get told that they don't speak English. When my Japanese colleagues approach Sun-Japan directly, it is suggested to us that we stay with our current Vendor. hey ... I work at Sun Japan in the Yoga office. I can connect you with English speakers here. Contact me off list if you are interested. Also, there are some Japan lists for OpenSolaris you may want to subscribe to: The Japan OpenSolaris User Group and The Tokyo OpenSolaris User Group. The Japan group is mostly Japanese, but we are trying to build an international group in English for the Tokyo OSUG. There are bi-lingual westerners and Japanese on both lists, and we have events in Yoga as well. http://mail.opensolaris.org/mailman/listinfo/ug-tsug (English ) http://mail.opensolaris.org/mailman/listinfo/ug-jposug (Japanese) Jim -- http://blogs.sun.com/jimgris/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Hi, I am a leader of Tokyo OpenSolaris User Group.as Jimgris says,we are now trying to build an international group like Tokyo2pointO and Tokyo Linux User Group(TLUG) You can ask us in English in our tsug mailing list,especially issue in Japan - OpenSolaris support program in Japan. We hope we would be your help. Thanks, Masafumi Ohta a Leader of Tokyo OpenSolaris User Group mailto:masafumi.o...@gmail.com http://www.twitter.com/masafumi_ohta ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Yesterday, Paul Archer wrote: I estimate another 10-15 hours before this disk is finished resilvering and the zpool is OK again. At that time, I'm going to switch some hardware out (I've got a newer and higher-end LSI card that I hadn't used before because it's PCI-X, and won't fit on my current motherboard.) I'll report back what I get with it tomorrow or the next day, depending on the timing on the resilver. Paul Archer And the hits just keep coming... The resilver finished last night, so rebooted the box as I had just upgraded to the latest Dev build. Not only did the upgrade fail (love that instant rollback!), but now the zpool won't come online: r...@shebop:~# zpool import pool: datapool id: 3410059226836265661 state: UNAVAIL status: The pool is formatted using an older on-disk version. action: The pool cannot be imported due to damaged devices or data. config: datapool UNAVAIL insufficient replicas raidz1 UNAVAIL corrupted data c7d0 ONLINE c8d0s0 ONLINE c9d0s0 ONLINE c11d0s0 ONLINE c10d0s0 ONLINE I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy. Is it OK to scream and tear my hair out now? Paul PS I don't suppose there's an RFE out there for "give useful data when a pool is unavailable." Or even better, "allow a pool to be imported (but no filesystems mounted) so it *can be fixed*." ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hungs up forever...
On 29.07.09 15:18, Markus Kovero wrote: I recently noticed that importing larger pools that are occupied by large amounts of data can do zpool import for several hours while zpool iostat only showing some random reads now and then and iostat -xen showing quite busy disk usage, It's almost it goes thru every bit in pool before it goes thru. Somebody said that zpool import got faster on snv118, but I don't have real information on that yet. This had nothing to do with speed of 'zpool import'. There was corrupted pool-wide metadata block that prevented pool from importing successfully. Fortunately enough, we found better previous state a few txgs back with txg 2683802 (last synced was 2682802: #zdb -e -ubbcsL -t 2682802 data1 ... 4.25K 19.9M 8.62M 25.8M 6.08K2.31 0.00 SPA space map 1 128K128K128K128K1.00 0.00 ZIL intent log 1.77K 28.4M 8.48M 17.3M9.8K3.35 0.00 DMU dnode 2 2K 1K 2.50K 1.25K2.00 0.00 DMU objset - - - - - -- DSL directory 2 1K 1K 3.00K 1.50K1.00 0.00 DSL directory child map 1512 512 1.50K 1.50K1.00 0.00 DSL dataset snap map 2 1K 1K 3.00K 1.50K1.00 0.00 DSL props - - - - - -- DSL dataset - - - - - -- ZFS znode - - - - - -- ZFS V0 ACL 46.3M 5.74T 5.74T 5.74T127K1.00 100.00 ZFS plain file 1.87K 9.04M 2.75M 5.50M 2.94K3.29 0.00 ZFS directory 1512 512 1K 1K1.00 0.00 ZFS master node 1512 512 1K 1K1.00 0.00 ZFS delete queue - - - - - -- zvol object - - - - - -- zvol prop - - - - - -- other uint8[] - - - - - -- other uint64[] - - - - - -- other ZAP - - - - - -- persistent error log 1 128K 4.50K 13.5K 13.5K 28.44 0.00 SPA history - - - - - -- SPA history offsets - - - - - -- Pool properties - - - - - -- DSL permissions - - - - - -- ZFS ACL - - - - - -- ZFS SYSACL - - - - - -- FUID table - - - - - -- FUID table size - - - - - -- DSL dataset next clones - - - - - -- scrub work queue 46.3M 5.74T 5.74T 5.74T127K1.00 100.00 Total capacity operations bandwidth errors descriptionused avail read write read write read write cksum data1 5.74T 6.99T 523 0 65.1M 0 0 0 1 /dev/dsk/c14t0d05.74T 6.99T 523 0 65.1M 0 0 017 So we reactivated it and were able to import pool just fine. Subsequent scrub did find couple of errors in metadata. There were no user data error at all: # zpool status -v data1 pool: data1 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed after 12h43m with 0 errors on Thu Aug 6 06:00:11 2009 config: NAMESTATE READ WRITE CKSUM data1 ONLINE 0 0 0 c14t0d0 ONLINE 0 0 2 12K repaired errors: No known data errors Upcoming zpool recovery support is going to help perform this kind of recovery in user-friendlier and more automated way. Btw, pool was originally created on FreeBSD, but we performed recovery on Solaris. Pavel said that he was going to stay on OpenSolaris as he learned a lot about it along the way ;-) Cheers, Victor Yours Markus Kovero -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Victor Latushkin Sent: 29. heinäkuuta 2009 14:05 To: Pavel Kovalenko Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] zpool import hungs up forever... 
On 29.07.09 14:42, Pavel Kovalenko wrote: fortunately, after several hours terminal went back --> # zdb -e data1 Uberblock magic = 00bab10c version = 6 txg = 2682808 guid_sum = 14250651627001887594
Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)
Trying to move this to a new thread, although I don't think it has anything to do with ZFS :-) On 09/28/09 08:54 AM, Chris Gerhard wrote: TMPFS was not in the first release of 4.0. It was introduced to boost the performance of diskless clients which no longer had the old network disk for their root file systems and hence /tmp was now over NFS. Whether there was a patch that brought it back into 4.0 I don't recall but I don't think so. 4.0.1 would have been the first release that actually had it. --chris On 09/28/09 03:00 AM, Joerg Schilling wrote: I am not sure whether my changes will be kept as wikipedia prefers to keep badly quoted wrong information before correct information supplied by people who have first hand information. They actually disallow "first hand information". Everything on Wikipedia is supposed to be confirmed by secondary or tertiary sources. That's why I asked if there was any supporting documentation - papers, manuals, proceedings, whatever, that describe the introduction of tmpfs before 1990. If you were to write a personal page (in Wikipedia if you like) that describes the history of tmpfs, then you could refer to it in the tmpfs page as a secondary source. Actually, I suppose if it was in the source code itself, that would be pretty irrefutable! http://en.wikipedia.org/wiki/Wikipedia:Reliable_sources Wikipedia also has a lofi page (http://en.wikipedia.org/wiki/Lofi) that redirects to "loop mount". It has no historical section at all... There is no fbk (file system) page. Cheers -- Frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
Jorgen Lundman wrote: When I approach Sun-Japan directly I just get told that they don't speak English. When my Japanese colleagues approach Sun-Japan directly, it is suggested to us that we stay with our current Vendor. hey ... I work at Sun Japan in the Yoga office. I can connect you with English speakers here. Contact me off list if you are interested. Also, there are some Japan lists for OpenSolaris you may want to subscribe to: The Japan OpenSolaris User Group and The Tokyo OpenSolaris User Group. The Japan group is mostly Japanese, but we are trying to build an international group in English for the Tokyo OSUG. There are bi-lingual westerners and Japanese on both lists, and we have events in Yoga as well. http://mail.opensolaris.org/mailman/listinfo/ug-tsug (English ) http://mail.opensolaris.org/mailman/listinfo/ug-jposug (Japanese) Jim -- http://blogs.sun.com/jimgris/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OS install question
On 09/28/09 12:40 AM, Ron Watkins wrote: Thus, im at a loss as to how to get the root pool setup as a 20Gb slice 20GB is too small. You'll be fighting for space every time you use pkg. From my considerable experience installing to a 20GB mirrored rpool, I would go for 32GB if you can. Assuming this is X86, couldn't you simply use fdisk to create whatever partitions you want and then install to one of them? Than you should be able to create the data pool using another partition. You might need to use a weird partition type temporarily. On SPARC there doesn't seem to be a problem using slices for different zpools, in fact it insists on using a slice for the root pool. Cheers -- Frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Which directories must be part of rpool?
TMPFS was not in the first release of 4.0. It was introduced to boost the performance of diskless clients which no longer had the old network disk for their root file systems and hence /tmp was now over NFS. Whether there was a patch that brought it back into 4.0 I don't recall but I don't think so. 4.0.1 would have been the first release that actually had it. --chris -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Borked zpool, missing slog/zil
On 27.09.09 02:28, Ross wrote: Do you have a backup copy of your zpool.cache file? If you have that file, ZFS will happily mount a pool on boot without its slog device - it'll just flag the slog as faulted and you can do your normal replace. I used that for a long while on a test server with a ramdisk slog - and I never needed to swap it to a file based slog. However without a backup of that file to make zfs load the pool on boot I don't believe there is any way to import that pool. If there's no backup of that file, contents of it can be constructed by extracting contents of config object from the pool and using it to construct cachefile (basically creating nvlist with single name-value pair, where name is name of the pool and value is nvlist from the config object). -- -- Victor Latushkin phone: x11467 / +74959370467 TSC-Kernel EMEAmobile: +78957693012 Sun Services, Moscow blog: http://blogs.sun.com/vlatushkin Sun Microsystems ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
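A preventative note for anyone running a volatile (e.g. ramdisk) slog: the thread above hinges on having the pool configuration available, so keeping a copy of the cache file around is cheap insurance. A sketch (the stock cache file path is standard; the backup location and block size are arbitrary):

# after any pool reconfiguration, stash a copy of the cache file
cp /etc/zfs/zpool.cache /root/zpool.cache.`date +%Y%m%d`
# and keep the current slog contents dumped to a file, as Erik does
dd if=/dev/ramdisk/slog of=/root/slog.tmp bs=1024k

Restoring the saved file to /etc/zfs/zpool.cache before boot is what lets ZFS bring the pool up with the slog merely flagged as faulted, as Ross describes.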
Re: [zfs-discuss] Borked zpool, missing slog/zil
On 27.09.09 14:34, Erik Ableson wrote: Hmmm - I've got a fairly old copy of the zpool cache file (circa July), but nothing structural has changed in pool since that date. What other data is held in that file? There have been some filesystem changes, but nothing critical is in the newer filesystems. Cache file keeps pool configuration so zfs can quickly open it upon reboot. If there were no changes to the configuration of pool vdevs, then it should describe good config. victor Any particular procedure required for swapping out the zpool.cache file? Erik On Sunday, 27 September, 2009, at 12:28AM, "Ross" wrote: Do you have a backup copy of your zpool.cache file? If you have that file, ZFS will happily mount a pool on boot without its slog device - it'll just flag the slog as faulted and you can do your normal replace. I used that for a long while on a test server with a ramdisk slog - and I never needed to swap it to a file based slog. However without a backup of that file to make zfs load the pool on boot I don't believe there is any way to import that pool. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- -- Victor Latushkin phone: x11467 / +74959370467 TSC-Kernel EMEAmobile: +78957693012 Sun Services, Moscow blog: http://blogs.sun.com/vlatushkin Sun Microsystems ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Borked zpool, missing slog/zil
On 27.09.09 19:35, Erik Ableson wrote: Good link - thanks. I'm looking at the details for that one and learning a little zdb at the same time. I've got a situation perhaps a little different in that I _do_ have a current copy of the slog in a file with what appears to be current data. However, I don't see how to attach the slog file to an offline zpool - I have both a dd backup of the ramdisk slog from midnight as well as the current file based slog : Have you tried to make symbolic link from e.g. /dev/dsk/slog to /root/slog.tmp and check what 'zpool import' says? zdb -l /root/slog.tmp version=14 name='siovale' state=1 txg=4499446 pool_guid=13808783103733022257 hostid=4834000 hostname='shemhazai' top_guid=6374488381605474740 guid=6374488381605474740 is_log=1 vdev_tree type='file' id=1 guid=6374488381605474740 path='/root/slog.tmp' metaslab_array=230 metaslab_shift=21 ashift=9 asize=938999808 is_log=1 DTL=51 Is there any way that I can attach this slog to the zpool while it's offline? Erik On 27 sept. 2009, at 02:23, David Turnbull wrote: I believe this is relevant: http://github.com/pjjw/logfix Saved my array last year, looks maintained. On 27/09/2009, at 4:49 AM, Erik Ableson wrote: Hmmm - this is an annoying one. I'm currently running an OpenSolaris install (2008.11 upgraded to 2009.06) : SunOS shemhazai 5.11 snv_111b i86pc i386 i86pc Solaris with a zpool made up of one radiz vdev and a small ramdisk based zil. I usually swap out the zil for a file-based copy when I need to reboot (zpool replace /dev/ramdisk/slog /root/slog.tmp) but this time I had a brain fart and forgot to. The server came back up and I could sort of work on the zpool but it was complaining so I did my replace command and it happily resilvered. Then I restarted one more time in order to test bringing everything up cleanly and this time it can't find the file based zil. I try importing and it comes back with: zpool import pool: siovale id: 13808783103733022257 state: UNAVAIL status: One or more devices are missing from the system. action: The pool cannot be imported. Attach the missing devices and try again. see: http://www.sun.com/msg/ZFS-8000-6X config: siovale UNAVAIL missing device raidz1ONLINE c8d0ONLINE c9d0ONLINE c10d0 ONLINE c11d0 ONLINE Additional devices are known to be part of this pool, though their exact configuration cannot be determined. Now the file still exists so I don't know why it can't seem to find it and I thought the missing zil issue was corrected in this version (or did I miss something?). I've looked around for solutions to bring it back online and ran across this method:
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
Tomas Ögren wrote: http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html which is in no way official, says it'll be in 10u8 which should be coming within a month. /Tomas That would be perfect. I wonder why I have so much trouble finding information about "future releases" of Solaris. Thanks Lund -- Jorgen Lundman | Unix Administrator | +81 (0)3-5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3-3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs vbox and shared folders
Not that I have seen. I use them, they work. --chris -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
Hi So the ship date is 19th October for Solaris 10 10/09 (update 8). Enda Enda O'Connor wrote: Hi Yes, Solaris 10 10/09 (update 8) will contain 6501037 "want user/group quotas on zfs"; it should be out within a few weeks. So if they have zpools already installed they can apply 141444-09/141445-09 (the 10/09 kernel patch) and, post-reboot, run zpool upgrade to go to zpool version 15 (the process is non-reversible by the way), which contains 6501037. The patches mentioned will be released shortly after 10/09 itself ships (within a few days of 10/09 shipping). If applying patches, make sure to apply the latest rev of 119254/119255 first (the patch utilities patch), and read the README as well for any further instructions. Enda Tomas Ögren wrote: On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes: Hello list, We are unfortunately still experiencing some issues regarding our support license with Sun, or rather our Sun Vendor. We need ZFS User quotas. (That's not the zfs file-system quota) which first appeared in snv_114. We would like to run something like snv_117 (don't really care which version per-se, that is just the one version we have done the most testing with). But our Vendor will only support Solaris 10. After weeks of wrangling, they have reluctantly agreed to let us run OpenSolaris 2009.06. (Which does not have ZFS User quotas). When I approach Sun-Japan directly I just get told that they don't speak English. When my Japanese colleagues approach Sun-Japan directly, it is suggested to us that we stay with our current Vendor. * Will there be official Solaris 10, or OpenSolaris releases with ZFS User quotas? (Will 2010.02 contain ZFS User quotas?) http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html which is in no way official, says it'll be in 10u8 which should be coming within a month. /Tomas -- Enda O'Connor x19781 Software Product Engineering Patch System Test : Ireland : x19781/353-1-8199718 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
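Enda's procedure, written out as commands, would look roughly like this; the patch revisions, the architecture split, and the pool name are assumptions based on the message above, so follow the patch READMEs rather than taking this verbatim:

    # patch utilities patch first; use the latest revision available
    patchadd 119255-XX          # (or 119254-XX, depending on architecture; XX = latest rev)
    # then the Solaris 10 10/09 kernel patch named above
    patchadd 141445-09          # (141444-09 is the companion ID)
    init 6                      # reboot onto the patched kernel

    # after the reboot, confirm the supported versions and upgrade the pool
    zpool upgrade -v
    zpool upgrade tank          # one-way move to zpool version 15, which adds user/group quotas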
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
Hi Yes, Solaris 10 10/09 (update 8) will contain 6501037 "want user/group quotas on zfs"; it should be out within a few weeks. So if they have zpools already installed they can apply 141444-09/141445-09 (the 10/09 kernel patch) and, post-reboot, run zpool upgrade to go to zpool version 15 (the process is non-reversible by the way), which contains 6501037. The patches mentioned will be released shortly after 10/09 itself ships (within a few days of 10/09 shipping). If applying patches, make sure to apply the latest rev of 119254/119255 first (the patch utilities patch), and read the README as well for any further instructions. Enda Tomas Ögren wrote: On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes: Hello list, We are unfortunately still experiencing some issues regarding our support license with Sun, or rather our Sun Vendor. We need ZFS User quotas. (That's not the zfs file-system quota) which first appeared in snv_114. We would like to run something like snv_117 (don't really care which version per-se, that is just the one version we have done the most testing with). But our Vendor will only support Solaris 10. After weeks of wrangling, they have reluctantly agreed to let us run OpenSolaris 2009.06. (Which does not have ZFS User quotas). When I approach Sun-Japan directly I just get told that they don't speak English. When my Japanese colleagues approach Sun-Japan directly, it is suggested to us that we stay with our current Vendor. * Will there be official Solaris 10, or OpenSolaris releases with ZFS User quotas? (Will 2010.02 contain ZFS User quotas?) http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html which is in no way official, says it'll be in 10u8 which should be coming within a month. /Tomas -- Enda O'Connor x19781 Software Product Engineering Patch System Test : Ireland : x19781/353-1-8199718 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)
Joerg Schilling wrote: Just to prove my information: I invented "fbk" (which Sun now calls "lofi") Sun does NOT call your fbk by the name lofi. Lofi is a completely different implementation of the same concept. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
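For readers who haven't met it, lofi is the Solaris loopback file driver: it presents an ordinary file as a block device. A minimal usage sketch, with the image path and lofi device number as assumptions:

    # attach a file image as a block device; lofiadm prints the device it created
    lofiadm -a /export/images/sol-10-u8-ga-x86-dvd.iso
    # mount it read-only (assuming the device came back as /dev/lofi/1)
    mount -F hsfs -o ro /dev/lofi/1 /mnt

    # detach when done
    umount /mnt
    lofiadm -d /dev/lofi/1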
Re: [zfs-discuss] Unusual latency issues
Markus Kovero wrote: Hi, this may not be the correct mailing list for this, but I'd like to share this with you. I noticed weird network behavior with osol snv_123: ICMP to the host lags randomly between 500ms-5000ms and ssh sessions seem to tangle; I guess this could affect iscsi/nfs as well. What was most interesting is that I found the workaround to be running snoop with promiscuous mode disabled on the interfaces suffering lag; this did make the interruptions go away. Is this some kind of cpu/irq scheduling issue? Behaviour was noticed on two different platforms and with two different nics (bge and e1000). Unless you have some specific reason for thinking this is a zfs issue, you probably want to ask on the crossbow-discuss mailing list. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes: > > Hello list, > > We are unfortunately still experiencing some issues regarding our support > license with Sun, or rather our Sun Vendor. > > We need ZFS User quotas. (That's not the zfs file-system quota) which > first appeared in snv_114. > > We would like to run something like snv_117 (don't really care which > version per-se, that is just the one version we have done the most > testing with). > > But our Vendor will only support Solaris 10. After weeks of wrangling, > they have reluctantly agreed to let us run OpenSolaris 2009.06. (Which > does not have ZFS User quotas). > > When I approach Sun-Japan directly I just get told that they don't speak > English. When my Japanese colleagues approach Sun-Japan directly, it is > suggested to us that we stay with our current Vendor. > > * Will there be official Solaris 10, or OpenSolaris releases with ZFS > User quotas? (Will 2010.02 contain ZFS User quotas?) http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html which is in no way official, says it'll be in 10u8 which should be coming within a month. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Unusual latency issues
Hi, this may not be the correct mailing list for this, but I'd like to share this with you. I noticed weird network behavior with osol snv_123: ICMP to the host lags randomly between 500ms-5000ms and ssh sessions seem to tangle; I guess this could affect iscsi/nfs as well. What was most interesting is that I found the workaround to be running snoop with promiscuous mode disabled on the interfaces suffering lag; this did make the interruptions go away. Is this some kind of cpu/irq scheduling issue? Behaviour was noticed on two different platforms and with two different nics (bge and e1000). Yours Markus Kovero ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
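The workaround described above amounts to running snoop with promiscuous mode explicitly turned off on the lagging interface; a rough sketch, with the interface name as an assumption (note this only masks the symptom, so the root cause still belongs on crossbow-discuss as suggested elsewhere in the thread):

    # -P turns promiscuous mode off, -d selects the interface; discard the output
    snoop -P -d bge0 > /dev/null &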
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
On Mon, Sep 28, 2009 at 2:20 PM, Jorgen Lundman wrote: > We would like to run something like snv_117 (don't really care which version > per-se, that is just the one version we have done the most testing with). > > But our Vendor will only support Solaris 10. After weeks of wrangling, they > have reluctantly agreed to let us run OpenSolaris 2009.06. (Which does not > have ZFS User quotas). I thought http://www.sun.com/service/opensolaris/ was supposed to be made for people with your needs? -- Fajar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Solaris License with ZFS USER quotas?
Hello list, We are unfortunately still experiencing some issues regarding our support license with Sun, or rather our Sun Vendor. We need ZFS User quotas. (That's not the zfs file-system quota) which first appeared in snv_114. We would like to run something like snv_117 (don't really care which version per-se, that is just the one version we have done the most testing with). But our Vendor will only support Solaris 10. After weeks of wrangling, they have reluctantly agreed to let us run OpenSolaris 2009.06. (Which does not have ZFS User quotas). When I approach Sun-Japan directly I just get told that they don't speak English. When my Japanese colleagues approach Sun-Japan directly, it is suggested to us that we stay with our current Vendor. * Will there be official Solaris 10, or OpenSolaris releases with ZFS User quotas? (Will 2010.02 contain ZFS User quotas?) * Can we get support overseas perhaps, that will let us run a version of Solaris with ZFS User quotas? Support generally includes having the ability to replace hardware when it dies, and/or send panic dumps if they happen for future patches. Internally, we are now discussing returning our 12x x4540, and calling NetApp. I would rather not (more work for me). I understand Sun is probably experiencing some internal turmoil at the moment, but it has been rather frustrating for us. Lund -- Jorgen Lundman | Unix Administrator | +81 (0)3-5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3-3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
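For context, the feature being asked for is the per-user/per-group quota support that first appeared in snv_114 (and, per the rest of this thread, arrives with zpool version 15 on Solaris 10). Roughly, it looks like this; the pool, filesystem, and user names are made up:

    # cap an individual user's space on a filesystem
    zfs set userquota@lundman=50G tank/home

    # cap a group, and report current per-user consumption
    zfs set groupquota@staff=500G tank/home
    zfs userspace tank/home

    # read back a single quota
    zfs get userquota@lundman tank/home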
Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)
Frank Middleton wrote: > On 09/27/09 11:25 AM, Joerg Schilling wrote: > > Frank Middleton wrote: > > >> Could you fix the Wikipedia article? http://en.wikipedia.org/wiki/TMPFS > >> > >> "it first appeared in SunOS 4.1, released in March 1990" > > > > It appeared with SunOS-4.0. The official release was probably February 1987, > > but there have been betas before that, IIRC. > > Do you have any references one could quote so that the Wikipedia > article can be corrected? The section on Solaris is rather skimpy > and could do with some work... I am not sure whether my changes will be kept, as Wikipedia prefers to keep badly sourced wrong information over correct information supplied by people who have first-hand knowledge. Just to prove my information: I invented "fbk" (which Sun now calls "lofi") in summer 1988 after I received the sources for SunOS-4.0. "fbk" was my playground for the new vnode interface before I wrote "wofs", probably the first copy-on-write filesystem. I definitely know that tmpfs was in 4.0. Jörg -- EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de (uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss