Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
Joe Little wrote:
> On Nov 16, 2007 9:13 PM, Neil Perrin <[EMAIL PROTECTED]> wrote:
>> Joe,
>>
>> I don't think adding a slog helped in this case. In fact I believe it
>> made performance worse. Previously the ZIL would be spread out over all
>> devices, but now all synchronous traffic is directed at one device (and
>> everything is synchronous in NFS). Mind you, 15MB/s seems a bit on the
>> slow side - especially if cache flushing is disabled.
>>
>> It would be interesting to see what all the threads are waiting on. I
>> think the problem may be that everything is backed up waiting to start a
>> transaction, because the txg train is slow due to NFS requiring the ZIL
>> to push everything synchronously.
>
> I agree completely. The log (even though slow) was an attempt to isolate
> writes away from the pool. I guess the question is how to provide for
> async access for NFS. We may have 16, 32 or however many threads, but if
> a single writer keeps the ZIL pegged and prohibits reads, it's all for
> naught. Is there any way to tune/configure the ZFS/NFS combination to
> balance reads and writes so that one doesn't starve the other? It's
> either feast or famine, or so tests have shown.

No, there's currently no way to give reads preference over writes. All
transactions get equal priority to enter a transaction group. Three txgs
can be outstanding, as we use a 3-phase commit model: open, quiescing, and
syncing.

Neil.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
On Nov 16, 2007 9:17 PM, Joe Little <[EMAIL PROTECTED]> wrote:
> On Nov 16, 2007 9:13 PM, Neil Perrin <[EMAIL PROTECTED]> wrote:
> > Joe,
> >
> > I don't think adding a slog helped in this case. In fact I believe it
> > made performance worse. Previously the ZIL would be spread out over all
> > devices, but now all synchronous traffic is directed at one device (and
> > everything is synchronous in NFS). Mind you, 15MB/s seems a bit on the
> > slow side - especially if cache flushing is disabled.
> >
> > It would be interesting to see what all the threads are waiting on. I
> > think the problem may be that everything is backed up waiting to start a
> > transaction, because the txg train is slow due to NFS requiring the ZIL
> > to push everything synchronously.
>

Roch wrote this before (thus my interest in the log or NVRAM-like solution):

"There are 2 independent things at play here.

a) NFS sync semantics conspire against single-thread performance with any
backend filesystem. However, NVRAM normally offers some relief of the issue.

b) ZFS sync semantics, along with the storage software + imprecise protocol
in between, conspire against ZFS performance of some workloads on
NVRAM-backed storage. NFS is one of the affected workloads.

The conjunction of the 2 causes worse-than-expected NFS performance over a
ZFS backend running __on NVRAM-backed storage__. If you are not considering
NVRAM storage, then I know of no ZFS/NFS-specific problems.

Issue b) is being dealt with by both Solaris and storage vendors (we need a
refined protocol); issue a) is not related to ZFS and is rather a
fundamental NFS issue. Maybe a future NFS protocol will help.

Net net: if one finds a way to 'disable cache flushing' on the storage
side, then one reaches the state we'll be in, out of the box, when b) is
implemented by Solaris _and_ the storage vendor. At that point, ZFS becomes
a fine NFS server not only on JBOD, as it is today, but also on
NVRAM-backed storage.

It's complex enough, I thought it was worth repeating."

> I agree completely. The log (even though slow) was an attempt to isolate
> writes away from the pool. I guess the question is how to provide for
> async access for NFS. We may have 16, 32 or however many threads, but if
> a single writer keeps the ZIL pegged and prohibits reads, it's all for
> naught. Is there any way to tune/configure the ZFS/NFS combination to
> balance reads and writes so that one doesn't starve the other? It's
> either feast or famine, or so tests have shown.
>
> > Neil.
> >
> >
> > Joe Little wrote:
> > > I have historically noticed that in ZFS, whenever there is a heavy
> > > writer to a pool via NFS, the reads can be held back (basically
> > > paused). An example is a RAID10 pool of 6 disks, whereby a directory
> > > of files, including some large 100+MB in size, being written can
> > > cause other clients over NFS to pause for seconds (5-30 or so). This
> > > is on B70 bits. I've gotten used to this behavior over NFS, but
> > > didn't see it perform as such when on the server itself doing
> > > similar actions.
> > >
> > > To improve upon the situation, I thought perhaps I could dedicate a
> > > log device outside the pool, in the hopes that while heavy writes
> > > went to the log device, reads would merrily be allowed to coexist
> > > from the pool itself. My test case isn't ideal per se, but I added a
> > > local 9GB SCSI (80) drive for a log, and added two LUNs for the pool
> > > itself.
> > > You'll see from the below that while the log device is pegged at
> > > 15MB/sec (sd5), my directory list requests on devices sd15 and sd16
> > > are never answered. I tried this with both no-cache-flush enabled
> > > and off, with negligible difference. Is there any way to force a
> > > better balance of reads/writes during heavy writes?
> > >
> > > extended device statistics
> > > device r/s w/s kr/s kw/s wait actv svc_t %w %b
> > > fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd5 0.0 118.0 0.0 15099.9 0.0 35.0 296.7 0 100
> > > sd6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd14 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > > sd15 0.0 0.0
Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
On Nov 16, 2007 9:13 PM, Neil Perrin <[EMAIL PROTECTED]> wrote:
> Joe,
>
> I don't think adding a slog helped in this case. In fact I believe it
> made performance worse. Previously the ZIL would be spread out over all
> devices, but now all synchronous traffic is directed at one device (and
> everything is synchronous in NFS). Mind you, 15MB/s seems a bit on the
> slow side - especially if cache flushing is disabled.
>
> It would be interesting to see what all the threads are waiting on. I
> think the problem may be that everything is backed up waiting to start a
> transaction, because the txg train is slow due to NFS requiring the ZIL
> to push everything synchronously.
>

I agree completely. The log (even though slow) was an attempt to isolate
writes away from the pool. I guess the question is how to provide for
async access for NFS. We may have 16, 32 or however many threads, but if a
single writer keeps the ZIL pegged and prohibits reads, it's all for
naught. Is there any way to tune/configure the ZFS/NFS combination to
balance reads and writes so that one doesn't starve the other? It's either
feast or famine, or so tests have shown.

> Neil.
>
>
> Joe Little wrote:
> > I have historically noticed that in ZFS, whenever there is a heavy
> > writer to a pool via NFS, the reads can be held back (basically
> > paused). An example is a RAID10 pool of 6 disks, whereby a directory
> > of files, including some large 100+MB in size, being written can cause
> > other clients over NFS to pause for seconds (5-30 or so). This is on
> > B70 bits. I've gotten used to this behavior over NFS, but didn't see
> > it perform as such when on the server itself doing similar actions.
> >
> > To improve upon the situation, I thought perhaps I could dedicate a
> > log device outside the pool, in the hopes that while heavy writes went
> > to the log device, reads would merrily be allowed to coexist from the
> > pool itself. My test case isn't ideal per se, but I added a local 9GB
> > SCSI (80) drive for a log, and added two LUNs for the pool itself.
> > You'll see from the below that while the log device is pegged at
> > 15MB/sec (sd5), my directory list requests on devices sd15 and sd16
> > are never answered. I tried this with both no-cache-flush enabled and
> > off, with negligible difference. Is there any way to force a better
> > balance of reads/writes during heavy writes?
> >
> > extended device statistics
> > device r/s w/s kr/s kw/s wait actv svc_t %w %b
> > fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd5 0.0 118.0 0.0 15099.9 0.0 35.0 296.7 0 100
> > sd6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd14 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> > sd16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> ...
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
Joe,

I don't think adding a slog helped in this case. In fact I believe it made
performance worse. Previously the ZIL would be spread out over all devices,
but now all synchronous traffic is directed at one device (and everything
is synchronous in NFS). Mind you, 15MB/s seems a bit on the slow side -
especially if cache flushing is disabled.

It would be interesting to see what all the threads are waiting on. I think
the problem may be that everything is backed up waiting to start a
transaction, because the txg train is slow due to NFS requiring the ZIL to
push everything synchronously.

Neil.

Joe Little wrote:
> I have historically noticed that in ZFS, whenever there is a heavy
> writer to a pool via NFS, the reads can be held back (basically paused).
> An example is a RAID10 pool of 6 disks, whereby a directory of files,
> including some large 100+MB in size, being written can cause other
> clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
> I've gotten used to this behavior over NFS, but didn't see it perform as
> such when on the server itself doing similar actions.
>
> To improve upon the situation, I thought perhaps I could dedicate a log
> device outside the pool, in the hopes that while heavy writes went to
> the log device, reads would merrily be allowed to coexist from the pool
> itself. My test case isn't ideal per se, but I added a local 9GB SCSI
> (80) drive for a log, and added two LUNs for the pool itself.
> You'll see from the below that while the log device is pegged at
> 15MB/sec (sd5), my directory list requests on devices sd15 and sd16 are
> never answered. I tried this with both no-cache-flush enabled and off,
> with negligible difference. Is there any way to force a better balance
> of reads/writes during heavy writes?
>
> extended device statistics
> device r/s w/s kr/s kw/s wait actv svc_t %w %b
> fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd5 0.0 118.0 0.0 15099.9 0.0 35.0 296.7 0 100
> sd6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd14 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
> sd16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
...
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
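For readers trying to reproduce this kind of test: on bits recent enough to support separate intent-log devices, a slog is attached with something like the following, and per-device traffic can then be watched while an NFS client writes. The pool and device names here are placeholders, not the ones from this thread.

# zpool add tank log c1t5d0
# zpool status tank
# iostat -xn 5

If all synchronous NFS write traffic collapses onto the single log device, as described above, its %b column will sit near 100 while the data disks stay largely idle.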
[zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS + DB + "fragments"
I've been observing two threads on zfs-discuss with the following Subject lines: Yager on ZFS ZFS + DB + "fragments" and have reached the rather obvious conclusion that the author "can you guess?" is a professional spinmeister, who gave up a promising career in political speech writing, to hassle the technical list membership on zfs-discuss. To illustrate my viewpoint, I offer the following excerpts (reformatted from an obvious WinDoze Luser Mail client): Excerpt 1: Is this premium technical BullShit (BS) or what? - BS 301 'grad level technical BS' --- Still, it does drive up snapshot overhead, and if you start trying to use snapshots to simulate 'continuous data protection' rather than more sparingly the problem becomes more significant (because each snapshot will catch any background defragmentation activity at a different point, such that common parent blocks may appear in more than one snapshot even if no child data has actually been updated). Once you introduce CDP into the process (and it's tempting to, since the file system is in a better position to handle it efficiently than some add-on product), rethinking how one approaches snapshots (and COW in general) starts to make more sense. - end of BS 301 'grad level technical BS' --- Comment: Amazing: so many words, so little meaningful technical content! Excerpt 2: Even better than Excerpt 1 - truely exceptional BullShit: - BS 401 'PhD level technical BS' -- No, but I described how to use a transaction log to do so and later on in the post how ZFS could implement a different solution more consistent with its current behavior. In the case of the transaction log, the key is to use the log not only to protect the RAID update but to protect the associated higher-level file operation as well, such that a single log force satisfies both (otherwise, logging the RAID update separately would indeed slow things down - unless you had NVRAM to use for it, in which case you've effectively just reimplemented a low-end RAID controller - which is probably why no one has implemented that kind of solution in a stand-alone software RAID product). ... - end of BS 401 'PhD level technical BS' -- Go ahead and lookup the full context of these exceptional BS excerpts and see if the full context brings any further enlightment. I think you'll quickly realize that, after reading the full context, this is nothing more than a complete waste of time and that there is nothing of technical value to learned from this text. In fact, there is very, very little to be learned from any posts on this list where the Subject line is either: Yager on ZFS ZFS + DB + "fragments" and the author is: "can you guess? <[EMAIL PROTECTED]>" I'm not, for a moment, suggesting that one can't learn *something* from the posts of the author "can you guess? <[EMAIL PROTECTED]>"... indeed there are significant spinmeistering skills to be learned from these posts; including how to combine portions of cited published technical studies (Google Study, CERN study) with a line of total semi-technical bullshit worthy of any political spinmeister working withing the DC "Beltway Bandit" area. In fact, if I'm trying to conn^H^H^H^H talk someone out of several million dollars to fund a totally BS research project, I'll pay any reasonable fees that "can you guess?" would demand. 
Because I'm convinced, that with his premium spinmeistering/BS skills - nothing is impossible: pigs can fly, NetApp == ZFS, the world is flat and ZFS is a totally deficient technical design because they did'nt solicit his totally invaluable technical input. And.. one note of caution for Jeff Bonwick and Team ZFS - lookout ... for this guy - because his new ZFS competitor filesystem, called, appropriately, GOMFS (Guess-O-Matic-File-System) is about to be released and it'll basically, if I understand "can you guess?"'s email fully, solve all the current ZFS design deficiencies, and totally dominate all *nix based filesystems for the next 400 years. Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ Graduate from "sugar-coating school"? Sorry - I never attended! :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] slog tests on read throughput exhaustion (NFS)
I have historically noticed that in ZFS, whenever there is a heavy writer
to a pool via NFS, the reads can be held back (basically paused). An
example is a RAID10 pool of 6 disks, whereby a directory of files,
including some large 100+MB in size, being written can cause other clients
over NFS to pause for seconds (5-30 or so). This is on B70 bits. I've
gotten used to this behavior over NFS, but didn't see it perform as such
when on the server itself doing similar actions.

To improve upon the situation, I thought perhaps I could dedicate a log
device outside the pool, in the hopes that while heavy writes went to the
log device, reads would merrily be allowed to coexist from the pool itself.
My test case isn't ideal per se, but I added a local 9GB SCSI (80) drive
for a log, and added two LUNs for the pool itself. You'll see from the
below that while the log device is pegged at 15MB/sec (sd5), my directory
list requests on devices sd15 and sd16 are never answered. I tried this
with both no-cache-flush enabled and off, with negligible difference. Is
there any way to force a better balance of reads/writes during heavy writes?

                 extended device statistics
device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w   %b
fd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd1       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd2       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd3       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd4       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd5       0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0  100
sd6       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd7       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd8       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd9       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd10      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd11      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd12      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd13      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd14      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd15      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd16      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0

                 extended device statistics
device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w   %b
fd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd1       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd2       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd3       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd4       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd5       0.0  117.0    0.0  14970.1   0.0  35.0  299.2   0  100
sd6       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd7       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd8       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd9       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd10      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd11      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd12      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd13      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd14      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd15      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd16      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0

                 extended device statistics
device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w   %b
fd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd1       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd2       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd3       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd4       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd5       0.0  118.1    0.0  15111.9   0.0  35.0  296.4   0  100
sd6       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd7       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd8       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd9       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd10      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd11      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd12      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd13      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd14      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd15      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd16      0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0

                 extended device statistics
device    r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w   %b
fd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd0       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd1       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0    0
sd2       0.0    0.0    0.0
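For reference, extended device statistics like those above are normally gathered with iostat; the interval below is arbitrary:

# iostat -x 5
# iostat -xn 5     (same data, but with cXtYdZ names instead of sdN)

The %b column is the per-device busy percentage, so a log device pinned at 100 while the pool disks show no read activity is exactly the starvation pattern described in this post.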
Re: [zfs-discuss] How to destroy a faulted pool
Manoj,

# zpool destroy -f mstor0

Regards,
Marco Lopes.

Manoj Nayak wrote:
> How can I destroy the following pool?
>
>   pool: mstor0
>     id: 5853485601755236913
>  state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
>    see: http://www.sun.com/msg/ZFS-8000-5E
> config:
>
>         mstor0      UNAVAIL  insufficient replicas
>           raidz1    UNAVAIL  insufficient replicas
>             c5t0d0  FAULTED  corrupted data
>             c4t0d0  FAULTED  corrupted data
>             c1t0d0  ONLINE
>             c0t0d0  ONLINE
>
>   pool: zpool1
>     id: 14693037944182338678
>  state: FAULTED
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing
>         devices and try again.
>    see: http://www.sun.com/msg/ZFS-8000-3C
> config:
>
>         zpool1      UNAVAIL  insufficient replicas
>           raidz1    UNAVAIL  insufficient replicas
>             c0t1d0  UNAVAIL  cannot open
>             c1t1d0  UNAVAIL  cannot open
>             c4t1d0  UNAVAIL  cannot open
>             c6t1d0  UNAVAIL  cannot open
>             c7t1d0  UNAVAIL  cannot open
>           raidz1    UNAVAIL  insufficient replicas
>             c0t2d0  UNAVAIL  cannot open
>             c1t2d0  UNAVAIL  cannot open
>             c4t2d0  UNAVAIL  cannot open
>             c6t2d0  UNAVAIL  cannot open
>             c7t2d0  UNAVAIL  cannot open
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Marco S. Lopes
Senior Technical Specialist
US Systems Practice
Professional Services Delivery
Sun Microsystems
925 984 6611
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
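A minimal sketch of the suggested cleanup, using the pool names from the question and assuming the pools are still known to the local system:

# zpool destroy -f mstor0
# zpool destroy -f zpool1

If a faulted pool cannot be imported at all, so destroy has nothing to operate on, the remaining option is simply to reuse its member disks - for example, creating a new pool over them with zpool create -f writes fresh labels over the old ones.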
[zfs-discuss] zpool io to 6140 is really slow
I have the following layout:

A 490 with 8 x 1.8GHz and 16G mem. 6 6140s with 2 FC controllers, using the
A1 and B1 controller ports at 4Gbps. Each controller has 2G NVRAM. On the
6140s I set up one RAID0 LUN per SAS disk with a 16K segment size. On the
490 I created a zpool with 8 4+1 raidz1s.

I am getting zpool IO of only 125MB/s, with zfs:zfs_nocacheflush = 1 in
/etc/system. Is there a way I can improve the performance? I'd like to get
1GB/sec IO. Currently each LUN is set up as primary A1 and secondary B1, or
vice versa. I also have write cache enabled according to CAM.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
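One way to see where the 125MB/s is being lost is to watch per-vdev and per-LUN activity while the test runs; the pool name below is a placeholder:

# zpool iostat -v tank 5
# iostat -xn 5

If only some of the eight raidz1 vdevs show activity, or individual LUNs show very high service times, the bottleneck is more likely in the LUN/array layout or the FC path than in ZFS itself. For completeness, the cache-flush tuning already mentioned lives in /etc/system and takes effect after a reboot:

set zfs:zfs_nocacheflush = 1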
Re: [zfs-discuss] cannot mount 'mypool': Input/output error
On Nov 15, 2007 9:42 AM, Nabeel Saad <[EMAIL PROTECTED]> wrote: > I am sure I will not use ZFS to its fullest potential at all.. right now I'm > trying to recover the dead disk, so if it works to mount a single disk/boot > disk, that's all I need, I don't need it to be very functional. As I > suggested, I will only be using this to change permissions and then return > the disk into the appropriate Server once I am able to log back into that > server. (Sorry, forgot to CC the list.) Ok, so assuming that all you want to do is mount your old Solaris disk and change some permissions, then there is probably an easier solution which is to put the hard drive back in the original machine and boot from a (Open)Solaris CD or DVD. This eliminates the whole Linux/FUSE issues you're getting into. Your easiest option might be to try the new OpenSolaris Developer Preview distribution since it's actually a Live CD which would give you a full GUI and networking to play with. http://www.opensolaris.org/os/downloads/ Once the Live CD boots, you should be able to mount your drive to an alternate path like /a and then change permissions. If you boot from a regular Solaris CD or DVD it will start the install process, but then you should be able to simply cancel the install and get to a command line and work from there. Good luck! Regards, -Eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
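If the disk in question holds a ZFS pool rather than a UFS slice, the "mount at an alternate path" step from the Live CD is done by importing the pool under an alternate root. Names below are placeholders:

# zpool import                     (lists pools found on attached disks)
# zpool import -f -R /a mypool
  ... fix permissions under /a ...
# zpool export mypool

For a UFS slice the equivalent is a plain mount, e.g. mount /dev/dsk/c0t0d0s7 /a (again, the device name is just an example).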
[zfs-discuss] ZFS snapshot send/receive via intermediate device
Hey folks, I have no knowledge at all about how streams work in Solaris, so this might have a simple answer, or be completely impossible. Unfortunately I'm a windows admin so haven't a clue which :) We're looking at rolling out a couple of ZFS servers on our network, and instead of tapes we're considering using off-site NAS boxes for backups. We think there's likely to be too much data each day to send the incremental snapshots to the remote systems over the wire, so we're wondering if we can use removable disks instead to transport just the incremental changes. The idea is that we can do the initial "zfs send" on-site with the NAS plugged on the network, and from then on we just need a 500GB removable disk to take the changes off site each night. Let me be clear on that: We're not thinking of storing the whole zfs pool on the removable disk, there's just too much data. Instead, we want to use "zfs send -i" to store just the incremental changes on a removable disk, so we can then take that disk home and plug it into another device and use zfs receive to upload the changes. Does anybody know if that's possible? If it works it's a nice and simple off-site backup, with the added benefit that we have a very rapid disaster recovery response. No need to waste time restoring from tape: the off-site backup can be brought onto the network and data is accessible immediately. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
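For what it's worth, the mechanics described should work: a send stream is just a stream of bytes, so it can be written to a file on a removable disk and replayed later. A rough sketch, with every pool, filesystem and snapshot name invented for the example:

On the on-site server:
# zfs snapshot tank/data@2007-11-16
# zfs send -i tank/data@2007-11-15 tank/data@2007-11-16 > /removable/data-20071116.zsend

On the off-site NAS, after transporting the disk:
# zfs receive backup/data < /removable/data-20071116.zsend

The receiving side must already hold the earlier snapshot (from the initial full send), and it is worth replaying the stream promptly: a truncated or corrupted file only shows up when zfs receive fails.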
Re: [zfs-discuss] read/write NFS block size and ZFS
msl wrote: > Hello all... > I'm migrating a nfs server from linux to solaris, and all clients(linux) are > using read/write block sizes of 8192. That was the better performance that i > got, and it's working pretty well (nfsv3). I want to use all the zfs' > advantages, and i know i can have a performance loss, so i want to know if > there is a "recomendation" for bs on nfs/zfs, or what do you think about it. > That is the network block transfer size. The default is normally 32 kBytes. I don't see any reason to change ZFS's block size to match. You should follow the best practices as described at http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide If you notice a performance issue with metadata updates, be sure to check out http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide -- richard > I must test, or there is no need to make such configurations with zfs? > Thanks very much for your time! > Leal. > > > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
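Concretely, the two settings involved are independent; the dataset name and sizes below are only illustrations.

On the ZFS/NFS server (recordsize is normally left at the 128K default for general file serving):
# zfs get recordsize tank/export
# zfs set recordsize=32k tank/export     (only if a specific workload justifies it)

On a Linux client (the wire transfer size the original poster was tuning):
# mount -o vers=3,rsize=32768,wsize=32768 server:/export /mnt

rsize/wsize govern NFS transfer sizes on the wire; recordsize governs ZFS on-disk block size. As noted above, they do not need to match.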
Re: [zfs-discuss] Yager on ZFS
> can you guess? metrocast.net> writes: > > > > You really ought to read a post before responding > to it: the CERN study > > did encounter bad RAM (and my post mentioned that) > - but ZFS usually can't > > do a damn thing about bad RAM, because errors tend > to arise either > > before ZFS ever gets the data or after it has > already returned and checked > > it (and in both cases, ZFS will think that > everything's just fine). > > According to the memtest86 author, corruption most > often occurs at the moment > memory cells are written to, by causing bitflips in > adjacent cells. So when a > disk DMA data to RAM, and corruption occur when the > DMA operation writes to > the memory cells, and then ZFS verifies the checksum, > then it will detect the > corruption. > > Therefore ZFS is perfectly capable (and even likely) > to detect memory > corruption during simple read operations from a ZFS > pool. > > Of course there are other cases where neither ZFS nor > any other checksumming > filesystem is capable of detecting anything (e.g. the > sequence of events: data > is corrupted, checksummed, written to disk). Indeed - the latter was the first of the two scenarios that I sketched out. But at least on the read end of things ZFS should have a good chance of catching errors due to marginal RAM. That must mean that most of the worrisome alpha-particle problems of yore have finally been put to rest (since they'd be similarly likely to trash data on the read side after ZFS had verified it). I think I remember reading that somewhere at some point, but I'd never gotten around to reading that far in the admirably-detailed documentation that accompanies memtest: thanks for enlightening me. - bill This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool question
On Thu, 15 Nov 2007, Brian Lionberger wrote:

> The question is, should I create one zpool or two to hold /export/home
> and /export/backup? Currently I have one pool for /export/home and one
> pool for /export/backup.
>
> Should it be one pool for both? Would this be better and why?

One thing to consider is that pools are the granularity of 'export'
operations, so if you ever want to, for example, move the /export/backup
disks to a new computer but keep /export/home on the current computer, you
couldn't do that easily if you create a single pool striped across a pair
of 2-way mirrors.

Regards,
markm
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
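The export/import cycle referred to looks roughly like this; the pool name is taken from the two-pool layout in the original question and is only an example:

On the old host:
# zpool export backup

On the new host, after moving the disks:
# zpool import backup

Because export and import operate on whole pools, keeping /export/home and /export/backup in separate pools is what makes this kind of selective move straightforward.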
Re: [zfs-discuss] zfs on a raid box
A little extra info: ZFS brings in a ZFS spare device the next time the pool is accessed, not a raidbox hot spare. Resilvering starts automatically and increases disk access times by about 30%. The first hour of estimated time left ( for 5-6 TB pools ) is wildly inaccurate, but it starts to settle down after that. Tom Mooney Dan Pritts wrote: On Fri, Nov 16, 2007 at 11:31:00AM +0100, Paul Boven wrote: Thanks for your reply. The SCSI-card in the X4200 is a Sun Single Channel U320 card that came with the system, but the PCB artwork does sport a nice 'LSI LOGIC' imprint. That is probably the same card i'm using; it's actually a "Sun" card but as you say is OEM by LSI. So, just to make sure we're talking about the same thing here - your drives are SATA, yes you're exporting each drive through the Western Scientific raidbox as a seperate volume, yes and zfs actually brings in a hot spare when you pull a drive? yes OS is Sol10U4, system is an X4200, original hardware rev. Over here, I've still not been able to accomplish that - even after installing Nevada b76 on the machine, removing a disk will not cause a hot-spare to become active, nor does resilvering start. Our Transtec raidbox seems to be based on a chipset by Promise, by the way. I have heard some bad things about the Promise RAID boxes but I haven't had any direct experience. I do own one Promise box that accepts 4 PATA drives and exports them to a host as scsi disks. Shockingly, it uses a master/slave IDE configuration rather than 4 separate IDE controllers. It wasn't super expensive but it wasn't dirt cheap, either, and it seems it would have cost another $5 to manufacture the "right way." I've had fine luck with Promise $25 ATA PCI cards :) The infortrend units, on the other hand, I have had generally quite good luck with. When I worked at UUNet in the late '90s we had hundreds of their SCSI RAIDs deployed. I do have an Infortrend FC-attached raid with SATA disks, which basically works fine. It has an external JBOD also SATA disks connecting to the main raid with FC. Unfortunately, The RAID unit boots faster than the JBOD. So, if you turn them on at the same time, it thinks the JBOD is gone and doesn't notice it's there until you reboot the controller. That caused a little pucker for my colleagues when it happened while i was on vacation. The support guy at the reseller we were working with (NOT Western Scientific) told them the raid was hosed and they should rebuild from scratch, hope you had a backup. danno -- Dan Pritts, System Administrator Internet2 office: +1-734-352-4953 | mobile: +1-734-834-7224 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Need a 2-port PCI-X SATA-II controller for x86
I'll be setting up a small server and need two SATA-II ports for an x86 box. The cheaper the better. Thanks!! -brian -- "Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Yager on ZFS
> Brain damage seems a bit of an alarmist label. While you're certainly right > that for a given block we do need to access all disks in the given stripe, > it seems like a rather quaint argument: aren't most environments that > matter trying to avoid waiting for the disk at all? Intelligent prefetch > and large caches -- I'd argue -- are far more important for performance > these days. The concurrent small-i/o problem is fundamental though. If you have an application where you care only about random concurrent reads for example, you would not want to use raidz/raidz2 currently. No amount of smartness in the application gets around this. It *is* a relevant shortcoming of raidz/raidz2 compared to raid5/raid6, even if in many cases it is not significant. If disk space is not an issue, striping across mirrors will be okay for random seeks. But if you also care about diskspace, it's a show stopper unless you can throw money at the problem. -- / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>' Key retrieval: Send an E-Mail to [EMAIL PROTECTED] E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org signature.asc Description: This is a digitally signed message part. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
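To make the trade-off concrete, the two four-disk layouts being compared would be created roughly like this (device names are placeholders):

# zpool create fast mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
# zpool create big raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

The mirrored pool gives about half the usable capacity but can service independent small reads from every disk; the raidz pool gives more space but, as discussed above, behaves closer to a single spindle for concurrent small reads.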
Re: [zfs-discuss] Slightly Off-Topic: Decent replication on Solaris using rNFS.
Razvan Corneliu VILT wrote: > Hi, > > In my infinite search for a reliable work-around for the lack of bandwidth in > the United States*, I've reached the conclusion that I need a file-system > replication solution for the data stored on my ZFS partition. > I've noticed that I'm not the only one asking for this, but I still have no > clear answer on my options from Google. > After looking into some reports on rNFS on citi.umich.edu, I found out that > I'm not the only one with the problem (go figure). I am not really up to date > with the NFSv4 spec and drafts, but I am curious if rNFS is part of the > current NFSv4 spec or of the upcoming 4.1, and if it's considered or > available for OpenSolaris, or if there are any alternatives (such as a > replicated ZFS solution that supports simultaneous r/w access on at least 2 > geographically separate servers). > Some might argue that QFS + Sun Cluster is the way to go, but I need a few > things that ZFS currently offers (NFSv4 ACLs and snapshots that Samba can be > made aware of), and will want to move to CIFS server as soon as it's > production quality. > Generally, the write traffic on the Samba shares that need replication is > light (around 1GByte/day), but it does need to happen whenever there's a > change. > I've tried creating a smart cron script that runs unison every minute (lame, > I know), but it does not replicate the NFSv4 ACLs, and it's a rather bad > approach to the problem to start with. A daemonized unison with support for > all the ZFS features that gets the file-change notifications from the kernel > along with a distributed lock manager might do the job, but it's something > that I'm not qualified to write. > I am sure that what I'm looking for is not unheard of. I am hopeful that the > ZFS+Lustre integration in the future might allow me something like this, but > it doesn't sound like it's close. > > Any sugestions?!? > AVS. See http://www.opensolaris.org/os/project/avs/ Jim Dunham has a good blog and demo on using it with ZFS. -- richard > Cheers, > Razvan > > * Our Bucharest branch has access to 10 Mbits/sec internationally and 100 > Mbits/sec nationally (fiber of course) with BGP and our own IP classes, for > around EUR 250. This is in contrast with our San Jose, CA branch, which has a > connectivity budget of $700 and can get only a bonded-T1 at best in that > money (a T1 is $500 ($399 + taxes)). I wish that the most economically > advanced country in the world could have a decent internet infrastructure. > > > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool question
I have a zpool issue that I need to discuss. My application is going to run
on a 3120 with 4 disks. Two (mirrored) disks will represent /export/home
and the other two (mirrored) will be /export/backup.

The question is, should I create one zpool or two to hold /export/home and
/export/backup? Currently I have one pool for /export/home and one pool for
/export/backup. Should it be one pool for both? Would this be better, and
why?

Thanks for any help and advice.
Brian.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
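For reference, the two-pool layout currently in place would have been created with something like the following; the device names are placeholders for the four 3120 disks:

# zpool create home mirror c1t0d0 c1t1d0
# zpool create backup mirror c1t2d0 c1t3d0
# zfs set mountpoint=/export/home home
# zfs set mountpoint=/export/backup backup

The single-pool alternative would be one pool containing both mirrors, with /export/home and /export/backup as two filesystems (zfs create) inside it.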
[zfs-discuss] ZFS for consumers WAS:Yager on ZFS
Splitting this thread and changing the subject to reflect that... On 11/14/07, can you guess? <[EMAIL PROTECTED]> wrote: > Another prominent debate in this thread revolves around the question of > just how significant ZFS's unusual strengths are for *consumer* use. > WAFL clearly plays no part in that debate, because it's available only > on closed, server systems. I am both a large systems administrator and a 'home user' (I prefer that term to 'consumer'). I am also very slow to adopt new technologies in either environment. We have started using ZFS at work due to performance improvements (for our workload) over UFS (or any other FS we tested). At home the biggest reason I went with ZFS for my data is ease of management. I split my data up based on what it is ... media (photos, movies, etc.), vendor stuff (software, datasheets, etc.), home directories, and other misc. data. This gives me a good way to control backups based on the data type. I know, this is all more sophisticated than the typical home user. The biggest win for me is that I don't have to partition my storage in advance. I build one zpool and multiple datasets. I don't set quotas or reservations (although I could). So I suppose my argument for ZFS in home use is not data integrity, but much simpler management, both short and long term. -- Paul Kraus Albacon 2008 Facilities ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
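As a sketch of the layout described, with the dataset names taken from the categories above and the devices invented:

# zpool create tank mirror c1d0 c2d0
# zfs create tank/media
# zfs create tank/vendor
# zfs create tank/home
# zfs create tank/misc

All datasets draw on the same pool, so nothing has to be partitioned in advance; per-dataset quotas and reservations (zfs set quota=..., zfs set reservation=...) remain available but, as noted, optional.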
[zfs-discuss] ZFS children stepping on parent
I was doing some disaster recovery testing with ZFS, where I did a mass backup of a family of ZFS filesystems using snapshots, destroyed them, and then did a mass restore from the backups. The ZFS filesystems I was testing with had only one parent in the ZFS namespace; and the backup and restore went well until it came time to mount the restored ZFS filesystems. Because I had destroyed everything but the zpool, there was no mountpoint set for the restored parent ZFS filesystem or for its children. They were all restored, but unmounted. I set the mountpoint property for the parent ZFS filesystem, and all its children mounted instantly as I expected; but the parent failed to mount, because ZFS had created the mountpoints for the children before mounting the parent. I had to unmount the children manually, delete their mountpoints, mount the parent manually, and then mount the children manually. Is it supposed to work that way? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
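A hedged sketch of the manual recovery described, with invented names (tank/parent and two children whose mountpoints are inherited):

# zfs unmount tank/parent/child1
# zfs unmount tank/parent/child2
# rmdir /parent/child1 /parent/child2     (the directories created under the not-yet-mounted parent)
# zfs mount tank/parent
# zfs mount -a

Setting the parent's mountpoint before the children are received or mounted may avoid the ordering problem in the first place.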
Re: [zfs-discuss] ZFS + DB + "fragments"
... > I personally believe that since most people will have > hardware LUN's > (with underlying RAID) and cache, it will be > difficult to notice > anything. Given that those hardware LUN's might be > busy with their own > wizardry ;) You will also have to minimize the effect > of the database > cache ... By definition, once you've got the entire database in cache, none of this matters (though filling up the cache itself takes some added time if the table is fragmented). Most real-world databases don't manage to fit all or even mostly in cache, because people aren't willing to dedicate that much RAM to running them. Instead, they either use a lot less RAM than the database size or share the system with other activity that shares use of the RAM. In other words, they use a cost-effective rather than a money-is-no-object configuration, but then would still like to get the best performance they can from it. > > It will be a tough assignment ... maybe someone has > already done this? > > Thinking about this (very abstract) ... does it > really matter? > > [8KB-a][8KB-b][8KB-c] > > So what it 8KB-b gets updated and moved somewhere > else? If the DB gets > a request to read 8KB-a, it needs to do an I/O > (eliminate all > caching). If it gets a request to read 8KB-b, it > needs to do an I/O. > > Does it matter that b is somewhere else ... Yes, with any competently-designed database. it still > needs to go get > it ... only in a very abstract world with read-ahead > (both hardware or > db) would 8KB-b be in cache after 8KB-a was read. 1. If there's no other activity on the disk, then the disk's track cache will acquire the following data when the first block is read, because it has nothing better to do. But if the all the disks are just sitting around waiting for this table scan to get to them, then if ZFS has a sufficiently intelligent read-ahead mechanism it could help out a lot here as well: the differences become greater when the system is busier. 2. Even a moderately smart disk will detect a sequential access pattern if one exists and may read ahead at least modestly after having detected that pattern even if it *does* have other requests pending. 3. But in any event any competent database will explicitly issue prefetches when it knows (and it *does* know) that it is scanning a table sequentially - and will also have taken pains to try to ensure that the table data is laid out such that it can be scanned efficiently. If it's using disks that support tagged command queuing it may just issue a bunch of single-database-block requests at once, and the disk will organize them such that they can all be satisfied by a single streaming access; with disks that don't support queuing, the database can elect to issue a single large I/O request covering many database blocks, accomplishing the same thing as long as the table is in fact laid out contiguously on the medium (the database knows this if it's handling the layout directly, but when it's using a file system as an intermediary it usually can only hope that the file system has minimized file fragmentation). > > Hmmm... the only way is to get some data :) *hehe* Data is good, as long as you successfully analyze what it actually means: it either tends to confirm one's understanding or to refine it. - bill This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
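As an aside to the fragmentation argument: when a database with a fixed block size (8KB in the example above) is put on ZFS, the usual first step is to match the dataset record size to it, so each database block maps onto one ZFS block. The dataset name is a placeholder:

# zfs set recordsize=8k tank/db
# zfs get recordsize tank/db

The change generally only affects files written after it is made, and it does not by itself address the copy-on-write relocation ("8KB-b moved somewhere else") behaviour being debated here.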
Re: [zfs-discuss] read/write NFS block size and ZFS
If you're running over NFS, the ZFS block size most likely won't have a measurable impact on your performance. Unless you've got multiple gigabit ethernet interfaces, the network will generally be the bottleneck rather than your disks, and NFS does enough caching at both client & server end to aggregate updates into large writes. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Slightly Off-Topic: Decent replication on Solaris using rNFS.
Hi, In my infinite search for a reliable work-around for the lack of bandwidth in the United States*, I've reached the conclusion that I need a file-system replication solution for the data stored on my ZFS partition. I've noticed that I'm not the only one asking for this, but I still have no clear answer on my options from Google. After looking into some reports on rNFS on citi.umich.edu, I found out that I'm not the only one with the problem (go figure). I am not really up to date with the NFSv4 spec and drafts, but I am curious if rNFS is part of the current NFSv4 spec or of the upcoming 4.1, and if it's considered or available for OpenSolaris, or if there are any alternatives (such as a replicated ZFS solution that supports simultaneous r/w access on at least 2 geographically separate servers). Some might argue that QFS + Sun Cluster is the way to go, but I need a few things that ZFS currently offers (NFSv4 ACLs and snapshots that Samba can be made aware of), and will want to move to CIFS server as soon as it's production quality. Generally, the write traffic on the Samba shares that need replication is light (around 1GByte/day), but it does need to happen whenever there's a change. I've tried creating a smart cron script that runs unison every minute (lame, I know), but it does not replicate the NFSv4 ACLs, and it's a rather bad approach to the problem to start with. A daemonized unison with support for all the ZFS features that gets the file-change notifications from the kernel along with a distributed lock manager might do the job, but it's something that I'm not qualified to write. I am sure that what I'm looking for is not unheard of. I am hopeful that the ZFS+Lustre integration in the future might allow me something like this, but it doesn't sound like it's close. Any sugestions?!? Cheers, Razvan * Our Bucharest branch has access to 10 Mbits/sec internationally and 100 Mbits/sec nationally (fiber of course) with BGP and our own IP classes, for around EUR 250. This is in contrast with our San Jose, CA branch, which has a connectivity budget of $700 and can get only a bonded-T1 at best in that money (a T1 is $500 ($399 + taxes)). I wish that the most economically advanced country in the world could have a decent internet infrastructure. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
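While it is not the simultaneous read/write replication being asked for, the usual ZFS-native approximation is periodic incremental send/receive from the primary to the remote machine; NFSv4 ACLs on files are carried inside the stream. All names below are invented:

# zfs snapshot tank/share@2007-11-16-0400
# zfs send -i tank/share@2007-11-15-0400 tank/share@2007-11-16-0400 | ssh remote-host /usr/sbin/zfs receive -F tank/share

Run from cron, this gives one-way, point-in-time replication only; genuinely simultaneous access at both sites still needs something along the lines of the rNFS or clustering approaches mentioned in this thread.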
Re: [zfs-discuss] zfs on a raid box
On Fri, Nov 16, 2007 at 11:31:00AM +0100, Paul Boven wrote: > Thanks for your reply. The SCSI-card in the X4200 is a Sun Single > Channel U320 card that came with the system, but the PCB artwork does > sport a nice 'LSI LOGIC' imprint. That is probably the same card i'm using; it's actually a "Sun" card but as you say is OEM by LSI. > So, just to make sure we're talking about the same thing here - your > drives are SATA, yes > you're exporting each drive through the Western > Scientific raidbox as a seperate volume, yes > and zfs actually brings in a > hot spare when you pull a drive? yes OS is Sol10U4, system is an X4200, original hardware rev. > Over here, I've still not been able to accomplish that - even after > installing Nevada b76 on the machine, removing a disk will not cause a > hot-spare to become active, nor does resilvering start. Our Transtec > raidbox seems to be based on a chipset by Promise, by the way. I have heard some bad things about the Promise RAID boxes but I haven't had any direct experience. I do own one Promise box that accepts 4 PATA drives and exports them to a host as scsi disks. Shockingly, it uses a master/slave IDE configuration rather than 4 separate IDE controllers. It wasn't super expensive but it wasn't dirt cheap, either, and it seems it would have cost another $5 to manufacture the "right way." I've had fine luck with Promise $25 ATA PCI cards :) The infortrend units, on the other hand, I have had generally quite good luck with. When I worked at UUNet in the late '90s we had hundreds of their SCSI RAIDs deployed. I do have an Infortrend FC-attached raid with SATA disks, which basically works fine. It has an external JBOD also SATA disks connecting to the main raid with FC. Unfortunately, The RAID unit boots faster than the JBOD. So, if you turn them on at the same time, it thinks the JBOD is gone and doesn't notice it's there until you reboot the controller. That caused a little pucker for my colleagues when it happened while i was on vacation. The support guy at the reseller we were working with (NOT Western Scientific) told them the raid was hosed and they should rebuild from scratch, hope you had a backup. danno -- Dan Pritts, System Administrator Internet2 office: +1-734-352-4953 | mobile: +1-734-834-7224 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] X4500 device disconnect problem persists
We are having the same problem. First with 125025-05 and then also with
125205-07 (Solaris 10 update 4 - now with all patches).

We opened a case and got T-patch 127871-02; we installed the Marvell driver
binary 3 days ago.

T127871-02/SUNWckr/reloc/kernel/misc/sata
T127871-02/SUNWmv88sx/reloc/kernel/drv/marvell88sx
T127871-02/SUNWmv88sx/reloc/kernel/drv/amd64/marvell88sx
T127871-02/SUNWsi3124/reloc/kernel/drv/si3124
T127871-02/SUNWsi3124/reloc/kernel/drv/amd64/si3124

It seems that this resolves the device reset problem and the nfsd crash on
an x4500 with one raidz2 pool and a lot of ZFS filesystems.

This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to destroy a faulted pool
How can I destroy the following pool?

  pool: mstor0
    id: 5853485601755236913
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        mstor0      UNAVAIL  insufficient replicas
          raidz1    UNAVAIL  insufficient replicas
            c5t0d0  FAULTED  corrupted data
            c4t0d0  FAULTED  corrupted data
            c1t0d0  ONLINE
            c0t0d0  ONLINE

  pool: zpool1
    id: 14693037944182338678
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        zpool1      UNAVAIL  insufficient replicas
          raidz1    UNAVAIL  insufficient replicas
            c0t1d0  UNAVAIL  cannot open
            c1t1d0  UNAVAIL  cannot open
            c4t1d0  UNAVAIL  cannot open
            c6t1d0  UNAVAIL  cannot open
            c7t1d0  UNAVAIL  cannot open
          raidz1    UNAVAIL  insufficient replicas
            c0t2d0  UNAVAIL  cannot open
            c1t2d0  UNAVAIL  cannot open
            c4t2d0  UNAVAIL  cannot open
            c6t2d0  UNAVAIL  cannot open
            c7t2d0  UNAVAIL  cannot open
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Macs & compatibility (was Re: Yager on ZFS)
On 16-Nov-07, at 4:36 AM, Anton B. Rang wrote: > This is clearly off-topic :-) but perhaps worth correcting -- > >> Long-time MAC users must be getting used to having their entire world >> disrupted and having to re-buy all their software. This is at >> least the >> second complete flag-day (no forward or backwards compatibility) >> change >> they've been through. > > Actually, no; a fair number of Macintosh applications written in > 1984, for the original Macintosh, still run on machines/OSes > shipped in 2006. Apple provided processor compatibility by > emulating the 68000 series on PowerPC, and the PowerPC on Intel; Absolutely Anton, original poster deserves firm correction. Very little broke in either transition; Apple had excellent success with fast and reliable emulation (68K, classic runtime on OS X, PPC on Rosetta). > and OS compatibility by providing essentially a virtual machine > running Mac OS 9 inside Mac OS X (up through 10.4). > > Sadly, Mac OS 9 applications no longer run on Mac OS 10.5, so it's > true that "the world is disrupted" now for those with software > written prior to 2000 or so. I will miss MPW. I wish they would release sources so we could bring it native to OS X. --Toby (Mac user since 1986 or so). > > To make this vaguely Solaris-relevant, it's impressive that SunOS > 4.x applications still generally run on Solaris 10, at least on > SPARC systems, though Sun doesn't do processor emulation. Still not > very ZFS-relevant. :-) > > > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] "Not owner" from a zone
Yeah, this is annoying. I'm seeing this on a Thumper running Update 3 too... Has this issue been fixed in Update 4 and/or current releases of OpenSolaris? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on a raid box
Hi Dan, Dan Pritts wrote: > On Tue, Nov 13, 2007 at 12:25:24PM +0100, Paul Boven wrote: >> We've building a storage system that should have about 2TB of storage >> and good sequential write speed. The server side is a Sun X4200 running >> Solaris 10u4 (plus yesterday's recommended patch cluster), the array we >> bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and >> it's connected to the Sun through U320-scsi. > > We are doing basically the same thing with simliar Western Scientific > (wsm.com) raids, based on infortrend controllers. ZFS notices when we > pull a disk and goes on and does the right thing. > > I wonder if you've got a scsi card/driver problem. We tried using > an Adaptec card with solaris with poor results; switched to LSI, > it "just works". Thanks for your reply. The SCSI-card in the X4200 is a Sun Single Channel U320 card that came with the system, but the PCB artwork does sport a nice 'LSI LOGIC' imprint. So, just to make sure we're talking about the same thing here - your drives are SATA, you're exporting each drive through the Western Scientific raidbox as a seperate volume, and zfs actually brings in a hot spare when you pull a drive? Over here, I've still not been able to accomplish that - even after installing Nevada b76 on the machine, removing a disk will not cause a hot-spare to become active, nor does resilvering start. Our Transtec raidbox seems to be based on a chipset by Promise, by the way. Regards, Paul Boven. -- Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547 Unix/Linux/Networking specialist Joint Institute for VLBI in Europe - www.jive.nl VLBI - It's a fringe science ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
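For comparison while testing, a spare is added and a replacement can be kicked off by hand as follows; pool and device names are placeholders:

# zpool add tank spare c3t0d0
# zpool status tank               (the spare should show up as AVAIL)

If a pulled disk is never reported as faulted - which is what keeps the spare from being brought in automatically - a replacement can still be started manually:

# zpool replace tank c2t0d0 c3t0d0

Whether the automatic path fires tends to depend on the controller/driver actually reporting the device as gone, which may well be the difference between the Promise- and Infortrend-based boxes discussed here.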
[zfs-discuss] ZFS mirror and sun STK 2540 FC array
Hi all,

we have just bought a Sun X2200M2 (4GB / 2 Opteron 2214 / 2 disks 250GB
SATA2, Solaris 10 update 4) and a Sun STK 2540 FC array (8 disks SAS 146
GB, 1 RAID controller). The server is attached to the array with a single
4 Gb Fibre Channel link.

I want to make a mirror using ZFS with this array. I have created 2
volumes on the array in RAID0 (stripe of 128 KB), presented to the host as
lun0 and lun1. So, on the host:

bash-3.00# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
  0. c1d0 /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  1. c2d0 /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  2. c6t600A0B800038AFBC02F7472155C0d0 /scsi_vhci/[EMAIL PROTECTED]
  3. c6t600A0B800038AFBC02F347215518d0 /scsi_vhci/[EMAIL PROTECTED]
Specify disk (enter its number):

bash-3.00# zpool create tank mirror c6t600A0B800038AFBC02F347215518d0 c6t600A0B800038AFBC02F7472155C0d0
bash-3.00# df -h /tank
Filesystem  size  used  avail  capacity  Mounted on
tank        532G   24K   532G        1%  /tank

I have tested the performance with a simple dd command
[ time dd if=/dev/zero of=/tank/testfile bs=1024k count=1
  time dd if=/tank/testfile of=/dev/null bs=1024k count=1 ]
and it gives:

# local throughput, stk2540 mirror, zfs /tank:
read 232 MB/s   write 175 MB/s

# just to test the max perf I did:
zpool destroy -f tank
zpool create -f pool c6t600A0B800038AFBC02F347215518d0
and the same basic dd gives me:

# single zfs /pool:
read 320 MB/s   write 263 MB/s

Just to give an idea, the SVM mirror using the two local SATA2 disks gives:
read 58 MB/s    write 52 MB/s

So, in production the zfs /tank mirror will be used to hold our home
directories (10 users using 10GB each), our project files (200 GB, mostly
text files and a CVS database), and some vendor tools (100 GB). People
will access the data (/tank) using NFSv4 from their workstations (Sun
Ultra 20 M2 with CentOS 4 update 5). On the Ultra 20 M2, the basic test
via NFSv4 gives:
read 104 MB/s   write 63 MB/s

At this point, I have the following questions:

-- Does anyone have similar figures for the STK 2540 using ZFS?

-- Instead of making only 2 volumes in the array, what do you think about
making 8 volumes (one per disk) and building 4 two-way mirrors:
zpool create tank mirror c6t6001.. c6t6002.. mirror c6t6003.. c6t6004.. {...} mirror c6t6007.. c6t6008..

-- I will add 4 disks to the array next summer. Do you think I should
create 2 new LUNs in the array and do a:
zpool add tank mirror c6t6001..(lun3) c6t6001..(lun4)
or build the 2 LUNs (6-disk RAID0) and the pool tank from scratch (i.e.:
back up /tank - zpool destroy - add disks - reconfigure array - zpool
create tank ... - restore the backed-up data)?

-- I am thinking about doing a disk scrub once a month. Is that sufficient?

-- Have you got any comments on the performance from the NFSv4 client?

If you have any advice / suggestions, feel free to share.

Thanks,
Benjamin
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
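On the scrubbing question: a monthly scrub is commonly driven straight from cron, with results checked via zpool status. The schedule below is only an example (03:00 on the first of each month, in root's crontab):

0 3 1 * * /usr/sbin/zpool scrub tank

# afterwards, to check progress and any errors found:
# zpool status -v tank

Monthly is a reasonable starting point for a mirrored pool on an array that also performs its own media scans; scrub more often if the data churns heavily or the array starts reporting media errors.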
Re: [zfs-discuss] Yager on ZFS
On Thu, Nov 08, 2007 at 07:28:47PM -0800, can you guess? wrote: > > How so? In my opinion, it seems like a cure for the brain damage of RAID-5. > > Nope. > > A decent RAID-5 hardware implementation has no 'write hole' to worry about, > and one can make a software implementation similarly robust with some effort > (e.g., by using a transaction log to protect the data-plus-parity > double-update or by using COW mechanisms like ZFS's in a more intelligent > manner). Can you reference a software RAID implementation which implements a solution to the write hole and performs well. My understanding (and this is based on what I've been told from people more knowledgeable in this domain than I) is that software RAID has suffered from being unable to provide both correctness and acceptable performance. > The part of RAID-Z that's brain-damaged is its > concurrent-small-to-medium-sized-access performance (at least up to request > sizes equal to the largest block size that ZFS supports, and arguably > somewhat beyond that): while conventional RAID-5 can satisfy N+1 > small-to-medium read accesses or (N+1)/2 small-to-medium write accesses in > parallel (though the latter also take an extra rev to complete), RAID-Z can > satisfy only one small-to-medium access request at a time (well, plus a > smidge for read accesses if it doesn't verity the parity) - effectively > providing RAID-3-style performance. Brain damage seems a bit of an alarmist label. While you're certainly right that for a given block we do need to access all disks in the given stripe, it seems like a rather quaint argument: aren't most environments that matter trying to avoid waiting for the disk at all? Intelligent prefetch and large caches -- I'd argue -- are far more important for performance these days. > The easiest way to fix ZFS's deficiency in this area would probably be to map > each group of N blocks in a file as a stripe with its own parity - which > would have the added benefit of removing any need to handle parity groups at > the disk level (this would, incidentally, not be a bad idea to use for > mirroring as well, if my impression is correct that there's a remnant of > LVM-style internal management there). While this wouldn't allow use of > parity RAID for very small files, in most installations they really don't > occupy much space compared to that used by large files so this should not > constitute a significant drawback. I don't really think this would be feasible given how ZFS is stratified today, but go ahead and prove me wrong: here are the instructions for bringing over a copy of the source code: http://www.opensolaris.org/os/community/tools/scm - ahl -- Adam Leventhal, FishWorkshttp://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss