Re: [zfs-discuss] Fwd: ZFS for consumers WAS:Yager on ZFS
Paul Kraus wrote:
> I also like being able to see how much space I am using for each with a simple df rather than a du (which takes a while to run). I can also tune compression on a data-type basis (no real point in trying to compress media files that are already compressed, MPEGs and JPEGs).

That's a very good point. I do the same, and as a side effect my data has never been better organised.

For a home user, data integrity is probably as important as, if not more important than, it is for a corporate user. How many home users do regular backups?

Ian.
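As a sketch of the per-dataset tuning Paul describes (the pool and dataset names here are made up, not from his setup):

  # one dataset per data type, so each shows up separately in df and zfs list
  zfs create tank/documents
  zfs create tank/media

  # compress the documents, leave the already-compressed media alone
  zfs set compression=on tank/documents
  zfs set compression=off tank/media

  # per-dataset space usage without a long-running du
  zfs list
  zfs get compressratio tank/documents

The properties used (compression, compressratio) are standard ZFS properties; the dataset layout itself is just one possible arrangement.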
Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and
> OTOH, when someone whom I don't know comes across as a pushover, he loses credibility.

It may come as a shock to you, but some people couldn't care less about those who assess 'credibility' on the basis of form rather than on the basis of content - which means that you can either lose out on potentially useful knowledge by ignoring them due to their form, or change your own attitude.

> I'd expect a senior engineer to show not only technical expertise but also the ability to handle difficult situations, *not* adding to the difficulties by his comments.

Another surprise for you, I'm afraid: some people just don't meet your expectations in this area. In particular, I only 'show my ability to handle difficult situations' in the manner that you suggest when I have some actual interest in the outcome - otherwise, I simply do what I consider appropriate and let the chips fall where they may. Deal with that in whatever manner you see fit (just as I do).

- bill
Re: [zfs-discuss] Fwd: ZFS for consumers WAS:Yager on ZFS
> For a home user, data integrity is probably as important as, if not more important than, it is for a corporate user. How many home users do regular backups?

I'm a heavy computer user and probably passed the 500GB mark way before most other home users, did various stunts like running RAID0 on IBM Deathstars, and I never back up. And I've only been running a ZFS mirror for a month or two, as insurance against disk failure (I suddenly felt I needed to do this).

What ZFS can give home users is safety for certain parts of their data, via checksums and ditto blocks. It doesn't prevent disk failure, but it sure helps keep important personal documents uncorrupted.

-mg
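Following on from the "certain parts of their data" point: ditto blocks for user data are exposed through the copies property, so extra redundancy can be requested on just the important datasets. A minimal sketch, with a hypothetical dataset name:

  # store two copies of every block in this dataset, even on a single disk
  zfs set copies=2 tank/documents
  zfs get copies tank/documents

This protects against localised corruption and bad sectors, not against losing the whole disk.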
[zfs-discuss] Recommended settings for dom0_mem when using zfs
I have an xVM b75 server and use zfs for storage (a zfs root mirror and a raid-z2 data pool). I see everywhere that it is recommended to have a lot of memory on a zfs file server... but I also need to relinquish a lot of my memory to be used by the domUs. What would be a good value for dom0_mem on a box with 4 GB of RAM?
Re: [zfs-discuss] [xen-discuss] Recommended settings for dom0_mem when using zfs
K wrote:
> I have an xVM b75 server and use zfs for storage (a zfs root mirror and a raid-z2 data pool). I see everywhere that it is recommended to have a lot of memory on a zfs file server... but I also need to relinquish a lot of my memory to be used by the domUs. What would be a good value for dom0_mem on a box with 4 GB of RAM?

I have found that setting dom0_mem to 2G (with zfs in dom0) works pretty well as long as you don't balloon down. If you want to set dom0_mem to 1G, you should limit the amount of memory zfs uses, e.g. by setting the following in /etc/system:

  set zfs:zfs_arc_max = 0x1000

MRJ
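The zfs_arc_max value above appears to have lost digits in the archive, so take it as-is. Purely as a sketch, a 1 GB ARC cap in /etc/system would look something like the following; the figure is an assumption, not MRJ's original number:

  * /etc/system - cap the ZFS ARC at 1 GB (0x40000000 bytes); requires a reboot
  set zfs:zfs_arc_max = 0x40000000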
Re: [zfs-discuss] Fwd: ZFS for consumers WAS:Yager on ZFS
On 11/19/07, Ian Collins [EMAIL PROTECTED] wrote:
> For a home user, data integrity is probably as important as, if not more important than, it is for a corporate user. How many home users do regular backups?

Let me correct a point I made badly the first time around: I value the data integrity provided by mirroring (I have always used mirrored drives for data and OS on my home servers). I don't know how much the end-to-end checksumming buys me, but it is not a compelling feature. In other words, I didn't choose ZFS because of the end-to-end checksumming; I chose it for the ease of management and flexibility in configuration. The checksummed data is just a bonus that came along for the ride :-)

Remember, this thread was essentially "Why would a home user choose ZFS over other options?" I tried using software mirrors under Linux... maybe I was spoiled by Disk Suite / Solaris Volume Manager, but I found the Linux software mirrors clunky and unreliable (when installing the OS, the metadevices came up in one order; after booting off of the hard disk they came up in another order, leaving my mirrored root unmountable). I'm not a big fan of hardware RAID, as I have seen terrible performance out of HW RAID cards, and from the OS layer you need additional hardware vendor drivers to really manage and monitor the drives (if you even can from the OS layer; I hate rebooting, even home servers).

Just one geek's opinion.

-- Paul Kraus Albacon 2008 Facilities
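For the ease-of-management point, a mirrored pool on a home server really is only a couple of commands. A minimal sketch, with hypothetical device names:

  # create a mirrored pool from two whole disks
  zpool create tank mirror c0t0d0 c0t1d0

  # check health, and swap out a failed side later
  zpool status tank
  zpool replace tank c0t1d0 c0t2d0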
Re: [zfs-discuss] zfs on a raid box
On Mon, Nov 19, 2007 at 11:10:32AM +0100, Paul Boven wrote:
> Any suggestions on how to further investigate / fix this would be very much welcomed. I'm trying to determine whether this is a zfs bug or one with the Transtec raidbox, and whether to file a bug with either Transtec (Promise) or zfs.

The way I'd try to do this would be to use the same box under Solaris software RAID, or better yet Linux or Windows software RAID (to make sure it's not a Solaris device driver problem). Does pulling the disk then get noticed? If so, it's a zfs bug.

danno
-- Dan Pritts, System Administrator Internet2 office: +1-734-352-4953 | mobile: +1-734-834-7224
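For anyone wanting to run Dan's comparison on the Solaris side, a bare-bones SVM (Disk Suite) mirror looks roughly like this; the slice names are hypothetical and this is only a sketch of the test setup, not part of Dan's message:

  # state database replicas are required before any metadevices
  metadb -a -f -c 2 c0t0d0s7 c0t1d0s7

  # build a mirror from two submirrors
  metainit d11 1 1 c0t0d0s0
  metainit d12 1 1 c0t1d0s0
  metainit d10 -m d11
  metattach d10 d12

  # after pulling a disk, see whether SVM flags the submirror as needing maintenance
  metastat d10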
Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
Neil Perrin writes:
> Joe Little wrote:
>> On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
>>> Joe, I don't think adding a slog helped in this case. In fact I believe it made performance worse. Previously the ZIL would be spread out over all devices, but now all synchronous traffic is directed at one device (and everything is synchronous in NFS). Mind you, 15MB/s seems a bit on the slow side - especially if cache flushing is disabled. It would be interesting to see what all the threads are waiting on. I think the problem may be that everything is backed up waiting to start a transaction because the txg train is slow, due to NFS requiring the ZIL to push everything synchronously.
>>
>> I agree completely. The log (even though slow) was an attempt to isolate writes away from the pool. I guess the question is how to provide for async access for NFS. We may have 16, 32 or whatever threads, but if a single writer keeps the ZIL pegged and prohibits reads, it's all for nought. Is there any way to tune/configure the ZFS/NFS combination to balance reads/writes so as not to starve one for the other? It's either feast or famine, or so tests have shown.
>
> No, there's currently no way to give reads preference over writes. All transactions get equal priority to enter a transaction group. Three txgs can be outstanding, as we use a 3-phase commit model: open, quiescing, and syncing.

That makes me wonder if this is not just the lack-of-write-throttling issue. If one txg is syncing and the other is quiesced out, I think it means we have let in too many writes. We do need a better balance.

Neil, is it correct that reads never hit txg_wait_open(), but they just need an I/O scheduler slot? If so, it seems to me just a matter of:

  6429205 each zpool needs to monitor its throughput and throttle heavy writers

However, if this is it, disabling the zil would not solve the issue (it might even make it worse). So I am lost as to what could be blocking the reads, other than lack of I/O slots. As another way to improve the I/O scheduler we have:

  6471212 need reserved I/O scheduler slots to improve I/O latency of critical ops

-r

> Neil.
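For readers following along, attaching a separate intent-log device is a single pool operation; a minimal sketch with hypothetical pool and device names (it mirrors what Joe did, not a new recommendation):

  # dedicate one device to the ZIL instead of spreading it across the pool
  zpool add tank log c2t0d0

  # the log vdev then shows up in the pool configuration
  zpool status tank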
Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
Roch - PAE wrote:
> Neil Perrin writes:
>> Joe Little wrote:
>>> On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
>>>> Joe, I don't think adding a slog helped in this case. In fact I believe it made performance worse. Previously the ZIL would be spread out over all devices, but now all synchronous traffic is directed at one device (and everything is synchronous in NFS). Mind you, 15MB/s seems a bit on the slow side - especially if cache flushing is disabled. It would be interesting to see what all the threads are waiting on. I think the problem may be that everything is backed up waiting to start a transaction because the txg train is slow, due to NFS requiring the ZIL to push everything synchronously.
>>>
>>> I agree completely. The log (even though slow) was an attempt to isolate writes away from the pool. I guess the question is how to provide for async access for NFS. We may have 16, 32 or whatever threads, but if a single writer keeps the ZIL pegged and prohibits reads, it's all for nought. Is there any way to tune/configure the ZFS/NFS combination to balance reads/writes so as not to starve one for the other? It's either feast or famine, or so tests have shown.
>>
>> No, there's currently no way to give reads preference over writes. All transactions get equal priority to enter a transaction group. Three txgs can be outstanding, as we use a 3-phase commit model: open, quiescing, and syncing.
>
> That makes me wonder if this is not just the lack-of-write-throttling issue. If one txg is syncing and the other is quiesced out, I think it means we have let in too many writes. We do need a better balance.
>
> Neil, is it correct that reads never hit txg_wait_open(), but they just need an I/O scheduler slot?

Yes, they don't modify any metadata (except access time, which is handled separately). I'm less clear about what happens further down in the DMU and SPA.
[zfs-discuss] raidz2
If I yank out a disk in a raidz2 4-disk array, shouldn't the other disks pick up without any errors? I have a 3120 JBOD, and I went and yanked out a disk and everything got hosed. It's okay, because I'm just testing stuff and wanted to see raidz2 in action when a disk goes down. Am I missing a step? I set up the disks with these commands:

  zpool create apool raidz2 c#t#d0 c#t#d0 c#t#d0 c#t#d0
  zfs create apool/export_home
  zfs create apool/export_backup
  zfs set mountpoint=/export/home apool/export_home
  zfs set mountpoint=/export/backup apool/export_backup

Thanks for your help/advice.

Brian.
Re: [zfs-discuss] raidz2
On Mon, Nov 19, 2007 at 04:33:26PM -0700, Brian Lionberger wrote:
> If I yank out a disk in a raidz2 4-disk array, shouldn't the other disks pick up without any errors? I have a 3120 JBOD, and I went and yanked out a disk and everything got hosed. It's okay, because I'm just testing stuff and wanted to see raidz2 in action when a disk goes down. Am I missing a step?

What version of Solaris are you running? What does "got hosed" mean? There have been many improvements in proactively detecting failure, culminating in build 77 of Nevada. Earlier builds:

- Were unable to distinguish device removal from devices misbehaving, depending on the driver and hardware.
- Did not diagnose a series of I/O failures as disk failure.
- Allowed several (painful) SCSI retries and continued to queue up I/O, even if the disk was fatally damaged.

Most classes of hardware would behave reasonably well on device removal, but certain classes caused cascading failures in ZFS, all of which should be resolved in build 77 or later.

- Eric

-- Eric Schrock, FishWorks    http://blogs.sun.com/eschrock
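If the goal is just to watch raidz2 tolerate a missing disk, a gentler test than physically yanking a drive is to offline one member; a sketch against Brian's pool, with a hypothetical device name:

  # take one member of the raidz2 out of service
  zpool offline apool c1t2d0

  # the pool should report DEGRADED but keep serving data
  zpool status apool

  # bring it back and let ZFS resilver anything it missed
  zpool online apool c1t2d0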
Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and
>> Big talk from someone who seems so intent on hiding their credentials.
>>
>> Say, what? Not that credentials mean much to me since I evaluate people on their actual merit, but I've not been shy about who I am (when I responded 'can you guess?' in registering after giving billtodd as my member name I was being facetious).
>
> You're using a web-based interface to a mailing list and the 'billtodd' bit doesn't appear to any users (such as me) subscribed via that mechanism.

Then perhaps Sun should make more of a point of this in their Web-based registration procedure.

> So yes, 'can you guess?' is unhelpful and makes you look as if you're being deliberately unhelpful.

Appearances can be deceiving, in large part because they're so subjective. That's why sensible people dig beneath them before forming any significant impressions.

...

>> If you're still referring to your incompetent alleged research, [...] [...] right out of the same orifice from which you've pulled the rest of your crap.
>
> It's language like that that is causing the problem.

No, it's ignorant loudmouths like cook and al who are causing the problem: I'm simply responding to them as I see fit.

> IMHO you're being a tad rude.

I'm being rude as hell to people who truly earned it, and intend to continue until they shape up or shut up. So if you feel that there's a problem here that you'd like to help fix, I suggest that you try tackling it at its source.

- bill
[zfs-discuss] Oops (accidentally deleted replaced drive)
Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering began. But I accidentally deleted the replacement drive on the array via CAM.

  # zpool status -v
  ...
    raidz2                                  DEGRADED     0     0     0
      c0t600A0B800029996605964668CB39d0     ONLINE       0     0     0
      spare                                 DEGRADED     0     0     0
        replacing                           UNAVAIL      0 79.14     0  insufficient replicas
          c0t600A0B8000299966059E4668CBD3d0 UNAVAIL     27   370     0  cannot open
          c0t600A0B800029996606584741C7C3d0 UNAVAIL      0 82.32     0  cannot open
        c0t600A0B8000299CCC05D84668F448d0   ONLINE       0     0     0
      c0t600A0B8000299CCC05B44668CC6Ad0     ONLINE       0     0     0
      c0t600A0B800029996605A44668CC3Fd0     ONLINE       0     0     0
      c0t600A0B8000299CCC05BA4668CD2Ed0     ONLINE       0     0     0

Is there a way to recover from this?

  # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
      c0t600A0B8000299CCC06734741CD4Ed0
  cannot replace c0t600A0B8000299966059E4668CBD3d0 with c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device

-- albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] Oops (accidentally deleted replaced drive)
You should be able to do a 'zpool detach' of the replacement and then try again.

- Eric

On Mon, Nov 19, 2007 at 08:20:04PM -0600, Albert Chin wrote:
> Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering began. But I accidentally deleted the replacement drive on the array via CAM.
>
>   # zpool status -v
>   ...
>     raidz2                                  DEGRADED     0     0     0
>       c0t600A0B800029996605964668CB39d0     ONLINE       0     0     0
>       spare                                 DEGRADED     0     0     0
>         replacing                           UNAVAIL      0 79.14     0  insufficient replicas
>           c0t600A0B8000299966059E4668CBD3d0 UNAVAIL     27   370     0  cannot open
>           c0t600A0B800029996606584741C7C3d0 UNAVAIL      0 82.32     0  cannot open
>         c0t600A0B8000299CCC05D84668F448d0   ONLINE       0     0     0
>       c0t600A0B8000299CCC05B44668CC6Ad0     ONLINE       0     0     0
>       c0t600A0B800029996605A44668CC3Fd0     ONLINE       0     0     0
>       c0t600A0B8000299CCC05BA4668CD2Ed0     ONLINE       0     0     0
>
> Is there a way to recover from this?
>
>   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
>       c0t600A0B8000299CCC06734741CD4Ed0
>   cannot replace c0t600A0B8000299966059E4668CBD3d0 with c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device
>
> -- albert chin ([EMAIL PROTECTED])

-- Eric Schrock, FishWorks    http://blogs.sun.com/eschrock
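Spelled out against Albert's pool, Eric's suggestion would look roughly like this; it's a sketch, assuming c0t600A0B800029996606584741C7C3d0 is the replacement that was deleted via CAM:

  # drop the dead in-flight replacement from the 'replacing' vdev
  zpool detach tww c0t600A0B800029996606584741C7C3d0

  # then retry the replace onto the new LUN
  zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0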
Re: [zfs-discuss] Recommended many-port SATA controllers for budget ZFS
On Sun, Nov 18, 2007 at 02:18:21PM +0100, Peter Schuller wrote:
> Right now I have noticed that LSI has recently begun offering some lower-budget stuff; specifically I am looking at the MegaRAID SAS 8208ELP/XLP, which are very reasonably priced.

I looked up the 8204XLP, which is really quite expensive compared to the Supermicro MV-based card. That being said, for a small 1U box that is only going to have two SATA disks, the Supermicro card is way overkill/overpriced for my needs.

Does anyone know if there are any PCI-X cards based on the MV88SX6041? I'm not having much luck finding any.

Thanks,
-brian
-- "Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke
[zfs-discuss] reslivering
So... issues with resilvering yet again. This is a ~3TB pool. I have one raid-z of five 500GB disks, and a second raid-z of three 300GB disks. One of the 300GB disks failed, so I have replaced the drive. After starting the resilver, it takes approximately 5 minutes to reach 68.05% complete... then it appears to just hang. It's been this way for 30 hours now.

If I do a zpool status, the command does not finish; it just hangs after printing:

  scrub: resilver in progress, 68.05% done

If I do a zpool iostat, it shows zero disk activity. If I do a zpool iostat -v, the command hangs as well. There's zero activity to this pool, as I suspended all shares while doing the resilvering in hopes it would speed things up. I've seen previous threads about how slow this can be, but this is a bit ridiculous. At this point I'm afraid to put any more data into a zpool if one disk failing will take weeks to rebuild.
Re: [zfs-discuss] Oops (accidentally deleted replaced drive)
On Mon, Nov 19, 2007 at 06:23:01PM -0800, Eric Schrock wrote:
> You should be able to do a 'zpool detach' of the replacement and then try again.

Thanks. That worked.

> - Eric
>
> On Mon, Nov 19, 2007 at 08:20:04PM -0600, Albert Chin wrote:
>> Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering began. But I accidentally deleted the replacement drive on the array via CAM.
>>
>>   # zpool status -v
>>   ...
>>     raidz2                                  DEGRADED     0     0     0
>>       c0t600A0B800029996605964668CB39d0     ONLINE       0     0     0
>>       spare                                 DEGRADED     0     0     0
>>         replacing                           UNAVAIL      0 79.14     0  insufficient replicas
>>           c0t600A0B8000299966059E4668CBD3d0 UNAVAIL     27   370     0  cannot open
>>           c0t600A0B800029996606584741C7C3d0 UNAVAIL      0 82.32     0  cannot open
>>         c0t600A0B8000299CCC05D84668F448d0   ONLINE       0     0     0
>>       c0t600A0B8000299CCC05B44668CC6Ad0     ONLINE       0     0     0
>>       c0t600A0B800029996605A44668CC3Fd0     ONLINE       0     0     0
>>       c0t600A0B8000299CCC05BA4668CD2Ed0     ONLINE       0     0     0
>>
>> Is there a way to recover from this?
>>
>>   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
>>       c0t600A0B8000299CCC06734741CD4Ed0
>>   cannot replace c0t600A0B8000299966059E4668CBD3d0 with c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device
>>
>> -- albert chin ([EMAIL PROTECTED])
>
> -- Eric Schrock, FishWorks    http://blogs.sun.com/eschrock

-- albert chin ([EMAIL PROTECTED])
[zfs-discuss] Indexing other than hash tables
Hello All,

Mike Speirs at Sun in New Zealand pointed me toward you-all. I have several sets of questions, so I plan to group them and send several emails.

This question is about the name/attribute mapping layer in ZFS. In the last version of the source code that I read, it provides hash tables. They are a good way of finding an exact match for a name. I'm from a database background, so I am thinking of the hash tables as if they were an index to some data that could have been stored in a different form.

Are there any plans to provide the following searches efficiently?

- Matching prefixes - e.g. I give it a telephone number, and it gives me a list of file descriptors for the suburb, city, province/state/etc, country, and hemisphere, each of which matches fewer digits at the beginning of the number. [[ general tries work for this ]]

- Upper and lower bounds - e.g. imagine fixed-length dates and times that are all digits; I give it a date and time, and it gives me the file descriptor for the latest date and time <= the one I give it. [[ tries can be made to do this, but B-variant trees are better ]]

Regards,
James.
Re: [zfs-discuss] reslivering
After messing around... who knows what's going on with it now. I finally rebooted because I was sick of it hanging. After that, this is what it came back with:

root:= zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 87h56m to go
config:

        NAME                        STATE     READ WRITE CKSUM
        fserv                       DEGRADED     0     0     0
          raidz1                    ONLINE       0     0     0
            c4t0d0                  ONLINE       0     0     0
            c4t1d0                  ONLINE       0     0     0
            c4t2d0                  ONLINE       0     0     0
            c4t3d0                  ONLINE       0     0     0
            c4t4d0                  ONLINE       0     0     0
          raidz1                    DEGRADED     0     0     0
            c4t6d0                  ONLINE       0     0     0
            c4t7d0                  ONLINE       0     0     0
            replacing               DEGRADED     0     0     0
              12544952246745011915  FAULTED      0     0     0  was /dev/dsk/c4t5d0s0/old
              c4t5d0                ONLINE       0     0     0

root:= zpool iostat -v
                              capacity     operations    bandwidth
pool                        used  avail   read  write   read  write
--------------------------  ----  -----  -----  -----  -----  -----
fserv                       990G  2.11T    397     25  25.3M   101K
  raidz1                    866G  1.42T    201      1   804K  5.29K
    c4t0d0                     -      -    133      1   533K  9.86K
    c4t1d0                     -      -    133      1   544K  9.89K
    c4t2d0                     -      -    133      1   541K  9.88K
    c4t3d0                     -      -    133      1   535K  9.82K
    c4t4d0                     -      -    132      1   525K  9.86K
  raidz1                    124G   708G    196     23  24.5M  95.6K
    c4t6d0                     -      -    102     31  12.3M  84.1K
    c4t7d0                     -      -    102     31  12.3M  83.8K
    replacing                  -      -      0     42      0  1.48M
      12544952246745011915     -      -      0      0  1.67K      0
      c4t5d0                   -      -      0     25  2.04K  1.50M
--------------------------  ----  -----  -----  -----  -----  -----
Re: [zfs-discuss] ZFS + DB + fragments
James Cone wrote:
> Hello All,
>
> Here's a possibly-silly proposal from a non-expert.
>
> Summarising the problem:
> - there's a conflict between small ZFS record size, for good random update performance, and large ZFS record size for good sequential read performance

Poor sequential read performance has not been quantified.

> - COW probably makes that conflict worse

This needs to be proven with a reproducible, real-world workload before it makes sense to try to solve it. After all, if we cannot measure where we are, how can we prove that we've improved? Note: some block devices will not exhibit the phenomenon which people seem to be worried about in this thread. There are more options than just re-architecting ZFS.

I'm not saying there aren't situations where there may be a problem; I'm just observing that nobody has brought data to this party.
-- richard

> - re-packing (~= defragmentation) would make it better, but cause problems with the snapshot mechanism
>
> Proposed solution:
> - keep COW
> - create a new operation that combines snapshots and cloning
> - when you're cloning, always write a tidy, re-packed layout of the data
> - if you're using the new operation, keep the existing layout as the clone, and give the new layout to the running file-system
>
> Things that have to be done to make this work:
> - sort out the semantics, because the clone will be in the existing zpool, and the file-system will move to a new zpool (not sure if I have the terminology right)
> - sort out the transactional properties; the changes made since the start of the operation will have to be copied across into the new layout
>
> Regards,
> James.
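For the record-size half of the trade-off James describes, the knob already exists per dataset; a minimal sketch, with hypothetical names and a 16K value chosen only to match a typical database block size:

  # small records for the randomly updated database files
  zfs create tank/db
  zfs set recordsize=16k tank/db

  # keep the default 128K records for data that is read sequentially
  zfs create tank/export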
Re: [zfs-discuss] reslivering
That locked up pretty quickly as well; one more reboot and this is what I'm seeing now:

root:= zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.81% done, 0h19m to go
config:

        NAME              STATE     READ WRITE CKSUM
        fserv             DEGRADED     0     0     0
          raidz1          ONLINE       0     0     0
            c4t0d0        ONLINE       0     0     0
            c4t1d0        ONLINE       0     0     0
            c4t2d0        ONLINE       0     0     0
            c4t3d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
          raidz1          DEGRADED     0     0     0
            c4t6d0        ONLINE       0     0     0
            c4t7d0        ONLINE       0     0     0
            replacing     DEGRADED     0     0     0
              c4t5d0s0/o  FAULTED      0     0     0  corrupted data
              c4t5d0      ONLINE       0     0     0

errors: No known data errors

The "corrupted data" part seems a bit scary to me...
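One low-impact way to see whether a resilver is progressing or has wedged again is to sample zpool status periodically; just a sketch, and the interval is arbitrary:

  # print resilver progress once a minute; Ctrl-C to stop
  while true; do
    date
    zpool status fserv | grep 'resilver in progress'
    sleep 60
  done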
Re: [zfs-discuss] zpool io to 6140 is really slow
Asif Iqbal wrote:
> I have the following layout
>
> A 490 with 8 1.8GHz CPUs and 16G mem. 6 6140s with 2 FC controllers, using the A1 and B1 controller ports at 4Gbps. Each controller has 2G NVRAM.
>
> On the 6140s I set up one raid0 lun per SAS disk with a 16K segment size. On the 490 I created a zpool with 8 4+1 raidz1s.
>
> I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in /etc/system.
>
> Is there a way I can improve the performance? I'd like to get 1GB/sec IO.

I don't believe a V490 is capable of driving 1 GByte/s of I/O. The V490 has two schizos, and the schizo is not a full-speed bridge. For more information see Section 1.2 of:
http://www.sun.com/processors/manuals/External_Schizo_PRM.pdf
-- richard

> Currently each lun is set up as primary A1 and secondary B1, or vice versa. I also have write cache enabled according to CAM.
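For anyone trying to reproduce the 125MB/s figure, one rough way to measure sequential write throughput through ZFS is a large dd while zpool iostat samples the pool; a sketch only, with placeholder pool and path names:

  # watch aggregate pool throughput with one-second samples
  zpool iostat mypool 1 &

  # stream 10 GB of zeros through the filesystem
  dd if=/dev/zero of=/mypool/fs/testfile bs=1024k count=10240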
Re: [zfs-discuss] ZFS + DB + fragments
Regardless of the merit of the rest of your proposal, I think you have put your finger on the core of the problem. Aside from some apparent reluctance on the part of some of the ZFS developers to believe that any problem exists here at all (and leaving aside the additional monkey wrench that using RAID-Z here would introduce, because one could argue that files used in this manner are poor candidates for RAID-Z anyway, hence that there's no need to consider reorganizing RAID-Z files), the *only* downside (other than a small matter of coding) to defragmenting files in the background in ZFS is the impact that would have on run-time performance (which should be minimal if the defragmentation is performed at lower priority) and the impact it would have on the space consumed by a snapshot that existed while the defragmentation was being done.

One way to eliminate the latter would be simply not to reorganize while any snapshot (or clone) existed: no worse than the situation today, and better whenever no snapshot or clone is present. That would change the perceived 'expense' of a snapshot, though, since you'd know you were potentially giving up some run-time performance whenever one existed - and it's easy to imagine installations which might otherwise like to run things such that a snapshot was *always* present.

Another approach would be just to accept any increased snapshot space overhead. So many sequentially-accessed files are just written once and read-only thereafter that a lot of installations might not see any increased snapshot overhead at all. Some files are never accessed sequentially (or are accessed that way only in situations where performance is unimportant), and if they could be marked "don't reorganize" then they wouldn't contribute any increased snapshot overhead either. One could introduce controls to limit the times when reorganization was done, though my inclination is to suspect that such additional knobs ought to be unnecessary.

One way to eliminate almost completely the overhead of the additional disk accesses consumed by background defragmentation would be to do it as part of the existing background scrubbing activity, but for actively-updated files one might want to defragment more often than one needed to scrub.

In any event, background defragmentation should be a relatively easy feature to introduce and try out if suitable multi-block contiguous allocation mechanisms already exist to support ZFS's existing batch writes. Use of the ZIL to perform opportunistic defragmentation while updated data was still present in the cache might be a bit more complex, but could still be worth investigating.

- bill
Re: [zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 19, 2007 1:43 AM, Louwtjie Burger [EMAIL PROTECTED] wrote:
> On Nov 17, 2007 9:40 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
>> (Including storage-discuss)
>>
>> I have 6 6140s with 96 disks. Out of which 64 of them are Seagate ST337FC (300GB - 1 RPM FC-AL)
>
> Those disks are 2Gb disks, so the tray will operate at 2Gb.

That is still 256MB/s. I am getting about 194MB/s.

>> I created 16k seg size raid0 luns using single fcal disks. Then
>
> You showed the single disks as LUN's to the host... if I understand correctly.

Yes

> Q: Why 16K?

To avoid segment crossing. It will mainly be used for an Oracle db whose block size is 16K.

>> created a zpool with 8 4+1 raidz1 using those luns, out of single
>
> What is the layout here? Inside 1 tray, over multiple trays?

Over multiple trays

>> disks. Also set the zfs nocache flush to `1' to take advantage of the 2G NVRAM cache of the controllers. I am using one port per controller. The rest of them are down (not in use). Each controller port speed is 4Gbps.
>
> The 6140 is asymmetric, and as such the second controller will be available in fail-over mode; it is not actively used for load balancing.

So there is no way to create an aggregated channel off of both controllers?

> You need to hook up more FC links to the primary controller that has the active LUN's assigned; that is the only way to easily get more IOPs.

If I add a second loop by adding another non-active port, I may have to rebuild the FS, no? All luns have one controller as primary and the second one as secondary.

>> I am getting only 125MB/s according to the zpool IO.
>
> Seems a tad low, how are you testing?
>
>> I should get ~ 512MB/s per IO.
>
> Hmmm, how did you get to this total? Keeping in mind that your tray is sitting at 2Gb and your extensions to the CSM trays are all single channel... you will get a 2Gb ceiling.

Even for the OS IO? So the controller NVRAM does not help increase the IO for the OS?

> Also have a look at http://en.wikipedia.org/wiki/Fibre_Channel#History
>
> At first glance, and not knowing the exact setup, I would say that you will not get more than 200MB/s (if that much).

I am getting 194MB/s. Hmm, my 490 has 16G memory. I really wish I could benefit some from OS and controller RAM, at least for Oracle IO.

> Any reason why you are not using the RAID controller to do the work for you?

They are raid0 luns, so the raid controller is in use. I get higher IO from a zpool built on raid0 luns of single disks than from a raid5-type lun, or from raid0 across multiple disks as one lun with a zpool on top.

Also, is it possible to get 2GB/s IO by using the leftover ports of the controllers? Is it also possible to get 4GB/s IO by aggregating the controllers (w/ 8 ports total)?

>>> On Nov 16, 2007 5:30 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
>>> I have the following layout
>>>
>>> A 490 with 8 1.8GHz CPUs and 16G mem. 6 6140s with 2 FC controllers, using the A1 and B1 controller ports at 4Gbps. Each controller has 2G NVRAM.
>>>
>>> On the 6140s I set up one raid0 lun per SAS disk with a 16K segment size. On the 490 I created a zpool with 8 4+1 raidz1s.
>>>
>>> I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in /etc/system.
>>>
>>> Is there a way I can improve the performance? I'd like to get 1GB/sec IO.
>>>
>>> Currently each lun is set up as primary A1 and secondary B1, or vice versa. I also have write cache enabled according to CAM.
>>>
>>> -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
>>
>> -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
Currently each lun is setup as primary A1 and secondary B1 or vice versa I also have write cache eanble according to CAM -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu ___ storage-discuss mailing list [EMAIL PROTECTED] http://mail.opensolaris.org/mailman/listinfo/storage-discuss -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss