Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)
On Nov 16, 2007 10:41 PM, Neil Perrin [EMAIL PROTECTED] wrote:
> Joe Little wrote:
>> On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
>>> Joe, I don't think adding a slog helped in this case. In fact I believe
>>> it made performance worse. Previously the ZIL would be spread out over
>>> all devices, but now all synchronous traffic is directed at one device
>>> (and everything is synchronous in NFS). Mind you, 15MB/s seems a bit on
>>> the slow side - especially if cache flushing is disabled. It would be
>>> interesting to see what all the threads are waiting on. I think the
>>> problem may be that everything is backed up waiting to start a
>>> transaction, because the txg train is slow due to NFS requiring the ZIL
>>> to push everything synchronously.
>>
>> I agree completely. The log (even though slow) was an attempt to isolate
>> writes away from the pool. I guess the question is how to provide async
>> access for NFS. We may have 16, 32 or however many threads, but if a
>> single writer keeps the ZIL pegged and prohibits reads, it's all for
>> naught. Is there any way to tune/configure the ZFS/NFS combination to
>> balance reads and writes so that one doesn't starve the other? It's
>> either feast or famine, or so tests have shown.
>
> No, there's currently no way to give reads preference over writes. All
> transactions get equal priority to enter a transaction group. Three txgs
> can be outstanding, as we use a 3-phase commit model: open, quiescing, and
> syncing.
>
> Neil.

Any way to improve the balance? It would appear that zil_disable is still a
requirement to get NFS to behave in a practical, real-world way with ZFS.
Even with zil_disable, we end up with periods of pausing on the heaviest of
writes, and then I think it's mostly just ZFS having too much outstanding
I/O to commit. If zil_disable is enabled, is the slog disk ignored?
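For anyone following along, a minimal sketch of the tunables being discussed,
assuming an OpenSolaris/Solaris 10-era kernel where zil_disable and
zfs_nocacheflush are still honored; both are legacy /etc/system settings and
take effect only after a reboot:

    # /etc/system -- disables the ZIL entirely. Treat as a diagnostic, not a
    # fix: an NFS server crash can silently lose writes that clients were
    # already told are committed.
    set zfs:zil_disable = 1

    # Alternatively, keep the ZIL but suppress cache-flush commands. Only
    # sensible when the slog or array write cache is NVRAM/battery protected.
    set zfs:zfs_nocacheflush = 1

As far as I understand it, with zil_disable set nothing is written to the
intent log at all, so a separate slog device would simply sit idle.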
Re: [zfs-discuss] zpool io to 6140 is really slow
On Nov 17, 2007 9:12 AM, Louwtjie Burger [EMAIL PROTECTED] wrote:
> You have a 6140 with SAS drives?! When did this happen?

Oops! I meant FC-AL.

> On Nov 17, 2007 12:30 AM, Asif Iqbal [EMAIL PROTECTED] wrote:
>> I have the following layout:
>>
>> A 490 with 8 x 1.8GHz and 16G mem, and 6 6140s with 2 FC controllers,
>> using the A1 and B1 controller ports at 4Gbps. Each controller has 2G
>> NVRAM.
>>
>> On the 6140s I set up one raid0 LUN per SAS disk with a 16K segment size.
>> On the 490 I created a zpool with 8 4+1 raidz1s. I am getting zpool I/O
>> of only 125MB/s, with zfs:zfs_nocacheflush = 1 in /etc/system.
>>
>> Is there a way I can improve the performance? I'd like to get 1GB/sec I/O.
>>
>> Currently each LUN is set up with A1 as primary and B1 as secondary, or
>> vice versa. I also have write cache enabled, according to CAM.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
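For reference, a hedged sketch of how a layout like the one described might
be built. The cXtYdZ device names are hypothetical placeholders for the 40
single-disk LUNs, and only the first two of the eight 4+1 raidz1 vdevs are
written out:

    # hypothetical LUN names; one raid0 LUN per physical disk on the 6140s
    zpool create tank \
        raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
        raidz1 c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0
    # the remaining six raidz1 vdevs (30 more LUNs) follow the same pattern,
    # or can be appended later with 'zpool add tank raidz1 ...'

    # verify the vdev layout
    zpool status -v tank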
Re: [zfs-discuss] zpool io to 6140 is really slow
(Including storage-discuss)

I have 6 6140s with 96 disks, 64 of which are Seagate ST337FC (300GB FC-AL).
I created 16K segment-size raid0 LUNs, one per FC-AL disk, and then created a
zpool of 8 4+1 raidz1 vdevs out of those single-disk LUNs. I also set
zfs_nocacheflush to 1 to take advantage of the 2G NVRAM cache of the
controllers.

I am using one port per controller; the rest are down (not in use). Each
controller port speed is 4Gbps. All LUNs have one controller as primary and
the second one as secondary.

I am getting only 125MB/s according to the zpool I/O, where I should be
getting ~512MB/s. Also, is it possible to get 2GB/s by using the leftover
ports on the controllers? And is it possible to get 4GB/s by aggregating the
controllers (8 ports total)?

On Nov 16, 2007 5:30 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
> I have the following layout:
>
> A 490 with 8 x 1.8GHz and 16G mem, and 6 6140s with 2 FC controllers,
> using the A1 and B1 controller ports at 4Gbps. Each controller has 2G
> NVRAM.
>
> On the 6140s I set up one raid0 LUN per SAS disk with a 16K segment size.
> On the 490 I created a zpool with 8 4+1 raidz1s. I am getting zpool I/O of
> only 125MB/s, with zfs:zfs_nocacheflush = 1 in /etc/system.
>
> Is there a way I can improve the performance? I'd like to get 1GB/sec I/O.
>
> Currently each LUN is set up with A1 as primary and B1 as secondary, or
> vice versa. I also have write cache enabled, according to CAM.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
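A hedged first step before chasing array settings would be to confirm where
the ceiling actually is - one LUN, one HBA port, or the pool as a whole.
Assuming the pool is called tank:

    # per-vdev and per-LUN throughput, sampled every 5 seconds
    zpool iostat -v tank 5

    # per-device service times and %busy, extended and zero-suppressed
    iostat -xnz 5

For rough arithmetic: a single 4Gbps FC link carries on the order of
400-500MB/s of payload, so ~512MB/s is roughly one port's worth, and 125MB/s
suggests the bottleneck is somewhere other than raw link speed (sync write
behavior, segment/recordsize interaction, or the nature of the test itself).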
Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS
> I've been observing two threads on zfs-discuss with the following Subject
> lines:
>
>     Yager on ZFS
>     ZFS + DB + fragments
>
> and have reached the rather obvious conclusion that the author (can you
> guess?) is a professional spinmeister,

Ah - I see we have another incompetent psychic chiming in - and judging by
his drivel below, a technical incompetent as well. While I really can't help
him with the former area, I can at least try to educate him in the latter.

...

> Excerpt 1: Is this premium technical BullShit (BS) or what?

Since you asked: no, it's just clearly beyond your grade level, so I'll try
to dumb it down enough for you to follow.

> - BS 301 'grad level technical BS' ---
>
> Still, it does drive up snapshot overhead, and if you start trying to use
> snapshots to simulate 'continuous data protection' rather than more
> sparingly, the problem becomes more significant (because each snapshot
> will catch any background defragmentation activity at a different point,
> such that common parent blocks may appear in more than one snapshot even
> if no child data has actually been updated). Once you introduce CDP into
> the process (and it's tempting to, since the file system is in a better
> position to handle it efficiently than some add-on product), rethinking
> how one approaches snapshots (and COW in general) starts to make more
> sense.

Do you by any chance not even know what 'continuous data protection' is?
It's considered a fairly desirable item these days and was the basis for
several hot start-ups (some since gobbled up by bigger fish that apparently
agreed they were onto something significant), since it allows you to roll
back the state of individual files or the system as a whole to *any*
historical point you might want to - unlike snapshots, which require that
you anticipate the points you might want to roll back to and capture them
explicitly, or take such frequent snapshots that you'll probably be able to
get at least somewhere near any point you might want. That second-class
simulation of CDP, which some vendors offer because it's the best they can
do, is precisely the activity I outlined above, expecting that anyone
sufficiently familiar with file systems to follow the discussion would be
familiar with it.

But given your obvious limitations I guess I should spell it out in words of
even fewer syllables:

1. Simulating CDP without actually implementing it means taking very
frequent snapshots.

2. Taking very frequent snapshots means you're likely to interrupt
background defragmentation activity such that one child of a parent is moved
*before* a snapshot is taken while another is moved *after* it, resulting in
the need to capture a before-image of the parent (because at least one of
its pointers is about to change) *and all ancestors of the parent* (because
the pointer change will propagate through all the ancestral checksums - and
pointers, with COW) in every snapshot that occurs immediately prior to
moving *any* of its children, rather than just having to capture a single
before-image of the parent and all its ancestors, after which all its child
pointers will likely get changed before the next snapshot is taken.

So that's what any competent reader should have been able to glean from the
comments that stymied you.
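To make point 1 concrete, the "poor-man's CDP" pattern being described is
essentially a loop like the following - a hedged sketch, with pool/fs
standing in as a hypothetical dataset name:

    # simulate CDP by snapshotting a dataset every second; each snapshot
    # freezes whatever block rearrangement happens to be in flight, which
    # is exactly the before-image overhead described in point 2
    while true; do
        zfs snapshot pool/fs@cdp-$(date +%s)
        sleep 1
    done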
The paragraph's concluding comments were considerably more general in nature
and thus legitimately harder to follow: had you asked for clarification
rather than just assumed they were BS because you couldn't understand them,
you would not have looked like such an idiot; but since you did call them
into question, I'll now put a bit more flesh on them for those who may be
able to follow a discussion at that level of detail:

3. The file system is in a better position to handle CDP than some external
mechanism because a) the file system knows (right down to the byte level, if
it wants to) exactly what any individual update is changing, b) the file
system knows which updates are significant (e.g., there's probably no
intrinsic need to capture rollback information for lazy writes, because the
application didn't care whether they were made persistent at that time, but
for any explicitly forced writes or syncs a rollback point should be
established), and c) the file system is already performing log forces (where
a log is involved) or batch disk updates (a la ZFS) to honor such
application-requested persistence, and can piggyback the required CDP
before-image persistence on them rather than requiring separate synchronous
log or disk accesses to do so.

4. If you've got full-fledged CDP, it's questionable whether you need
snapshots as well (unless you have really, really inflexible requirements
for virtually instantaneous rollback and/or for high-performance
writable-clone access) - and if CDP turns out to be this decade's important
new file
Re: [zfs-discuss] [storage-discuss] zpool io to 6140 is really slow
On Nov 17, 2007 2:55 PM, Torrey McMahon [EMAIL PROTECTED] wrote:
> Have you tried disabling the zil cache flushing?

I already have zfs_nocacheflush set to 1 to take advantage of the NVRAM of
the raid controllers:

    set zfs:zfs_nocacheflush = 1

> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes
>
> Asif Iqbal wrote:
>> (Including storage-discuss)
>>
>> I have 6 6140s with 96 disks, 64 of which are Seagate ST337FC (300GB
>> FC-AL). I created 16K segment-size raid0 LUNs, one per FC-AL disk, and
>> then created a zpool of 8 4+1 raidz1 vdevs out of those single-disk
>> LUNs. I also set zfs_nocacheflush to 1 to take advantage of the 2G NVRAM
>> cache of the controllers.
>>
>> I am using one port per controller; the rest are down (not in use). Each
>> controller port speed is 4Gbps. All LUNs have one controller as primary
>> and the second one as secondary.
>>
>> I am getting only 125MB/s according to the zpool I/O, where I should be
>> getting ~512MB/s. Also, is it possible to get 2GB/s by using the leftover
>> ports on the controllers? And is it possible to get 4GB/s by aggregating
>> the controllers (8 ports total)?
>>
>> On Nov 16, 2007 5:30 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
>>> I have the following layout:
>>>
>>> A 490 with 8 x 1.8GHz and 16G mem, and 6 6140s with 2 FC controllers,
>>> using the A1 and B1 controller ports at 4Gbps. Each controller has 2G
>>> NVRAM.
>>>
>>> On the 6140s I set up one raid0 LUN per SAS disk with a 16K segment
>>> size. On the 490 I created a zpool with 8 4+1 raidz1s. I am getting
>>> zpool I/O of only 125MB/s, with zfs:zfs_nocacheflush = 1 in /etc/system.
>>>
>>> Is there a way I can improve the performance? I'd like to get 1GB/sec
>>> I/O.
>>>
>>> Currently each LUN is set up with A1 as primary and B1 as secondary, or
>>> vice versa. I also have write cache enabled, according to CAM.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
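One thing worth double-checking, since /etc/system settings only take effect
at boot: whether the running kernel actually picked up the tunable. A hedged
sketch, as zfs_nocacheflush is a private variable that has moved around
between builds:

    # read the current value from the live kernel (zfs module must be loaded)
    echo "zfs_nocacheflush/D" | mdb -k

    # or set it on the running kernel without a reboot
    echo "zfs_nocacheflush/W 1" | mdb -kw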
Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS
On Sat, 17 Nov 2007, can you guess? wrote:
> Ah - I see we have another incompetent psychic chiming in - and judging by
> his drivel below a technical incompetent as well. While I really can't
> help him with the former area, I can at least try to educate him in the
> latter.

I should know better than to reply to a troll, but I can't let this personal
attack stand. I know Al, and I can tell you for a fact that he is *far* from
technically incompetent.

Judging from the length of your diatribe (which I didn't bother reading),
you seem to subscribe to the "if you can't blind 'em with science, baffle
them with bullshit" school of thought. I'd take the word of any number of
people on this list over yours, any day.

HAND,

--
Rich Teer, SCSA, SCNA, SCSECA, OGB member
CEO, My Online Home Inventory
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
      http://www.myonlinehomeinventory.com
Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS
<troll bait>
Rich Teer wrote:
> I should know better than to reply to a troll, but I can't let this
> personal attack stand. I know Al, and I can tell you for a fact that he is
> *far* from technically incompetent.
>
> Judging from the length of your diatribe (which I didn't bother reading),
> you seem to subscribe to the "if you can't blind 'em with science, baffle
> them with bullshit" school of thought. I'd take the word of any number of
> people on this list over yours, any day. HAND,

I'm sure this troll will reply to you as he did to me. I just can't help
laughing at his responses anymore. I do find it odd that someone has so much
time on their hands to post such remarks. It's as if they think they're
doing themselves or the world a favor.
</troll bait>