[zfs-discuss] Help with the best layout
Hi everybody, thanks for a very good source of information! I hope you guys can help out a little. I have 3 disks: one 300GB USB and two 150GB IDE. I would like to get the most space out of whatever configuration I apply. So I've been thinking (and testing without success): is it at all possible to stripe the two smaller disks and then mirror that stripe with the larger one? I've tried all kinds of maneuvers, except destroying everything and starting from scratch. The thing is, I've already got 20GB of data on a mirror containing the two smaller disks, so I've had to zigzag a little bit moving the data around. But I will start from scratch if needed. Any ideas? Thanks in advance.
Re: [zfs-discuss] Help with the best layout
Using ZFS, of course *g*
[zfs-discuss] iscsi connection aborted.
Hi, I'm trying to boot an HP DL360 G5 via iSCSI from a Solaris 10 U4 ZFS-backed target, but it's failing the login at boot. POST messages from the DL360:

Starting iSCSI boot option rom initialization... Connecting... connected. Logging in... error - failing.

Interestingly (and correctly), the authentication method is set to none. I've looked at a TCP capture: the box succeeds at the first login, but immediately after the second login the Sun box (iSCSI target) closes the connection (TCP FIN). I've compared this to a successful session when running in Windows and there are some minor differences in the key/value pairs (receive buffer length, CRC, etc.), and also the ISID is all zero in the boot session but has values when running from Windows. I'm not sure whether that matters or not.

I'm not sure what or where the problem is - all the values appear to be set up correctly (I'm new to iSCSI) - after all, the first login command succeeds. Does anyone have pointers to what the problem is, or where to look to diagnose this further (does Solaris log this information anywhere)?
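A hedged starting point for narrowing this down from the Solaris side; the NIC name and output file below are placeholders, not details taken from the original post:

  # On the Solaris 10 U4 box acting as the target: confirm how the target
  # is configured (authentication settings, TPGT, backing store).
  iscsitadm list target -v

  # Capture the iSCSI login exchange on the target's interface for
  # comparison with the working Windows session.
  snoop -d e1000g0 -o /var/tmp/iscsi-boot.cap port 3260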
Re: [zfs-discuss] Which DTrace provider to use
On 14 Feb 2008, at 02:22, Marion Hakanson wrote:

[EMAIL PROTECTED] said: It's not that old. It's a Supermicro system with a 3ware 9650SE-8LP and an Open-E iSCSI-R3 DOM module. The system is plenty fast. I can pretty handily pull 120MB/sec from it, and write at over 100MB/sec. It falls apart more on random I/O. The server/initiator side is a T2000 with Solaris 10u4. It never sees over 25% CPU, ever. Oh yeah, and two 1Gb network links to the SAN.

My opinion is, if when the array got really loaded up, everything slowed down evenly, users wouldn't mind or notice much. But when every 20th or so read/write gets delayed by tens of seconds, the users start to line up at my door.

This is the write throttling problem. I've tested code that radically changes the situation for the better. We just need to go through performance validation before putback.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

-r
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On 15 Feb 2008, at 03:34, Bob Friesenhahn wrote:

On Thu, 14 Feb 2008, Tim wrote: If you're going for best single-file write performance, why are you doing mirrors of the LUNs? Perhaps I'm misunderstanding why you went from one giant RAID-0 to what is essentially a RAID-10.

That decision was made because I also need data reliability. As mentioned before, the write rate peaked at 200MB/second using RAID-0 across 12 disks exported as one big LUN.

What was the interlace on the LUN?

Other firmware-based methods I tried typically offered about 170MB/second. Even a four-disk firmware-managed RAID-5 with ZFS on top offered about 165MB/second. Given that I would like to achieve 300MB/second, a few tens of MB don't make much difference. It may be that I bought the wrong product, but perhaps there is a configuration change which will help make up some of the difference without sacrificing data reliability.

If this is a 165MB application rate, consider that ZFS sends that much to each side of the mirror. Your data channel rate was 330MB/sec.

-r

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] ZFS write throttling
Hi everyone,

This is my first post to zfs-discuss, so be gentle with me :-)

I've been doing some testing with ZFS - in particular, in checkpointing the large, proprietary in-memory database which is a key part of the application I work on. In doing this I've found what seems to be some fairly unhelpful write throttling behaviour from ZFS.

In summary, the environment is:

* An x4600 with 8 CPUs and 128GBytes of memory
* A 50GByte in-memory database
* A big, fast disk array (a 6140 with a LUN comprised of 4 SATA drives)
* Running Solaris 10 Update 4 (problems initially seen on U3, so I got it patched)

The problems happen when I checkpoint the database, which involves putting that database on disk as quickly as possible, using the write(2) system call.

The first time the checkpoint is run, it's quick - about 160MBytes/sec, even though the disk array is only sustaining 80MBytes/sec. So we're dirtying stuff in the ARC (and growing the ARC) at a pretty impressive rate.

After letting the I/O subside, running the checkpoint again results in very different behaviour. It starts running very quickly, again at 160MBytes/sec (with the underlying device doing 80MBytes/sec), and after a while (presumably once the ARC is full) things go badly wrong. In particular, a write(2) system call hangs for 6-7 minutes, apparently until all the outstanding I/O is done. Any reads from that device also take a huge amount of time, making the box very unresponsive.

Obviously this isn't good behaviour, but it's particularly unfortunate given that this checkpoint is data that I don't want to retain in any kind of cache anyway - in fact, preferably I wouldn't pollute the ARC with it in the first place. But it seems directio(3C) doesn't work with ZFS (unsurprisingly, as I guess it is implemented in segmap), and madvise(..., MADV_DONTNEED) doesn't drop data from the ARC (again, I guess, because it works on segmap/segvn).

Of course, limiting the ARC size to something fairly small makes it behave much better, but this isn't really the answer. I also tried using O_DSYNC, which stops the pathological behaviour but makes things pretty slow - I only get a maximum of about 20MBytes/sec, which is obviously much less than the hardware can sustain.

It sounds like we could do with different write throttling behaviour to head this sort of thing off. Of course, the ideal would be to have some way of telling ZFS not to bother keeping pages in the ARC; the latter appears to be bug 6429855. But the underlying behaviour doesn't really seem desirable; are there plans afoot to do any work on ZFS write throttling to address this kind of thing?

Regards,

--
Philip Beevers
Fidessa Infrastructure Development
mailto:[EMAIL PROTECTED]
phone: +44 1483 206571
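For reference, the stopgap the poster alludes to (capping the ARC) is normally done via /etc/system on Solaris 10; the 4GByte value below is purely an illustration and a reboot is required for it to take effect:

  * /etc/system: cap the ZFS ARC at 4 GBytes (example value only)
  set zfs:zfs_arc_max = 0x100000000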
Re: [zfs-discuss] ZFS taking up to 80 seconds to flush a single 8KB O_SYNC block.
On 10 Feb 2008, at 12:51, Robert Milkowski wrote:

Hello Nathan,

Thursday, February 7, 2008, 6:54:39 AM, you wrote:

NK> For kicks, I disabled the ZIL: zil_disable/W0t1, and that made not a
NK> pinch of difference. :)

Have you exported and then imported the pool to get zil_disable into effect?

I don't think export/import is required.

-r

--
Best regards,
Robert Milkowski  mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
Re: [zfs-discuss] ZFS write throttling
On 15 Feb 2008, at 11:38, Philip Beevers wrote:

[...]

It sounds like we could do with different write throttling behaviour to head this sort of thing off. Of course, the ideal would be to have some way of telling ZFS not to bother keeping pages in the ARC. The latter appears to be bug 6429855. But the underlying behaviour doesn't really seem desirable; are there plans afoot to do any work on ZFS write throttling to address this kind of thing?

Throttling is being addressed.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

BTW, the new code will adjust write speed to disk speed very quickly. You will not see those ultra-fast initial checkpoints. Is this a concern?

-r
Re: [zfs-discuss] ZFS write throttling
Hi Roch,

Thanks for the response.

Throttling is being addressed. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205 BTW, the new code will adjust write speed to disk speed very quickly. You will not see those ultra-fast initial checkpoints. Is this a concern?

That's good news. No, the loss of initial performance isn't a big problem - I'd be happy for it to go at spindle speed.

Regards,

--
Philip Beevers
Fidessa Infrastructure Development
mailto:[EMAIL PROTECTED]
phone: +44 1483 206571
Re: [zfs-discuss] Help with the best layout
I thought that too, but actually, I'm not sure you can. You can stripe multiple mirror or RAID sets with zpool create, but I don't see any documentation or examples for mirroring a RAID set.

However, in this case even if you could, you might not want to. Creating a stripe that way will restrict the speed of the IDE drives: they will be throttled back to the speed of the USB disk.

What I would do instead is create a mirror of the two IDE drives, then use zfs send/receive to send regular backups to the USB drive.
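A minimal sketch of that suggestion; the pool names and device names (c0d0 and c0d1 for the IDE disks, c2t0d0 for the USB disk) are placeholders:

  # Mirror the two IDE disks
  zpool create tank mirror c0d0 c0d1

  # Separate pool on the USB disk to hold backups
  zpool create backup c2t0d0

  # Periodic backup: snapshot, then send/receive into the USB pool
  zfs snapshot tank@backup-20080215
  zfs send tank@backup-20080215 | zfs receive backup/tank

Later backups can use incremental sends (zfs send -i) so only the changes since the previous snapshot have to cross to the USB disk.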
Re: [zfs-discuss] ZFS write throttling
On 2/15/08, Roch Bourbonnais [EMAIL PROTECTED] wrote:

On 15 Feb 2008, at 11:38, Philip Beevers wrote: [...] Obviously this isn't good behaviour, but it's particularly unfortunate given that this checkpoint is stuff that I don't want to retain in any kind of cache anyway - in fact, preferably I wouldn't pollute the ARC with it in the first place. [...]

Throttling is being addressed. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205 BTW, the new code will adjust write speed to disk speed very quickly. You will not see those ultra fast initial checkpoints. Is this a concern?

I'll wait for more details on how you address this. Maybe a blog entry, like this one:

http://blogs.technet.com/markrussinovich/archive/2008/02/04/2826167.aspx

Inside Vista SP1 File Copy Improvements:

"One of the biggest problems with the engine's implementation is that for copies involving lots of data, the Cache Manager write-behind thread on the target system often can't keep up with the rate at which data is written and cached in memory. That causes the data to fill up memory, possibly forcing other useful code and data out, and eventually, the target system's memory to become a tunnel through which all the copied data flows at a rate limited by the disk."

Sounds familiar? ;-)

Tao
Re: [zfs-discuss] [storage-discuss] Preventing zpool imports on boot
On Thu, Feb 14, 2008 at 11:17 PM, Dave [EMAIL PROTECTED] wrote:

I don't want Solaris to import any pools at bootup, even when there were pools imported at shutdown/crash time. The process to prevent importing pools should be automatic and not require any human intervention. I want to *always* import the pools manually. Hrm... what if I deleted zpool.cache after importing/exporting any pool? Are these the only times zpool.cache is created? I wish zpools had a property of 'atboot' or similar, so that you could mark a zpool to be imported at boot or not.

Like this?

  temporary
    By default, all pools are persistent and are automatically opened
    when the system is rebooted. Setting this boolean property to "on"
    causes the pool to exist only while the system is up. If the system
    is rebooted, the pool has to be manually imported by using the
    "zpool import" command. Setting this property is often useful when
    using pools on removable media, where the devices may not be present
    when the system reboots. This property can also be referred to by
    its shortened column name, "temp".

(I am trying to move this thread over to zfs-discuss, since I originally posted to the wrong alias.) storage-discuss trimmed in my reply.

--
Mike Gerdts
http://mgerdts.blogspot.com/
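For completeness, the manual workflow the original poster is after looks roughly like this; the pool name is a placeholder:

  # Export the pool cleanly before shutdown so nothing references it at boot
  zpool export tank

  # After boot, list pools that are available but not yet imported
  zpool import

  # Import explicitly, optionally pointing at a specific device directory
  zpool import tank
  zpool import -d /dev/dsk tank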
Re: [zfs-discuss] Help with the best layout
Ross wrote:

I thought that too, but actually, I'm not sure you can. You can stripe multiple mirror or RAID sets with zpool create, but I don't see any documentation or examples for mirroring a RAID set.

Split the USB disk in half, then mirror each IDE disk to a USB disk half.

However, in this case even if you could, you might not want to. Creating a stripe that way will restrict the speed of the IDE drives: they will be throttled back to the speed of the USB disk.

You should be able to get 30 MBytes/s or so to/from a USB disk. For most general-purpose use, this is OK.

What I would do instead is create a mirror of the two IDE drives, then use zfs send/receive to send regular backups to the USB drive.

Yes, this is a safer and longer-term view. I store my USB drives in fire safes, one at my house, one someplace else.

-- richard
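A rough sketch of the split-disk layout, assuming the USB disk has been carved into two roughly 150GB slices with format(1M); all device names here are placeholders:

  # Two mirrors, each pairing one IDE disk with half of the USB disk
  zpool create tank mirror c0d0 c2t0d0s0 mirror c0d1 c2t0d0s1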
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Nathan Kroenert wrote:

And something I was told only recently - it makes a difference whether you created the file *before* you set the recordsize property. If you created them after, then no worries, but if I understand correctly, if the *file* was created with a 128K recordsize, then it'll keep that forever... assuming I understand correctly. Hopefully someone else on the list will be able to confirm.

Yes, that is correct.

Neil.
[zfs-discuss] How to set ZFS metadata copies=3?
Let's say you are paranoid and have built a pool with 40+ disks in a Thumper. Is there a way to set metadata copies=3 manually?

After having built RAIDZ2 sets with 7-9 disks and then pooled these together, it just seems like a little bit of extra insurance to increase metadata copies. I don't see a need for extra data copies, which is currently the only trigger I see for that.
Re: [zfs-discuss] ZFS write throttling
On Fri, 15 Feb 2008, Roch Bourbonnais wrote:

The latter appears to be bug 6429855. But the underlying behaviour doesn't really seem desirable; are there plans afoot to do any work on ZFS write throttling to address this kind of thing?

Throttling is being addressed. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

I have observed similar behavior when using 'iozone' on a large file to benchmark ZFS on my StorageTek 2540 array. Fsstat shows gaps of up to 30 seconds with no I/O when run on a 10-second update cycle, but when I go to look at the lights on the array, I see that it is actually fully busy. It seems that the application is stalled during this load. It also seems that simple operations like 'ls' get stalled under such heavy load.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On 15 Feb 2008, at 18:24, Bob Friesenhahn wrote:

On Fri, 15 Feb 2008, Roch Bourbonnais wrote: As mentioned before, the write rate peaked at 200MB/second using RAID-0 across 12 disks exported as one big LUN. What was the interlace on the LUN?

The question was about LUN interlace, not interface. 128K to 1M works better.

There are two 4Gbit FC interfaces on an Emulex LPe11002 card which are supposedly acting in a load-share configuration.

If this is a 165MB application rate, consider that ZFS sends that much to each side of the mirror. Your data channel rate was 330MB/sec.

Yes, I am aware of the ZFS RAID write penalty, but in fact it has only cost 20MB per second vs. doing the RAID using controller firmware (150MB vs. 170MB/second). This indicates that there is plenty of communications bandwidth from the host to the array. The measured read rates are in the 470MB to 510MB/second range.

Any compression? Does turning off checksum help the numbers (that would point to a CPU-limited throughput)?

-r

While writing, it is clear that ZFS does not use all of the drives for writes at once, since the drive LEDs show that some remain temporarily idle and ZFS cycles through them. I would be very happy to hear from other StorageTek 2540 owners as to the write rate they were able to achieve.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, 15 Feb 2008, Roch Bourbonnais wrote:

What was the interlace on the LUN? The question was about LUN interlace, not interface. 128K to 1M works better.

The segment size is set to 128K. The max the 2540 allows is 512K. Unfortunately, the StorageTek 2540 and CAM documentation does not really define what segment size means.

Any compression?

Compression is disabled.

Does turning off checksum help the numbers (that would point to a CPU-limited throughput)?

I have not tried that, but this system is loafing during the benchmark. It has four 3GHz Opteron cores.

Does this output from 'iostat -xnz 20' help to understand the issues?

                 extended device statistics
 r/s    w/s   kr/s      kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 3.0    0.7   26.4       3.5   0.0   0.0    0.0    4.2   0   2 c1t1d0
 0.0  154.2    0.0   19680.3   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096147B451BEd0
 0.0  211.5    0.0   26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50A9C47B4522Dd0
 0.0  211.5    0.0   26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B50AA047B4529Bd0
 0.0  154.0    0.0   19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096647B453CEd0
 0.0  211.3    0.0   26915.0   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50AA447B4544Fd0
 0.0  152.4    0.0   19447.0   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B096A47B4559Ed0
 0.0  213.2    0.0   27183.8   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AA847B45605d0
 0.0  152.5    0.0   19453.4   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B096E47B456DAd0
 0.0  213.2    0.0   27177.4   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AAC47B45739d0
 0.0  213.2    0.0   27195.3   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B50AB047B457ADd0
 0.0  154.4    0.0   19711.8   0.0  20.7    0.0  134.0   0  59 c4t600A0B80003A8A0B097347B457D4d0
 0.0  211.3    0.0   26958.6   1.1  33.9    5.0  160.6  99 100 c4t600A0B800039C9B50AB447B4595Fd0

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, Feb 15, 2008 at 12:30 AM, Bob Friesenhahn [EMAIL PROTECTED] wrote:

Under Solaris 10 on a 4-core Sun Ultra 40 with 20GB RAM, I am setting up a Sun StorageTek 2540 with 12 300GB 15K RPM SAS drives, connected via load-shared 4Gbit FC links. This week I have tried many different configurations, using firmware-managed RAID, ZFS-managed RAID, and with the controller cache enabled or disabled. My objective is to obtain the best single-file write performance. Unfortunately, I am hitting some sort of write bottleneck and I am not sure how to solve it. I was hoping for a write speed of 300MB/second. With ZFS on top of a firmware-managed RAID-0 across all 12 drives, I hit a peak of 200MB/second. With each drive exported as a LUN and a ZFS pool of 6 pairs, I see a write rate of 154MB/second. The number of drives used has not had much effect on write rate.

May not be relevant, but still worth checking - I have a 2530 (which ought to be the same, only SAS instead of FC), and got fairly poor performance at first. Things improved significantly when I got the LUNs properly balanced across the controllers.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] ZFS write throttling
[EMAIL PROTECTED] said:

I also tried using O_DSYNC, which stops the pathological behaviour but makes things pretty slow - I only get a maximum of about 20MBytes/sec, which is obviously much less than the hardware can sustain.

I may misunderstand this situation, but while you're waiting for the new code from Sun, you might try O_DSYNC and at the same time tell the 6140 to ignore cache-flush requests from the host. That should get you running at spindle speed:

http://blogs.digitar.com/jjww/?itemid=44

Regards,

Marion
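The linked post describes an array-side setting. A commonly cited host-side variant on later Solaris 10/Nevada builds is the zfs_nocacheflush tunable, which stops ZFS issuing cache-flush requests at all; it is only safe when the array cache is non-volatile (battery-backed), and is shown here as an illustration rather than a recommendation:

  * /etc/system: only for arrays with non-volatile, battery-backed cache
  set zfs:zfs_nocacheflush = 1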
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, 15 Feb 2008, Peter Tribble wrote:

Each LUN is accessed through only one of the controllers (I presume the 2540 works the same way as the 2530 and 61X0 arrays). The paths are active/passive (if the active fails it will relocate to the other path). When I set mine up the first time it allocated all the LUNs to controller B and performance was terrible. I then manually transferred half the LUNs to controller A and it started to fly.

I assume that you either altered the Access State shown for the LUN in the output of 'mpathadm show lu DEVICE' or you noticed and observed the pattern:

Target Port Groups:
  ID: 3
  Explicit Failover: yes
  Access State: active
  Target Ports:
    Name: 200400a0b83a8a0c
    Relative ID: 0
  ID: 2
  Explicit Failover: yes
  Access State: standby
  Target Ports:
    Name: 200500a0b83a8a0c
    Relative ID: 0

I find this all very interesting and illuminating:

for dev in c4t600A0B80003A8A0B096A47B4559Ed0 \
           c4t600A0B80003A8A0B096E47B456DAd0 \
           c4t600A0B80003A8A0B096147B451BEd0 \
           c4t600A0B80003A8A0B096647B453CEd0 \
           c4t600A0B80003A8A0B097347B457D4d0 \
           c4t600A0B800039C9B50A9C47B4522Dd0 \
           c4t600A0B800039C9B50AA047B4529Bd0 \
           c4t600A0B800039C9B50AA447B4544Fd0 \
           c4t600A0B800039C9B50AA847B45605d0 \
           c4t600A0B800039C9B50AAC47B45739d0 \
           c4t600A0B800039C9B50AB047B457ADd0 \
           c4t600A0B800039C9B50AB447B4595Fd0
do
  echo === $dev ===
  mpathadm show lu /dev/rdsk/$dev | grep 'Access State'
done

=== c4t600A0B80003A8A0B096A47B4559Ed0 ===
  Access State: active
  Access State: standby
=== c4t600A0B80003A8A0B096E47B456DAd0 ===
  Access State: active
  Access State: standby
=== c4t600A0B80003A8A0B096147B451BEd0 ===
  Access State: active
  Access State: standby
=== c4t600A0B80003A8A0B096647B453CEd0 ===
  Access State: active
  Access State: standby
=== c4t600A0B80003A8A0B097347B457D4d0 ===
  Access State: active
  Access State: standby
=== c4t600A0B800039C9B50A9C47B4522Dd0 ===
  Access State: active
  Access State: standby
=== c4t600A0B800039C9B50AA047B4529Bd0 ===
  Access State: standby
  Access State: active
=== c4t600A0B800039C9B50AA447B4544Fd0 ===
  Access State: standby
  Access State: active
=== c4t600A0B800039C9B50AA847B45605d0 ===
  Access State: standby
  Access State: active
=== c4t600A0B800039C9B50AAC47B45739d0 ===
  Access State: standby
  Access State: active
=== c4t600A0B800039C9B50AB047B457ADd0 ===
  Access State: standby
  Access State: active
=== c4t600A0B800039C9B50AB447B4595Fd0 ===
  Access State: standby
  Access State: active

Notice that the first six LUNs are active on one controller while the second six LUNs are active on the other controller. Based on this, I should rebuild my pool by splitting my mirrors across this boundary. I am really happy that ZFS makes such things easy to try out.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, 15 Feb 2008, Peter Tribble wrote:

May not be relevant, but still worth checking - I have a 2530 (which ought to be the same, only SAS instead of FC), and got fairly poor performance at first. Things improved significantly when I got the LUNs properly balanced across the controllers.

What do you mean by properly balanced across the controllers? Are you using the multipath support in Solaris 10, or are you relying on ZFS to balance the I/O load? Do some disks have more affinity for a controller than the other?

With the 2540, there is an FC connection to each redundant controller. The Solaris 10 multipathing presumably load-shares the I/O to each controller. The controllers then perform some sort of magic to get the data to and from the SAS drives.

The controller stats are below. I notice that controller B has seen a bit more activity than controller A, but the firmware does not provide a controller uptime value, so it is possible that one controller was up longer than the other:

Performance Statistics - A on Storage System Array-1
Timestamp: Fri Feb 15 14:37:39 CST 2008
Total IOPS: 1098.83
Average IOPS: 355.83
Read %: 38.28
Write %: 61.71
Total Data Transferred: 139284.41 KBps
Read: 53844.26 KBps
Average Read: 17224.04 KBps
Peak Read: 242232.70 KBps
Written: 85440.15 KBps
Average Written: 26966.58 KBps
Peak Written: 139918.90 KBps
Average Read Size: 639.96 KB
Average Write Size: 629.94 KB
Cache Hit %: 85.32

Performance Statistics - B on Storage System Array-1
Timestamp: Fri Feb 15 14:37:45 CST 2008
Total IOPS: 1526.69
Average IOPS: 497.32
Read %: 34.90
Write %: 65.09
Total Data Transferred: 193594.58 KBps
Read: 68200.00 KBps
Average Read: 24052.61 KBps
Peak Read: 339693.55 KBps
Written: 125394.58 KBps
Average Written: 37768.40 KBps
Peak Written: 183534.66 KBps
Average Read Size: 895.80 KB
Average Write Size: 883.38 KB
Cache Hit %: 75.05

If I then go to the performance stats on an individual disk, I see:

Performance Statistics - Disk-08 on Storage System Array-1
Timestamp: Fri Feb 15 14:43:36 CST 2008
Total IOPS: 196.33
Average IOPS: 72.01
Read %: 9.65
Write %: 90.34
Total Data Transferred: 25076.91 KBps
Read: 2414.11 KBps
Average Read: 3521.44 KBps
Peak Read: 48422.00 KBps
Written: 22662.79 KBps
Average Written: 5423.78 KBps
Peak Written: 28036.43 KBps
Average Read Size: 127.29 KB
Average Write Size: 127.77 KB
Cache Hit %: 89.30

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
Hi Bob,

I'm assuming you're measuring sequential write speed; posting the iozone results would help guide the discussion.

For the configuration you describe, you should definitely be able to sustain 200 MB/s write speed for a single file, single thread, given your use of 4Gbps Fibre Channel interfaces and RAID1. Someone else brought up that with host-based mirroring over that interface you will be sending the data twice over the FC-AL link, so since you only have 400 MB/s on the FC-AL interface (load balancing will only work for two writes), you have to divide that by two. If you do the mirroring on the RAID hardware you'll get double that speed on writing, or 400MB/s, and the bottleneck is still the single FC-AL interface.

By comparison, we get 750 MB/s sequential read using six 15K RPM 300GB disks on an Adaptec (Sun OEM) in-host SAS RAID adapter in RAID10 on four streams, and I think I saw 350 MB/s write speed on one stream. Each disk is capable of 130 MB/s of read and write speed.

- Luke

On 2/15/08 10:39 AM, Bob Friesenhahn [EMAIL PROTECTED] wrote:

[...]
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, 15 Feb 2008, Luke Lonergan wrote:

I only managed to get 200 MB/s write when I did RAID 0 across all drives using the 2540's RAID controller and with ZFS on top. Ridiculously bad.

I agree. :-(

While I agree that data is sent twice (actually up to 8X if striping across four mirrors)

Still only twice the data that would otherwise be sent; in other words, the mirroring causes a duplicate set of data to be written.

Right. But more little bits of data to be sent due to ZFS striping.

Given that you're not even saturating the FC-AL links, the problem is in the hardware RAID. I suggest disabling read and write caching in the hardware RAID.

Hardware RAID is not an issue in this case since each disk is exported as a LUN. Performance with ZFS is not much different than when hardware RAID was used. I previously tried disabling caching in the hardware and it did not make a difference in the results.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, Feb 15, 2008 at 09:00:05PM +0000, Peter Tribble wrote:

[...] Each LUN is accessed through only one of the controllers (I presume the 2540 works the same way as the 2530 and 61X0 arrays). The paths are active/passive (if the active fails it will relocate to the other path). When I set mine up the first time it allocated all the LUNs to controller B and performance was terrible. I then manually transferred half the LUNs to controller A and it started to fly.

http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=stq=#0b500afc4d62d434

--
albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Nathan Kroenert wrote:

And something I was told only recently - it makes a difference if you created the file *before* you set the recordsize property.

Actually, it has always been true for RAID-0, RAID-5, RAID-6. If your I/O strides over two sets then you end up doing more I/O, perhaps twice as much.

If you created them after, then no worries, but if I understand correctly, if the *file* was created with a 128K recordsize, then it'll keep that forever...

Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum.

-- richard
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, 15 Feb 2008, Bob Friesenhahn wrote:

Notice that the first six LUNs are active on one controller while the second six LUNs are active on the other controller. Based on this, I should rebuild my pool by splitting my mirrors across this boundary. I am really happy that ZFS makes such things easy to try out.

Now that I have tried this out, I can unhappily say that it made no measurable difference to actual performance. However, it seems like a better layout anyway.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, Feb 15, 2008 at 8:50 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote:

On Fri, 15 Feb 2008, Peter Tribble wrote: May not be relevant, but still worth checking - I have a 2530 (which ought to be the same, only SAS instead of FC), and got fairly poor performance at first. Things improved significantly when I got the LUNs properly balanced across the controllers.

What do you mean by properly balanced across the controllers? Are you using the multipath support in Solaris 10 or are you relying on ZFS to balance the I/O load? Do some disks have more affinity for a controller than the other?

Each LUN is accessed through only one of the controllers (I presume the 2540 works the same way as the 2530 and 61X0 arrays). The paths are active/passive (if the active fails it will relocate to the other path). When I set mine up the first time it allocated all the LUNs to controller B and performance was terrible. I then manually transferred half the LUNs to controller A and it started to fly.

I'm using SAS multipathing for failover and just get ZFS to dynamically stripe across the LUNs. Your figures show asymmetry, but that may just be a reflection of the setup where you created a single RAID-0 LUN, which would only use one path.

(I don't really understand any of this stuff. Too much fiddling around for my liking.)

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] Performance with Sun StorageTek 2540
Bob Friesenhahn wrote:

On Fri, 15 Feb 2008, Luke Lonergan wrote: I only managed to get 200 MB/s write when I did RAID 0 across all drives using the 2540's RAID controller and with ZFS on top. Ridiculously bad. I agree. :-( While I agree that data is sent twice (actually up to 8X if striping across four mirrors) Still only twice the data that would otherwise be sent, in other words: the mirroring causes a duplicate set of data to be written. Right. But more little bits of data to be sent due to ZFS striping.

These little bits should be 128kBytes by default, which should be plenty to saturate the paths. There seems to be something else going on here... from the iostat data:

                 extended device statistics
 r/s    w/s   kr/s      kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 ...
 0.0  211.5    0.0   26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B50A9C47B4522Dd0
 0.0  211.5    0.0   26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B50AA047B4529Bd0
 0.0  154.0    0.0   19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B096647B453CEd0
 ...

This shows that we have an average of 33.9 iops of 128kBytes each queued to the storage device at any given time. There is an iop queued to the storage device at all times (100% busy). The 59%-busy device might not always be 59% busy, but it is difficult to see from this output because you used the -z flag.

Looks to me like ZFS is keeping the queues full, and the device is slow to service them (asvc_t). This is surprising, to a degree, because we would expect faster throughput to a nonvolatile write cache. It would be interesting to see the response for a stable idle system: start the workload, see the fast response as we hit the write cache, followed by the slowdown as we fill the write cache. This sort of experiment is usually easy to create.

-- richard
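One plausible way to run the experiment Richard describes, using only standard iostat options (the 5-second interval is arbitrary): start the monitor on an otherwise idle system, then kick off the write workload and watch asvc_t and %b climb as the array's write cache fills.

  # Omit -z so idle devices stay visible before the workload starts
  iostat -xn 5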
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, 15 Feb 2008, Albert Chin wrote:

http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=stq=#0b500afc4d62d434

This is really discouraging. Based on these newsgroup postings I am thinking that the Sun StorageTek 2540 was not a good investment for me, especially given that the $23K for it came right out of my own paycheck and it took me 6 months of frustration (the first shipment was damaged) to receive it. Regardless, this was the best I was able to afford unless I built the drive array myself.

The page at http://www.sun.com/storagetek/disk_systems/workgroup/2540/benchmarks.jsp claims 546.22 MBPS for the large file processing benchmark. So I went to look at the actual SPC-2 full disclosure report and saw that for one stream, the average data rate is 105MB/second (compared with 102MB/second with RAID-5), rising to 284MB/second with 10 streams. The product obviously performs much better for reads than it does for writes, and is better for multi-user performance than single-user.

It seems like I am getting a good bit more performance from my own setup than what the official benchmark suggests (they used 72GB drives, with 24 drives total), so it seems that everything is working fine. This is a lesson for me, and I have certainly learned a fair amount about drive arrays, Fibre Channel, and ZFS in the process.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
If you created them after, then no worries, but if I understand correctly, if the *file* was created with a 128K recordsize, then it'll keep that forever...

Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum.

Please read the manpage: "Changing the file system's recordsize only affects files created afterward; existing files are unaffected."

Nothing is rewritten in the file system when you change recordsize, so it stays the same for existing files.
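In practice that means recordsize has to be set before the data lands in the files. A quick illustration; the dataset name and value here are placeholders:

  # Set the record size first, then create the files
  zfs set recordsize=8k tank/db
  zfs get recordsize tank/db

  # Files written before the change keep their old record size; rewriting
  # them (for example, copying to a new file) picks up the new value.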
Re: [zfs-discuss] Performance with Sun StorageTek 2540
The segment size is the amount of contiguous space that each drive contributes to a single stripe. So if you have a 5-drive RAID-5 set @ 128k segment size, a single stripe = (5-1)*128k = 512k.

BTW, did you tweak the cache sync handling on the array?

-Joel
[zfs-discuss] Cannot do simultaneous read/write to ZFS over smb.
Me again. Thanks for all the previous help - my 10-disc RAIDZ2 is running mostly great. I just ran into a problem, though: I have the RAIDZ2 partition mounted to OS X via SMB, and I can upload OR download data to it just fine. However, if I start an upload and then start a download, the upload fails and stops; zpool iostat reports write bandwidth dropping to 0 while read bandwidth goes up:

              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
pile        304G  4.23T     30    200   114K  20.7M
pile        305G  4.23T      0    384    255  44.1M
pile        305G  4.23T      0    343      0  42.7M
pile        305G  4.23T      0     32   1022   949K
pile        305G  4.23T    201    347  25.0M  40.1M
pile        305G  4.23T    271      0  33.6M      0

As you can see, I was writing at 42.7MB/s, then bandwidth went to essentially nothing, it started to try to do 25/40MB/s R/W (which failed), and it went over to 33MB/s read. Has anybody encountered this problem before? I tried to google around and search here on simultaneous read/write but nothing came up.

Sam
[zfs-discuss] SunMC module for ZFS
Does anyone have a pointer to a general ZFS health/monitoring module for SunMC? There isn't one baked into SunMC proper, which means I get to write one myself if someone hasn't already done it. Thanks.
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
What about new blocks written to an existing file? Perhaps we could make that clearer in the manpage too... hm.

Mattias Pantzare wrote:

[...] Please read the manpage: "Changing the file system's recordsize only affects files created afterward; existing files are unaffected." Nothing is rewritten in the file system when you change recordsize, so it stays the same for existing files.
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
Hey, Richard -

I'm confused now. My understanding was that any files created after the recordsize was set would use that as the new maximum recordsize, but files already created would continue to use the old recordsize. Though I'm now a little hazy on what will happen when those existing files are updated as well... hm.

Cheers!

Nathan.

Richard Elling wrote:

[...] Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum. -- richard
Re: [zfs-discuss] [storage-discuss] Preventing zpool imports on boot
Mike Gerdts wrote:

On Feb 15, 2008 2:31 PM, Dave [EMAIL PROTECTED] wrote: This is exactly what I want - thanks! This isn't in the man pages for zfs or zpool in b81. Any idea when this feature was integrated?

Interesting... it is in b76. I checked several other releases both before and after and they didn't have it either. Perhaps it is not part of the committed interface. I stumbled upon it because I thought that I remembered "zpool import -R / poolname" having the behavior you were looking for. The rather consistent documentation for zpool import -R mentioned the temporary attribute.

We actually changed this to make it more robust. Now the property is called 'cachefile' and you can set it to 'none' if you want it to behave like the older 'temporary' property.

- George
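With that change, the original request maps onto something like the following on builds that have the cachefile property (pool and device names are placeholders):

  # Create or import a pool that is never recorded in the cache file,
  # so it will not be imported automatically at the next boot
  zpool create -o cachefile=none tank c0t0d0
  zpool import -o cachefile=none tank

  # Or switch an existing pool over after the fact
  zpool set cachefile=none tank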
Re: [zfs-discuss] How to set ZFS metadata copies=3?
Vincent Fox wrote:

Let's say you are paranoid and have built a pool with 40+ disks in a Thumper. Is there a way to set metadata copies=3 manually? After having built RAIDZ2 sets with 7-9 disks and then pooled these together, it just seems like a little bit of extra insurance to increase metadata copies. I don't see a need for extra data copies, which is currently the only trigger I see for that.

ZFS already does something like this for metadata by setting either 2 or 3 copies based on the metadata type. Take a look at dmu_get_replication_level().

- George
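For the data side, the only user-visible knob is the copies property mentioned in the original post; a quick illustration, using a placeholder dataset name, on releases that support the property:

  # Keep two copies of user data (metadata replication scales on top of
  # this automatically); applies only to blocks written after the change
  zfs set copies=2 tank/important
  zfs get copies tank/important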
[zfs-discuss] 'du' is not accurate on zfs
I have a script which generates a file and then immediately uses 'du -h' to obtain its size. With Solaris 10 I notice that this often returns an incorrect value of '0', as if ZFS is lazy about reporting actual disk use. Meanwhile, 'ls -l' does report the correct size.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
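A small reproduction of the effect, assuming the current directory is on a ZFS file system (the file size and sleep interval are arbitrary):

  dd if=/dev/zero of=testfile bs=1024k count=10
  du -h testfile     # often reports 0 immediately after the write
  ls -l testfile     # reports the full logical size right away
  sleep 10
  du -h testfile     # reports ~10M once the transaction group has synced

The difference is that du reports allocated blocks, which ZFS does not allocate until the transaction group is written out a few seconds later, while ls reports the file's logical length.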