Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello Frank, Tuesday, September 12, 2006, 9:41:05 PM, you wrote:

FC> It would be interesting to have a zfs-enabled HBA to offload the checksum
FC> and parity calculations. How much of zfs would such an HBA have to
FC> understand?

That wouldn't be end-to-end checksumming anymore, right? At that point you could disable ZFS checksumming entirely and rely on HW RAID alone.

-- Best regards, Robert  mailto:[EMAIL PROTECTED]  http://milek.blogspot.com

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
>We're back into the old argument of "put it on a co-processor, then move
>it onto the CPU, then move it back onto a co-processor" cycle.
>Personally, with modern CPUs being so under-utilized these days, and all
>ZFS-bound data having to move through main memory in any case (whether
>hardware checksum-assisted or not), use the CPU. Hardware-assist for
>checksum sounds nice, but I can't think of it actually being more
>efficient than doing it on the CPU (it won't actually help performance),
>so why bother with extra hardware?

Plus it moves part of the resiliency away from where we knew the data was good (the CPU/computer), across a bus/fabric/whatnot, possibly causing checksums to be computed over incorrect data. We already see that with IP checksum off-loading on broken hardware, and with broken VLAN switches recomputing the Ethernet CRC.

Casper
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
James C. McPherson wrote:
> Richard Elling wrote:
>> Frank Cusack wrote:
>>> It would be interesting to have a zfs-enabled HBA to offload the checksum
>>> and parity calculations. How much of zfs would such an HBA have to
>>> understand?
>> [warning: chum] Disagree. HBAs are pretty wimpy. It is much less expensive
>> and more efficient to move that (flexible!) function into the main CPUs.
> I think Richard is in the groove here. All the HBA chip implementation
> documentation that I've seen (publicly available, of course) indicates that
> these chips are already highly optimized engines, and I don't think that
> adding extra functionality like checksum and parity calculations would be
> an efficient use of silicon/SoI.
> cheers, James

HBAs work on an entirely different layer than the one at which checksumming data would be efficient. If we use an OSI-style model here, HBAs work at layer 1, and, as James mentioned, they are highly specialized ASICs for doing just bus-level communications. There is no spare general-purpose compute power available (nor can any realistically be built in). Checksumming for ZFS requires filesystem-level knowledge, which is effectively up at OSI layer 6 or 7, well beyond the understanding of a lowly HBA (it's just passing bits back and forth, and has no conception of what they mean).

Essentially, moving block checksumming into the HBA would at best be similar to what we see with super-low-cost RAID controllers and the XOR function. Remember how well that works? Now, building ZFS-style checksum capability (or just hardware checksum capability for ZFS to call) is indeed proper and possible for _real_ hardware RAID controllers, as they are much more akin to standard general-purpose CPUs (indeed, most now use a GP processor anyway). We're back into the old "put it on a co-processor, then move it onto the CPU, then move it back onto a co-processor" cycle.
Personally, with modern CPUs being so under-utilized these days, and all ZFS-bound data having to move through main memory in any case (whether hardware checksum-assisted or not), use the CPU. Hardware-assist for checksum sounds nice, but I can't see it actually being more efficient than doing it on the CPU (it won't actually help performance), so why bother with extra hardware?

-Erik
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Richard Elling wrote:
> Frank Cusack wrote:
>> It would be interesting to have a zfs-enabled HBA to offload the checksum
>> and parity calculations. How much of zfs would such an HBA have to
>> understand?
> [warning: chum] Disagree. HBAs are pretty wimpy. It is much less expensive
> and more efficient to move that (flexible!) function into the main CPUs.

I think Richard is in the groove here. All the HBA chip implementation documentation that I've seen (publicly available, of course) indicates that these chips are already highly optimized engines, and I don't think that adding extra functionality like checksum and parity calculations would be an efficient use of silicon/SoI.

cheers, James
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Frank Cusack wrote:
> It would be interesting to have a zfs enabled HBA to offload the checksum
> and parity calculations. How much of zfs would such an HBA have to
> understand?

[warning: chum] Disagree. HBAs are pretty wimpy. It is much less expensive and more efficient to move that (flexible!) function into the main CPUs.

-- richard
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On September 12, 2006 11:35:54 AM -0700 UNIX admin <[EMAIL PROTECTED]> wrote:
>> There are also the speed enhancements provided by a HW RAID array, and
>> usually RAS too, compared to a native disk drive, but the numbers on that
>> are still coming in and being analyzed. (See previous threads.)

It would be nice if you would attribute your quotes. Maybe this is a limitation of the web interface?

> Speed enhancements? What is the baseline of comparison? Hardware RAIDs can
> be reduced to two features: cache, which reorders data for optimal disk
> writes, and parity calculation, which is offloaded from the server's CPU.
> But HW calculations still take time, and the in-between, battery-backed
> cache serves to replace the individual disk caches, because the traditional
> filesystem approach had to have some assurance that the data made it to
> disk one way or another. With ZFS, however, the in-between cache is
> obsolete, as individual disk caches can be used directly. I also openly
> question whether even dedicated RAID HW is faster than the newest CPUs in
> modern servers. Unless there is something I'm missing, I fail to see the
> benefit of HW RAID in tandem with ZFS. In my view, this holds especially
> true when one gets into SAN storage like SE6920, EMC and Hitachi products.

I agree with your basic point, that the HW RAID cache is obsoleted by zfs (which seems to be substantiated here by benchmark results), but I think you slightly mischaracterize its use. The speed of the HW RAID CPU is irrelevant; the parity is XOR, which is extremely fast on any CPU compared to disk write speed. What is relevant is, as Anton points out, the CPU cache on the host system. Parity calculations kill the cache and will hurt memory-intensive apps. So in this case, offloading it may help in the ufs case. (Not for zfs, as I understand from reading here, since checksums still have to be done.
I would argue that this is *absolutely essential* [and zfs obsoletes all other filesystems], and therefore the gain in the ufs-on-HW-RAID-5 case is worthless due to the correctness tradeoff.)

It would be interesting to have a zfs-enabled HBA to offload the checksum and parity calculations. How much of zfs would such an HBA have to understand?

-frank
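[Editorial aside: the point above that "the parity is XOR, which is extremely fast on any CPU" can be made concrete with a short sketch. This is purely an illustration, not ZFS or SVM code; the block contents and sizes are arbitrary assumptions.]

```python
# Illustrative sketch (not ZFS/SVM code): RAID-5-style parity is a plain XOR
# across the data blocks of a stripe. The same XOR also reconstructs a lost
# block from the survivors plus parity.

def xor_parity(blocks):
    """XOR equal-sized blocks together: with the data blocks this yields the
    parity; with the surviving blocks plus parity it yields a missing block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Two data blocks and their parity (a 3-disk stripe).
d0 = bytes([0x0F] * 8)
d1 = bytes([0xF0] * 8)
parity = xor_parity([d0, d1])          # 0x0F XOR 0xF0 == 0xFF per byte

# Losing either data block, XOR of the survivors reconstructs it.
assert xor_parity([d1, parity]) == d0
assert xor_parity([d0, parity]) == d1
```

A modern CPU does this at close to memory bandwidth, which is why the arithmetic itself is never the bottleneck against disk write speeds.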
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Anton B. Rang writes:
> The bigger problem with system utilization for software RAID is the cache,
> not the CPU cycles proper. Simply preparing to write 1 MB of data will
> flush half of a 2 MB L2 cache. This hurts overall system performance far
> more than the few microseconds that XORing the data takes.

With ZFS, on most deployments we'll bring the data into the cache for the checksums anyway, so I guess the raid-z cost will be just incremental. Now, would we gain anything by generating ZFS functions for 'checksum+parity', or 'checksum+parity+compression'?

-r
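[Editorial aside: a hedged sketch of the idea behind a fused 'checksum+parity' function. Each byte is touched once, updating both a Fletcher-style checksum (the same family as ZFS's default checksums, though not its exact algorithm) and the XOR parity, so the data crosses the CPU cache once instead of twice. The function and its layout are hypothetical illustrations, not ZFS code.]

```python
# Hypothetical fused 'checksum+parity' pass (illustration only, not ZFS's
# implementation): one loop updates a Fletcher-16-style checksum and the XOR
# parity together, instead of streaming the data through the cache in two
# separate passes.

def checksum_and_parity(blocks):
    a = b = 0
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            a = (a + byte) % 255          # Fletcher-style running sums
            b = (b + a) % 255
            parity[i] ^= byte             # parity folded into the same pass
    return (b << 8) | a, bytes(parity)

cksum, parity = checksum_and_parity([b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"])
assert parity == b"\x11\x22\x33\x44"      # per-byte XOR of the two blocks
```

The cache argument above is exactly why fusing helps: the win is not fewer arithmetic operations but fewer trips of the data through the cache hierarchy.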
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On Sep 9, 2006, at 1:32 AM, Frank Cusack wrote:
> On September 7, 2006 12:25:47 PM -0700 "Anton B. Rang" <[EMAIL PROTECTED]> wrote:
>> The bigger problem with system utilization for software RAID is the cache,
>> not the CPU cycles proper. Simply preparing to write 1 MB of data will
>> flush half of a 2 MB L2 cache. This hurts overall system performance far
>> more than the few microseconds that XORing the data takes.
> Interesting. So does this in any way invalidate benchmarks recently posted
> here which showed raidz on jbod to outperform a zfs stripe on HW raid5?

No. There are, in fact, two reasons why RAID-Z is likely to outperform hardware RAID-5, at least in certain types of I/O benchmarks. First, RAID-5 requires read-modify-write cycles when full stripes aren't being written, and ZFS tends to issue small and pretty much random I/O (in my experience), which is the worst case for RAID-5. Second, performing RAID on the main CPU is at least as fast as doing it in hardware.

There are also cases where hardware RAID-5 will likely outperform ZFS. One is when there is a large RAM cache (which is not being flushed by ZFS -- one issue to be addressed is that the commands ZFS uses to control the write cache on plain disks tend to effectively disable the NVRAM cache on hardware RAID controllers). Another is when the I/O bandwidth being used is near the maximum capacity of the host channel, because doing software RAID requires moving more data over this channel. (If you have sufficient channels to dedicate one per disk, as is the case with SATA, this doesn't come into play.) This is particularly noticeable during reconstruction, since the channels are being used both to transfer data and to reconstruct it, whereas in a hardware RAID-5 box (of moderate cost, at least) they are typically overprovisioned. A third is if the system CPU or memory bandwidth is heavily used by your application, for instance a database running under heavy load.
In this case, the added CPU, cache, and memory-bandwidth demands of software RAID will compete with the application.

> Ultimately, you do want to use your actual application as the benchmark,
> but certainly generic benchmarks should at least be helpful.

They're helpful in measuring what the benchmark measures. ;-) If the benchmark measures how quickly you can get data from host RAM to disk, which is typically the case, it won't tell you anything about how much CPU was used in the process. Real applications, however, often care. There's a reason why we use interrupt-driven controllers, even though you get better performance of the I/O itself with polling. :-)

Anton
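[Editorial aside: Anton's first point, the read-modify-write penalty on partial-stripe writes, can be sketched as follows. A hypothetical illustration with arbitrary block sizes, not array firmware.]

```python
# Sketch of RAID-5's partial-stripe penalty (illustration only): updating one
# data block forces reads of the old data and old parity before the writes,
# because new_parity = old_parity XOR old_data XOR new_data. A full-stripe
# write computes parity from data already in hand and needs no extra reads.

def xor_blocks(*blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# A stripe of three data blocks plus their parity.
old = [b"\x01" * 4, b"\x02" * 4, b"\x04" * 4]
parity = xor_blocks(*old)

# Partial-stripe write: replace block 1 only. The array must first READ the
# old block and old parity (the "RMW" cost), then write block and parity.
new_block = b"\x08" * 4
rmw_parity = xor_blocks(parity, old[1], new_block)

# Full recompute from all the data gives the same parity; on a full-stripe
# write that data is in memory anyway, so the extra reads disappear.
assert rmw_parity == xor_blocks(old[0], new_block, old[2])
```

Small random writes, which ZFS tends to issue toward a LUN, hit this read-before-write path on every update, which is why they are the worst case for RAID-5.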
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On September 7, 2006 12:25:47 PM -0700 "Anton B. Rang" <[EMAIL PROTECTED]> wrote:
> The bigger problem with system utilization for software RAID is the cache,
> not the CPU cycles proper. Simply preparing to write 1 MB of data will
> flush half of a 2 MB L2 cache. This hurts overall system performance far
> more than the few microseconds that XORing the data takes.

Interesting. So does this in any way invalidate benchmarks recently posted here which showed raidz on jbod outperforming a zfs stripe on HW raid5? (That's my recollection; perhaps it's a mischaracterization or just plain wrong.) I mean, even if raid-z on jbod is a winner in a filesystem benchmark, when you have an actual application with a working set that is more than filesystem data, the benchmark results could be misleading. Ultimately, you do want to use your actual application as the benchmark, but certainly generic benchmarks should at least be helpful.

-frank
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On September 8, 2006 5:59:47 PM -0700 Richard Elling - PAE <[EMAIL PROTECTED]> wrote:
> Ed Gould wrote:
>> On Sep 8, 2006, at 11:35, Torrey McMahon wrote:
>>> If I read between the lines here I think you're saying that the raid
>>> functionality is in the chipset but the management can only be done by
>>> software running on the outside. (Right?)
>> No. All that's in the chipset is enough to read a RAID volume for boot.
>> Block layout, RAID-5 parity calculations, and the rest are all done in
>> the software. I wouldn't be surprised if RAID-5 parity checking was
>> absent on read for boot, but I don't actually know.
> At Sun, we often use the LSI Logic LSISAS1064 series of SAS RAID
> controllers on motherboards for many products. [LSI claims support for
> Solaris 2.6!] These controllers have a built-in microcontroller (ARM 926,
> IIRC), firmware, and nonvolatile memory (NVSRAM) for implementing the
> RAID features. We manage them through BIOS, OBP, or raidctl(1m). As
> Torrey says, very much like the A1000. Some of the fancier LSI products
> offer RAID 5, too.

Yes, some (many) of the RAID controllers do all the RAID in the hardware. I don't see where Ed was disputing that. But there will always be a [large] market for cheaper but less capable products, and so at least for a while to come there will be these not-quite-RAID cards. Probably for a very long while. winmodem, anyone?

-frank
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Ed Gould wrote:
> On Sep 8, 2006, at 11:35, Torrey McMahon wrote:
>> If I read between the lines here I think you're saying that the raid
>> functionality is in the chipset but the management can only be done by
>> software running on the outside. (Right?)
> No. All that's in the chipset is enough to read a RAID volume for boot.
> Block layout, RAID-5 parity calculations, and the rest are all done in the
> software. I wouldn't be surprised if RAID-5 parity checking was absent on
> read for boot, but I don't actually know.

At Sun, we often use the LSI Logic LSISAS1064 series of SAS RAID controllers on motherboards for many products. [LSI claims support for Solaris 2.6!] These controllers have a built-in microcontroller (ARM 926, IIRC), firmware, and nonvolatile memory (NVSRAM) for implementing the RAID features. We manage them through BIOS, OBP, or raidctl(1m). As Torrey says, very much like the A1000. Some of the fancier LSI products offer RAID 5, too.

-- richard
RE: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
> Dunno about eSATA jbods, but eSATA host ports have
> appeared on at least two HDTV-capable DVRs for storage
> expansion (looks like one model of the Scientific Atlanta
> cable-box DVRs as well as on the shipping-any-day-now
> Tivo Series 3).
>
> It's strange that they didn't go with firewire since it's
> already widely used for digital video.

Cost? If you use eSATA it's pretty much just a physical connector onto the board, whereas I guess FireWire needs a 1394 interface (a couple of dollars?) plus a royalty to all the patent holders. It's probably not much, but I can't see how there can be *any* margin in consumer electronics these days...

Steve.
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 14:22, Ed Gould wrote:
> On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
>> I was looking for a new AM2 socket motherboard a few weeks ago. All of
>> the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All
>> were less than $150. In other words, the days of having a JBOD-only
>> solution are over except for single disk systems. 4x750 GBytes is a
>> *lot* of data (and video).
> It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen
> are really software RAID solutions that know just enough in the
> controller to let the BIOS boot off a RAID volume. None of the expensive
> RAID stuff is in the controller.

Additionally, the only RAID levels many of them support are mirroring and striping (RAID 0, 1, 10, etc.); not as many do parity.
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 11:35, Torrey McMahon wrote:
> If I read between the lines here I think you're saying that the raid
> functionality is in the chipset but the management can only be done by
> software running on the outside. (Right?)

No. All that's in the chipset is enough to read a RAID volume for boot. Block layout, RAID-5 parity calculations, and the rest are all done in the software. I wouldn't be surprised if RAID-5 parity checking was absent on read for boot, but I don't actually know.

--Ed
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
Ed Gould wrote:
> On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
>> I was looking for a new AM2 socket motherboard a few weeks ago. All of
>> the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All
>> were less than $150. In other words, the days of having a JBOD-only
>> solution are over except for single disk systems. 4x750 GBytes is a
>> *lot* of data (and video).
> It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen
> are really software RAID solutions that know just enough in the
> controller to let the BIOS boot off a RAID volume. None of the expensive
> RAID stuff is in the controller.

If I read between the lines here I think you're saying that the raid functionality is in the chipset but the management can only be done by software running on the outside. (Right?) A1000 anyone? :)
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
> I was looking for a new AM2 socket motherboard a few weeks ago. All of the
> ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All were
> less than $150. In other words, the days of having a JBOD-only solution
> are over except for single disk systems. 4x750 GBytes is a *lot* of data
> (and video).

It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen are really software RAID solutions that know just enough in the controller to let the BIOS boot off a RAID volume. None of the expensive RAID stuff is in the controller.

--Ed
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Fri, 2006-09-08 at 09:33 -0700, Richard Elling - PAE wrote:
> There has been some recent discussion about eSATA JBODs in the press. I'm
> not sure they will gain much market share. iPods and flash drives have a
> much larger market share.

Dunno about eSATA jbods, but eSATA host ports have appeared on at least two HDTV-capable DVRs for storage expansion (looks like one model of the Scientific Atlanta cable-box DVRs as well as the shipping-any-day-now Tivo Series 3). It's strange that they didn't go with FireWire since it's already widely used for digital video.

- Bill
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
[EMAIL PROTECTED] wrote:
>> I don't quite see this in my crystal ball. Rather, I see all of the
>> SAS/SATA chipset vendors putting RAID in the chipset. Basically, you
>> can't get a "dumb" interface anymore, except for fibre channel :-). In
>> other words, if we were to design a system in a chassis with perhaps 8
>> disks, then we would also use a controller which does RAID. So, we're
>> right back to square 1.
> Richard, when I talk about cheap JBOD I think about home users/small
> servers/small companies. I guess you can sell 100 X4500s and at the same
> time 1000 (or even more) cheap JBODs to the small companies which for
> sure will not buy the big boxes. Yes, I know, you earn more selling
> X4500. But what do you think, how did Linux find its way to data centers
> and become an important player in the OS space? Through home
> users/enthusiasts who became familiar with it and then started using the
> familiar things in their job.

I was looking for a new AM2 socket motherboard a few weeks ago. All of the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All were less than $150. In other words, the days of having a JBOD-only solution are over except for single disk systems. 4x750 GBytes is a *lot* of data (and video). There has been some recent discussion about eSATA JBODs in the press. I'm not sure they will gain much market share. iPods and flash drives have a much larger market share.

> Proven way to achieve "world domination". ;-))

Dang! I was planning to steal a cobalt bomb and hold the world hostage while I relax in my space station... zero-G whee! :-)

-- richard
Re: Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
zfs "hogs all the ram" under a sustained heavy write load. This is being tracked by:

6429205 each zpool needs to monitor its throughput and throttle heavy writers

-r
Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello James, Thursday, September 7, 2006, 8:58:10 PM, you wrote:

JD> with ZFS I have found that memory is a much greater limitation, even
JD> my dual 300MHz U2 has no problem filling 2x 20MB/s scsi channels, even
JD> with compression enabled, using raidz and 10k rpm 9GB drives, thanks
JD> to its 2GB of ram it does great at everything I throw at it. On the
JD> other hand my blade 1500 ram 512MB with 3x 18GB 10k rpm drives using
JD> 2x 40MB/s scsi channels , os is on a 80GB ide drive, has problems
JD> interactively because as soon as you push zfs hard it hogs all the ram
JD> and may take 5 or 10 seconds to get response on xterms while the
JD> machine clears out ram and loads its applications/data back into ram.

IIRC, there is a bug in the SPARC ata driver which expresses itself when combined with ZFS. Unless you use only ZFS on those SCSI drives...?

-- Best regards, Robert  mailto:[EMAIL PROTECTED]  http://milek.blogspot.com
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Fri, Sep 08, 2006 at 09:41:58AM +0100, Darren J Moffat wrote:
> [EMAIL PROTECTED] wrote:
> >Richard, when I talk about cheap JBOD I think about home users/small
> >servers/small companies. I guess you can sell 100 X4500s and at the same
> >time 1000 (or even more) cheap JBODs to the small companies which for
> >sure will not buy the big boxes. Yes, I know, you earn more selling
> >X4500. But what do you think, how did Linux find its way to data centers
> >and become an important player in the OS space? Through home
> >users/enthusiasts who became familiar with it and then started using the
> >familiar things in their job.
>
> But Linux isn't a hardware vendor and doesn't make cheap JBOD or
> multipack for the home user.

Linux is used as a symbol.

> So I don't see how we get from "Sun should make cheap home user JBOD"
> (which BTW we don't really have the channel to sell for anyway) to "but
> Linux dominated this way".

"Home user" = the tech/geek/enthusiast who is an admin in their job.

[ Linux ] The "home user" is using Linux at home and is satisfied with it. He/she then goes to work and says "Let's install/use it on less important servers". He/she (and management) is again satisfied with it. So let's use it on more important servers... etc.

[ ZFS ] The "home user" is using ZFS (Solaris) at home (remember the easiness, and even the web interface to ZFS operations!) to keep photos, music, etc. and is satisfied with it. He/she then goes to work and says "I've been using a fantastic filesystem for a while. Let's use it on less important servers". Ok. Later on: "Works ok. Let's use it on more important ones". Etc...

Yes, I know, a bit naive. But remember that not only Linux spreads this way but Solaris as well. I guess most downloaded Solaris CDs/DVDs are for x86. You as a company "attack" at the high-end/midrange level. Let users/admins/fans "attack" at the lower-end level.

przemol
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
[EMAIL PROTECTED] wrote:
> Richard, when I talk about cheap JBOD I think about home users/small
> servers/small companies. I guess you can sell 100 X4500 and at the same
> time 1000 (or even more) cheap JBODs to the small companies which for
> sure will not buy the big boxes. Yes, I know, you earn more selling
> X4500. But what do you think, how Linux found its way to data centers and
> become important player in OS space ? Through home users/enthusiasts who
> become familiar with it and then started using the familiar things in
> their job.

But Linux isn't a hardware vendor and doesn't make cheap JBOD or multipack for the home user. So I don't see how we get from "Sun should make cheap home user JBOD" (which BTW we don't really have the channel to sell for anyway) to "but Linux dominated this way".

-- Darren J Moffat
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Torrey McMahon writes:
> Nicolas Dorfsman wrote:
>>> The hard part is getting a set of simple requirements. As you go into
>>> more complex data center environments you get hit with older Solaris
>>> revs, other OSs, SOX compliance issues, etc. etc. etc. The world where
>>> most of us seem to be playing with ZFS is on the lower end of the
>>> complexity scale. Sure, throw your desktop some fast SATA drives. No
>>> problem. Oh wait, you've got ten Oracle DBs on three E25Ks that need to
>>> be backed up every other blue moon ...
>>
>> Another fact is CPU use. Does anybody really know what the effects of an
>> intensive CPU workload on ZFS performance will be, and the effects of
>> ZFS RAID CPU computation on an intensive CPU workload?
>>
>> I heard a story about a customer complaining about his high-end server's
>> performance; when a guy came on site... and discovered beautiful SVM
>> RAID-5 volumes, the solution was almost found.
>
> Raid calculations take CPU time but I haven't seen numbers on ZFS usage.
> SVM is known for using a fair bit of CPU when performing R5 calculations
> and I'm sure other OSes have the same issue. EMC used to go around saying
> that offloading raid calculations to their storage arrays would increase
> application performance because you would free up CPU time to do other
> stuff. The "EMC effect" is how they used to market it.

I just measured quickly that a 1.2GHz SPARC can do 400-500 MB/s of encoding (time spent in the misnamed function vdev_raidz_reconstruct) for a 3-disk raid-z group. Bigger groups should cost more, but I'd also expect the cost to decrease with increasing CPU frequency.
Note that the raidz cost is impacted by this:

6460622 zio_nowait() doesn't live up to its name

-r
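[Editorial aside: the kind of quick measurement described above can be approximated in a few lines; a sketch, with an arbitrary buffer size, whose absolute numbers are illustrative only and not comparable to the in-kernel C path measured above. Python's big-integer XOR is just a way to keep the byte loop out of the interpreter.]

```python
# Rough sketch of measuring XOR-encode throughput for a 3-disk raid-z-style
# group (two data columns, one parity column). The 4 MiB buffer size is an
# arbitrary assumption; the reported rate is only a ballpark.
import time

def measure_encode(mib=4):
    size = mib * 1024 * 1024
    d0 = b"\xaa" * size
    d1 = b"\x55" * size
    start = time.perf_counter()
    # XOR the two data columns to produce the parity column.
    parity_int = int.from_bytes(d0, "little") ^ int.from_bytes(d1, "little")
    parity = parity_int.to_bytes(size, "little")
    elapsed = time.perf_counter() - start
    return parity, (2 * mib) / elapsed    # MiB of data encoded per second

parity, rate = measure_encode()
assert parity[:4] == b"\xff\xff\xff\xff"  # 0xaa XOR 0x55 == 0xff
```

Scaling such a loop with CPU frequency (and memory bandwidth) is why the per-byte cost keeps dropping on newer machines.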
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Thu, Sep 07, 2006 at 12:14:20PM -0700, Richard Elling - PAE wrote:
> [EMAIL PROTECTED] wrote:
> >This is the case where I don't understand Sun's politics at all: Sun
> >doesn't offer a really cheap JBOD which can be bought just for ZFS. And
> >don't even tell me about 3310/3320 JBODs - they are horribly expensive :-(
>
> Yep, multipacks have been EOL for some time now -- killed by big disks.
> Back when disks were small, people would buy multipacks to attach to
> their workstations. There was a time when none of the workstations had
> internal disks, but I'd be dating myself :-)
>
> For datacenter-class storage, multipacks were not appropriate. They only
> had single-ended SCSI interfaces, which have a limited cable budget,
> which limited their use in racks. Also, they weren't designed to be used
> in a rack environment, so they weren't mechanically appropriate either.
> I suppose you can still find them on eBay.
>
> >If Sun wants ZFS to be absorbed quicker it should have such a _really_
> >cheap JBOD.
>
> I don't quite see this in my crystal ball. Rather, I see all of the
> SAS/SATA chipset vendors putting RAID in the chipset. Basically, you
> can't get a "dumb" interface anymore, except for fibre channel :-). In
> other words, if we were to design a system in a chassis with perhaps 8
> disks, then we would also use a controller which does RAID. So, we're
> right back to square 1.

Richard, when I talk about cheap JBOD I think about home users/small servers/small companies. I guess you can sell 100 X4500s and at the same time 1000 (or even more) cheap JBODs to the small companies which for sure will not buy the big boxes. Yes, I know, you earn more selling X4500. But what do you think, how did Linux find its way to data centers and become an important player in the OS space? Through home users/enthusiasts who became familiar with it and then started using the familiar things in their job. Proven way to achieve "world domination".
;-))

przemol
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
The bigger problem with system utilization for software RAID is the cache, not the CPU cycles proper. Simply preparing to write 1 MB of data will flush half of a 2 MB L2 cache. This hurts overall system performance far more than the few microseconds that XORing the data takes. (A similar effect occurs with file system buffering, and this is one reason why direct I/O is attractive for databases — there's no pollution of the system cache.)
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
[EMAIL PROTECTED] wrote:
> This is the case where I don't understand Sun's politics at all: Sun
> doesn't offer a really cheap JBOD which can be bought just for ZFS. And
> don't even tell me about 3310/3320 JBODs - they are horribly expensive :-(

Yep, multipacks have been EOL for some time now -- killed by big disks. Back when disks were small, people would buy multipacks to attach to their workstations. There was a time when none of the workstations had internal disks, but I'd be dating myself :-)

For datacenter-class storage, multipacks were not appropriate. They only had single-ended SCSI interfaces, which have a limited cable budget, which limited their use in racks. Also, they weren't designed to be used in a rack environment, so they weren't mechanically appropriate either. I suppose you can still find them on eBay.

> If Sun wants ZFS to be absorbed quicker it should have such a _really_
> cheap JBOD.

I don't quite see this in my crystal ball. Rather, I see all of the SAS/SATA chipset vendors putting RAID in the chipset. Basically, you can't get a "dumb" interface anymore, except for fibre channel :-). In other words, if we were to design a system in a chassis with perhaps 8 disks, then we would also use a controller which does RAID. So, we're right back to square 1.

-- richard
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On 9/7/06, Torrey McMahon <[EMAIL PROTECTED]> wrote: Nicolas Dorfsman wrote: >> The hard part is getting a set of simple >> requirements. As you go into >> more complex data center environments you get hit >> with older Solaris >> revs, other OSs, SOX compliance issues, etc. etc. >> etc. The world where >> most of us seem to be playing with ZFS is on the >> lower end of the >> complexity scale. Sure, throw your desktop some fast >> SATA drives. No >> problem. Oh wait, you've got ten Oracle DBs on three >> E25Ks that need to >> be backed up every other blue moon ... >> > > Another fact is CPU use. > > Does anybody really know what the effects of an intensive CPU workload will be on ZFS performance, and the effects of ZFS RAID computation on an intensive CPU workload? > With ZFS I have found that memory is a much greater limitation: even my dual-300MHz U2 has no problem filling 2x 20MB/s SCSI channels, even with compression enabled, using RAIDZ and 10k RPM 9GB drives; thanks to its 2GB of RAM it does great at everything I throw at it. On the other hand my Blade 1500 with 512MB RAM and 3x 18GB 10k RPM drives on 2x 40MB/s SCSI channels (the OS is on an 80GB IDE drive) has problems interactively, because as soon as you push ZFS hard it hogs all the RAM and may take 5 or 10 seconds to get a response on xterms while the machine clears out RAM and loads its applications/data back in. James Dickens uadmin.blogspot.com > I heard a story about a customer complaining about his high-end server's performance; when a guy came on site... and discovered beautiful SVM RAID-5 volumes, the solution was almost found. > RAID calculations take CPU time but I haven't seen numbers on ZFS usage. SVM is known for using a fair bit of CPU when performing R5 calculations and I'm sure other OSes have the same issue. EMC used to go around saying that offloading RAID calculations to their storage arrays would increase application performance because you would free up CPU time to do other stuff.
The "EMC effect" is how they used to market it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Richard Elling - PAE wrote: Torrey McMahon wrote: Raid calculations take CPU time but I haven't seen numbers on ZFS usage. SVM is known for using a fair bit of CPU when performing R5 calculations and I'm sure other OSes have the same issue. EMC used to go around saying that offloading raid calculations to their storage arrays would increase application performance because you would free up CPU time to do other stuff. The "EMC effect" is how they used to market it. In all modern processors, and most ancient processors, XOR takes 1 CPU cycle and is easily pipelined. Getting the data from the disk to the registers takes thousands or hundreds of thousands of CPU cycles. You will more likely feel the latency of the read-modify-write for RAID-5 than the CPU time needed for XOR. ZFS avoids the read-modify-write, but does compression, so it is possible that a few more CPU cycles will be used. But it should still be a big win because CPU cycles are less expensive than disk I/O. Meanwhile, I think we're all looking for good data on this. -- richard I believe the true answer is (wait for it...) It Depends(TM) on what you're limited by. If your system under your load is CPU-constrained, ZFS calculating the RAIDZ parity (and checksum) is going to hurt; if you are I/O-constrained, then having the otherwise idle CPU do the work (which is, of course, more than just an XOR instruction, but we all know that) may help. The ZFS design center of mostly-idle CPUs is not always accurate, although most customers don't dare push the system to 100% utilization. It's when you _do_ hit that point, or when the extra overhead unexpectedly makes you hit or go beyond that point, that things can get interesting quickly. - Pete
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Torrey McMahon wrote: Raid calculations take CPU time but I haven't seen numbers on ZFS usage. SVM is known for using a fair bit of CPU when performing R5 calculations and I'm sure other OSes have the same issue. EMC used to go around saying that offloading raid calculations to their storage arrays would increase application performance because you would free up CPU time to do other stuff. The "EMC effect" is how they used to market it. In all modern processors, and most ancient processors, XOR takes 1 CPU cycle and is easily pipelined. Getting the data from the disk to the registers takes thousands or hundreds of thousands of CPU cycles. You will more likely feel the latency of the read-modify-write for RAID-5 than the CPU time needed for XOR. ZFS avoids the read-modify-write, but does compression, so it is possible that a few more CPU cycles will be used. But it should still be a big win because CPU cycles are less expensive than disk I/O. Meanwhile, I think we're all looking for good data on this. -- richard
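Richard's read-modify-write point can be checked with a few lines of arithmetic. This is a sketch with illustrative integer "blocks", not real disk I/O: a RAID-5 small write must first read the old data and old parity, because the parity update identity is new_parity = old_parity XOR old_data XOR new_data — two reads plus two writes, versus zero reads for the full-stripe write that ZFS issues:

```python
# Sketch of the RAID-5 read-modify-write that ZFS's full-stripe writes avoid.
# Small integers stand in for disk blocks; this only demonstrates the parity
# update identity, not actual I/O.

def parity(blocks):
    """Full parity recompute: XOR of every data block in the stripe."""
    p = 0
    for b in blocks:
        p ^= b
    return p

stripe = [0b1010, 0b0110, 0b1111]   # three data blocks in one stripe
old_parity = parity(stripe)

# Small write: replace block 1 in place. Requires reading old data and old
# parity first (2 reads), then writing new data and new parity (2 writes).
old_data, new_data = stripe[1], 0b0001
new_parity = old_parity ^ old_data ^ new_data
stripe[1] = new_data

# The shortcut agrees with recomputing parity over the whole stripe.
assert new_parity == parity(stripe)
```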
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Nicolas Dorfsman wrote: The hard part is getting a set of simple requirements. As you go into more complex data center environments you get hit with older Solaris revs, other OSs, SOX compliance issues, etc. etc. etc. The world where most of us seem to be playing with ZFS is on the lower end of the complexity scale. Sure, throw your desktop some fast SATA drives. No problem. Oh wait, you've got ten Oracle DBs on three E25Ks that need to be backed up every other blue moon ... Another fact is CPU use. Does anybody really know what the effects of an intensive CPU workload will be on ZFS performance, and the effects of ZFS RAID computation on an intensive CPU workload? I heard a story about a customer complaining about his high-end server's performance; when a guy came on site... and discovered beautiful SVM RAID-5 volumes, the solution was almost found. RAID calculations take CPU time but I haven't seen numbers on ZFS usage. SVM is known for using a fair bit of CPU when performing R5 calculations and I'm sure other OSes have the same issue. EMC used to go around saying that offloading raid calculations to their storage arrays would increase application performance because you would free up CPU time to do other stuff. The "EMC effect" is how they used to market it.
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
> The hard part is getting a set of simple > requirements. As you go into > more complex data center environments you get hit > with older Solaris > revs, other OSs, SOX compliance issues, etc. etc. > etc. The world where > most of us seem to be playing with ZFS is on the > lower end of the > complexity scale. Sure, throw your desktop some fast > SATA drives. No > problem. Oh wait, you've got ten Oracle DBs on three > E25Ks that need to > be backed up every other blue moon ... Another fact is CPU use. Does anybody really know what the effects of an intensive CPU workload will be on ZFS performance, and the effects of ZFS RAID computation on an intensive CPU workload? I heard a story about a customer complaining about his high-end server's performance; when a guy came on site... and discovered beautiful SVM RAID-5 volumes, the solution was almost found. Nicolas
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Roch - PAE wrote: Thinking some more about this. If your requirements do mandate some form of mirroring, then it truly seems that ZFS should take charge of that, if only because of the self-healing characteristics. So I feel the storage array's job is to export low-latency LUNs to ZFS. The hard part is getting a set of simple requirements. As you go into more complex data center environments you get hit with older Solaris revs, other OSs, SOX compliance issues, etc. etc. etc. The world where most of us seem to be playing with ZFS is on the lower end of the complexity scale. Sure, throw your desktop some fast SATA drives. No problem. Oh wait, you've got ten Oracle DBs on three E25Ks that need to be backed up every other blue moon ... I agree with the general idea that an array, be it one disk or some raid combination, should simply export low-latency LUNs. However, it's the features offered by the array - like site-to-site replication - used to meet more complex requirements that literally slow things down. In many cases you'll see years-old operational procedures causing those low-latency LUNs to slow down even more. Something really hard to get a customer to undo because a new-fangled file system is out. ;) I'd be happy to live with those simple LUNs but I guess some storage will just refuse to export non-protected LUNs. Now we can definitely take advantage of the array's capability of exporting highly resilient LUNs; RAID-5 seems to fit the bill rather well here. Even a 9+1 LUN will be quite resilient and have low block overhead. I think the 99x0 used to do 3+1 only. Now it's 7+1 if I recall. Close enough I suppose. So we benefit from the array's resiliency as well as its low-latency characteristics. And we mirror data at the ZFS level, which means great performance and great data integrity and great availability.
Note that ZFS write characteristics (all sequential) mean that we will commonly be filling full stripes on the LUNs, thus avoiding the partial-stripe performance pitfall. One thing comes to mind in that case. Many arrays do sequential detect on the blocks that come in to the front-end ports. If things get split up too much, or arrive out of order, or <insert other array characteristic here>, then you could induce more latency as the array does cartwheels trying to figure out what's going on.
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Wee Yeh Tan writes: > On 9/5/06, Torrey McMahon <[EMAIL PROTECTED]> wrote: > > This is simply not true. ZFS would protect against the same type of > > errors seen on an individual drive as it would on a pool made of HW raid > > LUN(s). It might be overkill to layer ZFS on top of a LUN that is > > already protected in some way by the device's internal RAID code but it > > does not "make your data susceptible to HW errors caused by the storage > > subsystem's RAID algorithm, and slow down the I/O". > > & Roch's recommendation to leave at least 1 layer of redundancy to ZFS > allows the extension of ZFS's own redundancy features for some truly > remarkable data reliability. > > Perhaps, the question should be how one could mix them to get the best > of both worlds instead of going to either extreme. > > > True, ZFS can't manage past the LUN into the array. Guess what? ZFS > > can't get past the disk drive firmware either, and that's a good thing > > for all parties involved. > > > -- > Just me, > Wire ... Thinking some more about this. If your requirements do mandate some form of mirroring, then it truly seems that ZFS should take charge of that, if only because of the self-healing characteristics. So I feel the storage array's job is to export low-latency LUNs to ZFS. I'd be happy to live with those simple LUNs but I guess some storage will just refuse to export non-protected LUNs. Now we can definitely take advantage of the array's capability of exporting highly resilient LUNs; RAID-5 seems to fit the bill rather well here. Even a 9+1 LUN will be quite resilient and have low block overhead. So we benefit from the array's resiliency as well as its low-latency characteristics. And we mirror data at the ZFS level, which means great performance and great data integrity and great availability.
Note that ZFS write characteristics (all sequential) mean that we will commonly be filling full stripes on the LUNs, thus avoiding the partial-stripe performance pitfall. If you must shy away from any form of mirroring, then it's either striping your RAID-5 LUNs (a performance edge, for those who live dangerously) or RAIDZ around those RAID-5 LUNs (lower cost, survives LUN failures). -r
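Roch's capacity claim for the mirrored-RAID-5 scheme is easy to verify with back-of-envelope arithmetic. This sketch uses hypothetical disk counts; the 9+1 figure is the one from his message:

```python
# Rough capacity arithmetic for the "ZFS mirror over array RAID-5 LUNs"
# scheme. Disk counts are illustrative.

def raid5_usable(data_plus_parity: int) -> float:
    """Usable fraction of a (d-1)+1 RAID-5 LUN, e.g. 10 disks (9+1) -> 0.9."""
    return (data_plus_parity - 1) / data_plus_parity

lun = raid5_usable(10)    # a 9+1 LUN keeps 90% of raw capacity (low overhead)
mirrored = lun / 2        # ZFS mirror of two such LUNs: 45% of raw capacity

assert lun == 0.9 and mirrored == 0.45
```

So the mirror, not the RAID-5 parity, dominates the space cost — consistent with calling 9+1 "low block overhead".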
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Jonathan Edwards wrote: Here's 10 options I can think of to summarize combinations of zfs with hw redundancy:

 #   ZFS  ARRAY HW      CAPACITY  COMMENTS
 --  ---  ------------  --------  --------
 1   R0   R1            N/2       hw mirror - no zfs healing (XXX)
 2   R0   R5            N-1       hw R5 - no zfs healing (XXX)
 3   R1   2 x R0        N/2       flexible, redundant, good perf
 4   R1   2 x R5        (N/2)-1   flexible, more redundant, decent perf
 5   R1   1 x R5        (N-1)/2   parity and mirror on same drives (XXX)
 6   RZ   R0            N-1       standard RAIDZ - no array RAID (XXX)
 7   RZ   R1 (tray)     (N/2)-1   RAIDZ+1
 8   RZ   R1 (drives)   (N/2)-1   RAID1+Z (highest redundancy)
 9   RZ   2 x R5        N-3       triple parity calculations (XXX)
 10  RZ   1 x R5        N-2       double parity calculations (XXX)

If you've invested in a RAID controller on an array, you might as well take advantage of it; otherwise you could probably get an old D1000 chassis somewhere and just run RAIDZ on JBOD. I think it would be good if RAIDoptimizer could be expanded to show these cases, too. Right now, the availability and performance models are simple. To go to this level, the models get more complex and there are many more tunables. However, for a few representative cases, it might make sense to do deep analysis, even if that analysis does not get translated into a tool directly. We have the tools to do the deep analysis, but the models will need to be written and verified. That said, does anyone want to see this sort of analysis? If so, what configurations should we do first (keep in mind that each config may take a few hours, maybe more depending on the performance model)? -- richard
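The CAPACITY column of the chart can be turned into a small calculator. This is a sketch that just transcribes the table's formulas — N is the raw disk count and must suit the layout (e.g. even for the mirrored options); it is not output from any Sun tool:

```python
# Usable capacity (in disks, out of N raw) for the 10 options in the chart.
# Formulas are transcribed directly from the table; illustrative only.

def capacity(option: int, n: int) -> float:
    formulas = {
        1: n / 2,        # R0 over hw R1
        2: n - 1,        # R0 over hw R5
        3: n / 2,        # zfs mirror of two R0 LUNs
        4: (n / 2) - 1,  # zfs mirror of two R5 LUNs
        5: (n - 1) / 2,  # zfs mirror on a single R5
        6: n - 1,        # RAIDZ on JBOD
        7: (n / 2) - 1,  # RAIDZ over tray mirrors
        8: (n / 2) - 1,  # RAIDZ over per-drive mirrors
        9: n - 3,        # RAIDZ over two R5 LUNs
        10: n - 2,       # RAIDZ over one R5 LUN
    }
    return formulas[option]

# With 12 disks: option 4 keeps 5 disks' worth of space, option 6 keeps 11.
assert capacity(4, 12) == 5 and capacity(6, 12) == 11
```

The spread between options 4 and 6 (5 vs 11 usable disks out of 12) is the cost of the extra redundancy the chart is weighing.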
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Wee Yeh Tan wrote: Perhaps, the question should be how one could mix them to get the best of both worlds instead of going to either extreme. In the specific case of a 3320 I think Jonathan's chart has a lot of good info that can be put to use. In the general case, well, I hate to say this but it depends. From what I've seen, the general discussions on this list tend toward "Make my small direct-connected desktop/server go as fast as possible". Once you leave that space and move to the opposite end of the spectrum, a large heterogeneous datacenter, you have to start looking at the overall data management strategy and how different pieces of technology get implemented. (Site-to-site array replication being a good example.) That's where I think you'll find the more interesting cases where RAID setups will be used with ZFS on top more often than not. There is also the speed enhancement provided by a HW RAID array, and usually RAS too, compared to a native disk drive, but the numbers on that are still coming in and being analyzed. (See previous threads.) -- Torrey McMahon Sun Microsystems Inc.
Re: Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On Sep 5, 2006, at 06:45, Robert Milkowski wrote: Hello Wee, Tuesday, September 5, 2006, 10:58:32 AM, you wrote: WYT> On 9/5/06, Torrey McMahon <[EMAIL PROTECTED]> wrote: This is simply not true. ZFS would protect against the same type of errors seen on an individual drive as it would on a pool made of HW raid LUN(s). It might be overkill to layer ZFS on top of a LUN that is already protected in some way by the device's internal RAID code but it does not "make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O". WYT> & Roch's recommendation to leave at least 1 layer of redundancy to ZFS WYT> allows the extension of ZFS's own redundancy features for some truly WYT> remarkable data reliability. WYT> Perhaps, the question should be how one could mix them to get the best WYT> of both worlds instead of going to either extreme. Depends on your data, but sometimes it could be useful to create HW RAID and then do just striping on the ZFS side between at least two LUNs. That way you do not get data protection, but fs/pool protection with ditto blocks. Of course each LUN is HW RAID made of different physical disks. i remember working up a chart on this list about 2 months ago: Here's 10 options I can think of to summarize combinations of zfs with hw redundancy:

 #   ZFS  ARRAY HW      CAPACITY  COMMENTS
 --  ---  ------------  --------  --------
 1   R0   R1            N/2       hw mirror - no zfs healing (XXX)
 2   R0   R5            N-1       hw R5 - no zfs healing (XXX)
 3   R1   2 x R0        N/2       flexible, redundant, good perf
 4   R1   2 x R5        (N/2)-1   flexible, more redundant, decent perf
 5   R1   1 x R5        (N-1)/2   parity and mirror on same drives (XXX)
 6   RZ   R0            N-1       standard RAIDZ - no array RAID (XXX)
 7   RZ   R1 (tray)     (N/2)-1   RAIDZ+1
 8   RZ   R1 (drives)   (N/2)-1   RAID1+Z (highest redundancy)
 9   RZ   2 x R5        N-3       triple parity calculations (XXX)
 10  RZ   1 x R5        N-2       double parity calculations (XXX)

If you've invested in a RAID controller on an array, you might as well take advantage of it; otherwise you could probably get an old D1000 chassis somewhere and just run RAIDZ on JBOD.
If you're more concerned about redundancy than space, with the Sun/STK 3000-series dual-controller arrays I would either create at least 2 x RAID5 LUNs balanced across controllers and zfs mirror them, or create at least 4 x RAID1 LUNs balanced across controllers and use RAIDZ. RAID0 isn't going to make that much sense, since you've got a 128KB txg commit on zfs which isn't going to be enough to fill a full stripe in most cases.

.je
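Jonathan's stripe-fill point can be checked with simple arithmetic. The segment sizes and disk counts below are assumed for illustration (they are array tunables, not ZFS constants), and the 128 KB figure is the one his message cites:

```python
# Back-of-envelope check: does a 128 KB ZFS write fill a full array stripe?
# Depends on the array's segment (chunk) size and data-disk count, both
# assumed here for illustration.

def full_stripe_bytes(data_disks: int, segment_kb: int) -> int:
    """Bytes needed to fill one full stripe across the data disks."""
    return data_disks * segment_kb * 1024

write = 128 * 1024                          # one 128 KB ZFS write
assert write <  full_stripe_bytes(4, 64)    # 4 x 64 KB = 256 KB: partial stripe
assert write == full_stripe_bytes(4, 32)    # 4 x 32 KB = 128 KB: full stripe
```

So with common default segment sizes the write falls short of a full stripe, which is why RAID0 (or RAID5 small writes) on such a layout buys little.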
Re[2]: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
Hello Wee, Tuesday, September 5, 2006, 10:58:32 AM, you wrote: WYT> On 9/5/06, Torrey McMahon <[EMAIL PROTECTED]> wrote: >> This is simply not true. ZFS would protect against the same type of >> errors seen on an individual drive as it would on a pool made of HW raid >> LUN(s). It might be overkill to layer ZFS on top of a LUN that is >> already protected in some way by the devices internal RAID code but it >> does not "make your data susceptible to HW errors caused by the storage >> subsystem's RAID algorithm, and slow down the I/O". WYT> & Roch's recommendation to leave at least 1 layer of redundancy to ZFS WYT> allows the extension of ZFS's own redundancy features for some truly WYT> remarkable data reliability. WYT> Perhaps, the question should be how one could mix them to get the best WYT> of both worlds instead of going to either extreme. Depends on your data, but sometimes it could be useful to create HW RAID and then do just striping on the ZFS side between at least two LUNs. That way you do not get data protection, but fs/pool protection with ditto blocks. Of course each LUN is HW RAID made of different physical disks. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
On 9/5/06, Torrey McMahon <[EMAIL PROTECTED]> wrote: This is simply not true. ZFS would protect against the same type of errors seen on an individual drive as it would on a pool made of HW raid LUN(s). It might be overkill to layer ZFS on top of a LUN that is already protected in some way by the device's internal RAID code but it does not "make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O". & Roch's recommendation to leave at least 1 layer of redundancy to ZFS allows the extension of ZFS's own redundancy features for some truly remarkable data reliability. Perhaps, the question should be how one could mix them to get the best of both worlds instead of going to either extreme. True, ZFS can't manage past the LUN into the array. Guess what? ZFS can't get past the disk drive firmware either, and that's a good thing for all parties involved. -- Just me, Wire ...
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
UNIX admin wrote: My question is how efficient will ZFS be, given that it will be layered on top of the hardware RAID and write cache? ZFS delivers best performance when used standalone, directly on entire disks. By using ZFS on top of a HW RAID, you make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O. This is simply not true. ZFS would protect against the same type of errors seen on an individual drive as it would on a pool made of HW raid LUN(s). It might be overkill to layer ZFS on top of a LUN that is already protected in some way by the device's internal RAID code but it does not "make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O". True, ZFS can't manage past the LUN into the array. Guess what? ZFS can't get past the disk drive firmware either, and that's a good thing for all parties involved.
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Mon, Sep 04, 2006 at 01:59:53AM -0700, UNIX admin wrote: > > My question is how efficient will ZFS be, given that > > it will be layered on top of the hardware RAID and > > write cache? > > ZFS delivers best performance when used standalone, directly on entire disks. > By using ZFS on top of a HW RAID, you make your data susceptible to HW errors > caused by the storage subsystem's RAID algorithm, and slow down the I/O. > > You should see much better performance by not creating a HW RAID, then adding > all the disks in the 3320 enclosures to a ZFS RAIDZ pool. This is the case where I don't understand Sun's politics at all: Sun doesn't offer a really cheap JBOD which can be bought just for ZFS. And don't even tell me about 3310/3320 JBODs - they are horribly expensive :-( If Sun wants ZFS to be adopted more quickly it should have such a _really_ cheap JBOD. przemol
[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320
> My question is how efficient will ZFS be, given that > it will be layered on top of the hardware RAID and > write cache? ZFS delivers best performance when used standalone, directly on entire disks. By using ZFS on top of a HW RAID, you make your data susceptible to HW errors caused by the storage subsystem's RAID algorithm, and slow down the I/O. You should see much better performance by not creating a HW RAID, and instead adding all the disks in the 3320 enclosures to a ZFS RAIDZ pool. Additionally, given enough disks, it might be possible to squeeze out even better performance by creating several RAIDZ vdevs and striping them. For a discussion on this aspect, please see the "WHEN TO (AND NOT TO) USE RAID-Z" treatise at http://blogs.sun.com/roch/entry/when_to_and_not_to.
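For the "several RAIDZ vdevs, striped" suggestion, the pool layout looks something like the sketch below. The pool name and device names are hypothetical; ZFS stripes writes across the vdevs automatically once more than one is in the pool:

```shell
# Hypothetical pool/device names. Two raidz vdevs in one pool;
# zfs stripes dynamically across them.
zpool create tank \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

# Verify the layout: status should show both raidz groups under "tank".
zpool status tank
```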