[zfs-discuss] Maximum zfs send/receive throughput
It seems we are hitting a boundary with zfs send/receive over a network link (10Gb/s). We see peaks of up to 150MB/s, but on average only about 40-50MB/s are replicated. This is far from the bandwidth a 10Gb link can offer. Is it possible that ZFS gives replication too low a priority, or throttles it too much?
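One way to check whether the bottleneck is the pipeline rather than ZFS priorities is to decouple send and receive with a large buffer; a minimal sketch using mbuffer (hostname, port, pool and snapshot names are assumptions):

    # On the receiving host: listen on TCP port 9090 with a 1GB buffer,
    # feeding the stream into zfs receive as it arrives
    mbuffer -s 128k -m 1G -I 9090 | zfs receive tank/replica

    # On the sending host: fill the buffer as fast as zfs send produces data
    zfs send tank/data@snap1 | mbuffer -s 128k -m 1G -O recvhost:9090

If throughput rises with a buffer in place, the limit is the bursty stream generation rather than any deliberate ZFS throttling.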
Re: [zfs-discuss] NFS/ZFS slow on parallel writes
Bob Friesenhahn wrote:
> Striping across two large raidz2s is not ideal for multi-user use. You are
> getting the equivalent of two disks' worth of IOPS, which does not go very
> far. More, smaller raidz vdevs or mirror vdevs would be better. Also, make
> sure that you have plenty of RAM installed. For small files I would
> definitely go mirrored. What disk configuration (number of disks, and RAID
> topology) is the NetApp using?

On NetApp you can only choose between RAID-DP and RAID-DP :-) With mirroring you will certainly lose space-wise against NetApp, but if your data compresses well, you will still end up with more space available. Our 7410 system currently compresses with a CPU utilisation of around 3%, while using gzip-2 and getting a compression ratio of 1.96. So far, I'm very happy with the system.
Re: [zfs-discuss] ZFS on SAN?
Andras Spitzer wrote:
> Is it worth moving the redundancy from the SAN array layer to the ZFS
> layer? (Configuring redundancy on both layers sounds like a waste to me.)
> There are certain advantages to having redundancy configured on the array
> (beyond the protection against simple disk failure). Can we compare the
> advantages of having (for example) RAID5 configured on a high-end SAN with
> no redundancy at the ZFS layer, versus no redundant RAID configuration on
> the high-end SAN but raidz or raidz2 at the ZFS layer? Any tests,
> experience or best practices regarding this topic?

I would also like to hear about experiences with ZFS on EMC's Symmetrix. Currently we are using VxFS with PowerPath for multipathing, and synchronous SRDF for replication to our other datacenter. At some point we will move to ZFS, but there are many options for how to implement it. From a sysadmin point of view (simplicity), I would like to use mpxio and host-based mirroring; ZFS self-healing would be available in this configuration. Asking the EMC guys for their opinion is not an option, they will push you to buy SRDF and PowerPath licenses... :-)
Re: [zfs-discuss] Crazy Problem with
Mika Borner wrote:
> You're lucky. Ben just wrote about it :-)
>
> http://www.cuddletech.com/blog/pivot/entry.php?id=1013

Oops, should have read your message completely :-) Anyway, you can "lernen" something from it...
Re: [zfs-discuss] Crazy Problem with
Henri Meddox wrote:
> Hi Folks,
> call me a lernen ;-)
>
> I got a crazy problem with "zpool list" and the size of my pool:
>
> created "zpool create raidz2 hdd1 hdd2 hdd3" - each hdd is about 1GB.
>
> zpool list shows me a size of 2.95GB - shouldn't this be only 1GB?
>
> After creating a file of about 500MB -> capacity is shown as 50% -> the right value?
>
> Is this a known bug / feature?

You're lucky. Ben just wrote about it :-)

http://www.cuddletech.com/blog/pivot/entry.php?id=1013
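For the archives, the gist of that post: zpool list reports raw pool capacity including parity, while zfs list reports the space actually usable after redundancy. A sketch with three 1GB devices (pool name and device names are hypothetical):

    # raidz2 across three 1GB disks: two disks' worth of parity, one of data
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0

    # shows ~2.95GB: raw capacity, parity included
    zpool list tank

    # shows ~1GB: what is actually available for data
    zfs list tank

The 50% capacity after writing a 500MB file is consistent with this: 500MB of data plus double parity is roughly 1.5GB of raw space out of 2.95GB.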
[zfs-discuss] zfs send / zfs receive hanging
Hi

I updated today from snv_101 to snv_105. I wanted to do zfs send/receive to a new zpool, forgetting that the new pool was a newer version. zfs send timed out after a while, but it was impossible to kill the receive process. Shouldn't the zfs receive command just fail with a "wrong version" error? In the end I had to reboot...

It would also be nice to be able to specify the zpool version during pool creation. E.g. if I have a newer machine and I want to move data to an older one, I should be able to specify the pool version; otherwise it's a one-way street.
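On the last point: where pool properties are supported, a pool can already be created at an older on-disk version. A sketch (pool name and device are placeholders, and whether -o version= is accepted at create time depends on the build):

    # list the pool versions this build knows about
    zpool upgrade -v

    # create the pool at version 10 so an older host can still import it
    zpool create -o version=10 tank c0t0d0

Note that version can only be raised afterwards (zpool upgrade), never lowered.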
Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?
Ulrich Graef wrote:
> You need not wade through your paper...
> ECC theory tells us that you need a minimum distance of 3
> to correct one error in a codeword, ergo neither RAID-5 nor RAID-6
> is enough: you need RAID-2 (which nobody uses today).
>
> RAID controllers today take advantage of the fact that they know
> which disk is returning the bad block, because that disk returns
> a read error.
>
> ZFS is even able to correct when an error exists in the data
> but no disk reports a read error,
> because ZFS ensures the integrity from the root block to the data blocks
> with a long checksum accompanying the block pointers.

The NetApp paper mentioned by JZ (http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt) talks about write verify. Would this feature make sense in a ZFS environment? I'm not sure there is any advantage. It seems quite unlikely, when data is written redundantly to two different disks, that both disks lose or misdirect the same writes. Maybe ZFS could have an option to enable instant read-back of written blocks, if one wants to be absolutely sure data is written correctly to disk.
Re: [zfs-discuss] Storage 7000
Adam Leventhal wrote:
> Yes. The Sun Storage 7000 Series uses the same ZFS that's in OpenSolaris
> today. A pool created on the appliance could potentially be imported on an
> OpenSolaris system; that is, of course, not explicitly supported in the
> service contract.

It would be interesting to hear more about how Fishworks differs from OpenSolaris: what build it is based on, what package mechanism you are using (IPS already?), and other differences...

A little off topic: do you know when the SSDs used in the Storage 7000 will be available for the rest of us?
Re: [zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
> Leave the default recordsize. With 128K recordsize,
> files smaller than

If I turn zfs compression on, does the recordsize influence the compression ratio in any way?
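It does: ZFS compresses each record independently, so larger records give the compressor more data to work with and usually a better ratio. A quick comparison sketch (dataset names are hypothetical):

    # same compression, different recordsize
    zfs create -o recordsize=128k -o compression=gzip-2 tank/rs128
    zfs create -o recordsize=8k -o compression=gzip-2 tank/rs8

    # copy identical data into both, then compare the achieved ratios
    zfs get compressratio tank/rs128 tank/rs8

For a Maildir tree the effect is smaller, though: files below the recordsize are stored as a single appropriately sized record anyway.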
Re: [zfs-discuss] EMC - top of the table for efficiency, how well would ZFS do?
I've read the same log entry, and was also thinking about ZFS... Pillar Data Systems is also answering the call: http://blog.pillardata.com/pillar_data_blog/2008/08/blog-i-love-a-p.html

BTW: would transparent compression be considered cheating? :-)
[zfs-discuss] Re: Current status of a ZFS root
> Unfortunately, the T1000 only has a single drive bay (!) which makes it
> impossible to follow our normal practice of mirroring the root filesystem.

You can replace the existing 3.5" disk with two 2.5" disks (quite cheap).

//Mika
[zfs-discuss] Oracle 11g Performance
Here's an interesting read about forthcoming Oracle 11g file system performance. Sadly, there is no information about how this works. It will be interesting to compare it with ZFS performance, as soon as ZFS is tuned for databases.

"Speed and performance will be the hallmark of the 11g, said Chuck Rozwat, executive vice president for server technologies. The new database will run fast enough so that for the first time it will beat specialized file systems for transferring large blocks of data. Rozwat displayed test results that showed that the 11g beta is capable of transferring 1GB in just under 9 seconds compared to 12 seconds for a file system. This level of performance is important to customers who are demanding instant access to data, Rozwat said. "If systems can't perform fast enough and deliver information in real time, we are in real trouble," Rozwat said."

http://www.eweek.com/article2/0,1895,2036136,00.asp
[zfs-discuss] Physical Clone of zpool
Hi

We have the following scenario/problem: our zpool resides on a single LUN on a Hitachi storage array. We are thinking about making a physical clone of the zpool with the ShadowImage functionality. ShadowImage takes a snapshot of the LUN and copies all the blocks to a new LUN (a physical copy). In our case the new LUN is then made available on the same host as the original LUN.

After the ShadowImage is taken, we can see the snapshot as an additional disk using the format(1M) command. But when running "zpool import", it only says: "no pools available to import". I think this is a bug; at least it should say something like "pool with the same name already imported". I have only tested this on Solaris 10 06/06, but I haven't found anything similar in the bug database, so it is presumably in OpenSolaris as well.

Mika
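As far as I know, the deeper problem is that the ShadowImage copy carries the same pool name and GUID as the original, and zpool import skips devices that belong to a pool which is already active on the host. The labels on the cloned LUN can be inspected directly (device path is a placeholder):

    # dump the ZFS labels on the cloned LUN; the name and pool_guid
    # will match the pool that is already imported
    zdb -l /dev/rdsk/c2t1d0s0

A clearer message from zpool import would indeed help here.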
[zfs-discuss] Archiving on ZFS
Hi

We are thinking about moving away from our magneto-optical based archive system (WORM technology). At the moment we use a volume manager, which virtualizes the WORMs in the jukebox and presents them as UFS filesystems. The volume manager automatically does asynchronous replication to an identical system in another datacenter. To speed up slow WORM access, the volume manager has a read cache. Because of this cache, we did not find out that we have silent data corruption on some WORMs until we checked them directly (Surprise! Surprise!). Mainly because of this, I am thinking about replacing the whole bunch with something more robust and modern... (guess what :-) Anyway, there are still some points that came to mind:

-The mechanism to asynchronously replicate to another host could be simulated using zfs send/receive. Still, I would prefer a replication that is triggered automatically, like Sun's StorEdge Network Data Replicator does for UFS. This could be implemented easily in ZFS, I guess.

-We have a lot of small files (about 7 million files of ~4-32k). Like everyone, we want to be SOX compliant. So I tried to run BART over those files to get a fingerprint. I remember it took a couple of hours to complete. At least it was much faster than on UFS. How can this be sped up? Maybe we have to split those files into separate filesystems. This leads to my next point:

-I want to be sure that nobody (maybe not even root) changes my filesystems for the next couple of years. I know there is a read-only property, but it might not be enough. On our Hitachi array we have a WORM functionality which blocks write access to a LUN until a specified date. While this works, it is not as flexible as we want, as the LUNs are too big for our use. Every day we archive documents, and at the end of the day we want to freeze the filesystem. Would it be possible to add a time-lock property to ZFS? Could this be extended to still allow new files to be added to the locked filesystem, but not to modify existing ones (ZFS ACLs could handle this)? Would something like this make sense? A sketch of a workaround is below.

Thanks for your thoughts...
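Until a time-lock property exists, the end-of-day freeze can at least be approximated with a snapshot plus the readonly property. A sketch (dataset layout and names are assumptions, and root can of course still revert it, so it is no true WORM):

    # freeze today's archive dataset at the end of the day
    zfs snapshot tank/archive/2006-06-15@eod
    zfs set readonly=on tank/archive/2006-06-15

    # new documents go into a fresh per-day dataset tomorrow
    zfs create tank/archive/2006-06-16

Per-day datasets would also let BART fingerprint only the newest day instead of all 7 million files.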
Re: [zfs-discuss] ZFS and Storage
> given that zfs always does copy-on-write for any updates, it's not clear
> why this would necessarily degrade performance..

Writing should be no problem, as it is serialized... but when both database instances are reading a lot of different blocks at the same time, the spindles might "heat up".

> If you want a full copy you can use zfs send/zfs receive -- either
> within the same pool or between two different pools.

OK. But then again, it might be necessary to throttle zfs send/receive replication between pools; otherwise the replication process might influence production performance too much. Or is there already some kind of prioritization that I have overlooked?

//Mika
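Not that I know of; a crude external throttle is to rate-limit the pipe, for example with pv (the rate, pool and snapshot names are placeholders):

    # cap the replication stream at 20MB/s to protect production I/O
    zfs send tank/prod@repl | pv -L 20m | zfs receive backup/prod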
Re: [zfs-discuss] Re: ZFS and Storage
> RAID5 is not a "nice" feature when it breaks.

Let me correct myself... RAID5 is a "nice" feature for systems without ZFS...

> Are huge write caches really an advantage? Or are you talking about huge
> write caches with non-volatile storage?

Yes, you are right. The huge cache is needed mostly because of the poor write performance of RAID5 (battery-backed, of course)...

// Mika
Re: [zfs-discuss] ZFS and Storage
> but there may not be filesystem space for double the data.
> Sounds like there is a need for a zfs-defragment-file utility perhaps?
> Or if you want to be politically cagey about naming choice, perhaps,
> zfs-seq-read-optimize-file ? :-)

For data warehouse and streaming applications a "seq-read-optimization" could bring additional performance. For "normal" databases this should be benchmarked...

This brings me back to another question. We have a production database that is cloned at every end of month for end-of-month processing (currently with a feature of our storage array). I'm thinking about a ZFS version of this task. The requirement: the production database should not suffer performance degradation while the clone runs in parallel. As ZFS does not copy all the blocks, I wonder how much the production database will suffer from sharing most of the data with the clone (concurrent access vs. caching).

Maybe we need a feature in ZFS to do a full clone (that is: copy all blocks) inside the pool if performance is an issue, just like the "Quick Copy" vs. "Shadow Image" features on HDS arrays... A sketch of both variants is below.
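Both variants can be expressed with today's commands (names are placeholders): a COW clone is instant but shares blocks with production, while send/receive makes a physically separate copy.

    # copy-on-write clone: instant, shares blocks (reads compete for them)
    zfs snapshot tank/proddb@eom
    zfs clone tank/proddb@eom tank/eomdb

    # full copy inside the pool: every block duplicated, no block sharing
    # (the two copies still share the pool's spindles; receive into a
    # second pool if even that contention is a problem)
    zfs send tank/proddb@eom | zfs receive tank/eomdb-full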
[zfs-discuss] Re: ZFS and Storage
> I'm a little confused by the first poster's message as well, but you lose
> some benefits of ZFS if you don't create your pools with either RAID1 or
> RAIDZ, such as data corruption detection. The array isn't going to detect
> that because all it knows about are blocks.

That's the dilemma: the array provides nice features like RAID1 and RAID5, but those are of no real use when using ZFS. The advantages of using ZFS on such an array are, for example, the sometimes huge write cache, the use of consolidated storage, and, in SAN configurations, cloning and sharing storage between hosts. The price comes, of course, as additional administrative overhead (lots of microcode updates, more components that can fail in between, etc.). Also, in bigger companies there usually is a team of storage specialists who mostly do not know about the applications running on top, or do not care... (like: "here you have your bunch of gigabytes...")

//Mika
Re: [zfs-discuss] ZFS and Storage
> The vdev can handle dynamic lun growth, but the underlying VTOC or EFI label
> may need to be zero'd and reapplied if you setup the initial vdev on a
> slice. If you introduced the entire disk to the pool you should be fine,
> but I believe you'll still need to offline/online the pool.

Fine, at least the vdev can handle this... I asked about this feature in October and hoped that it would be implemented when ZFS was integrated into Solaris 10 U2...

http://www.opensolaris.org/jive/thread.jspa?messageID=11646

Does anybody know when this feature is finally coming? It would keep the number of LUNs on the host low, especially as device names can get really ugly (long!).

//Mika
[zfs-discuss] ZFS and Storage
Hi

Now that Solaris 10 06/06 is finally downloadable, I have some questions about ZFS:

-We have a big storage system supporting RAID5 and RAID1. At the moment we only use RAID5 (for non-Solaris systems as well). We are thinking about using ZFS on those LUNs instead of UFS. As ZFS on hardware RAID5 seems like overkill, an option would be to use RAID1 with RAID-Z. Then again, this is a waste of space, as it needs more disks due to the mirroring. Later on we might add asynchronous replication to another storage system over the SAN: even more waste of space. It somehow looks like ZFS and storage virtualization, as of today, just don't work nicely together. What we would need is the ability to use JBODs.

-Does ZFS in the current version support LUN extension? With UFS we have to zero the VTOC and then adjust the new disk geometry. What does this look like with ZFS?

-I've read the threads about ZFS and databases. Still, I'm not 100% convinced about read performance. Doesn't the fragmentation of large database files (because of the copy-on-write concept) impact read performance?

-Does anybody have experience with database cloning using the ZFS mechanisms? What factors influence performance when running the cloned database in parallel?

-I really like the idea of keeping all needed database files together, to allow fast and consistent cloning. (A sketch of this is below.)

Thanks
Mika
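On the cloning point, a sketch assuming all of the database's files live under one parent dataset (names are placeholders; zfs snapshot -r, where the build supports it, snapshots all children at the same transaction group, which is what makes the copy consistent):

    # one atomic snapshot of data files and logs together
    zfs snapshot -r tank/db@clone1

    # clone each child from that same consistent point
    zfs create tank/dbclone
    zfs clone tank/db/data@clone1 tank/dbclone/data
    zfs clone tank/db/logs@clone1 tank/dbclone/logs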