Re: [zfs-discuss] [zfs] Petabyte pool?
On Sat, Mar 16, 2013 at 2:27 PM, Jim Klimov <jimkli...@cos.ru> wrote:

> On 2013-03-16 15:20, Bob Friesenhahn wrote:
>> On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote:
>>> Well, off the top of my head: 2 x storage heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPUs; 8 x 60-bay JBODs with 60 x 4TB SAS drives; RAIDZ2 striped over the 8 JBODs. That should fit within one rack comfortably and provide 1 PB of storage.
>>
>> What does one do for power? What are the power requirements when the system is first powered on? Can drive spin-up be staggered between JBOD chassis? Does the server need to be powered up last so that it does not time out on the zfs import?
>
> I guess you can use managed PDUs like those from APC (many models for varied socket types and counts); they can be scripted at an advanced level, and at a basic level per-socket delays can be configured to stagger startup after power comes from the wall (UPS), regardless of what the boxes' individual power supplies can do. Conveniently, they also allow a remote hard-reset of hung boxes without walking to the server room ;)
> My 2c, //Jim Klimov

Any modern JBOD should have the intelligence built in to stagger drive spin-up. I wouldn't spend money on one that didn't. There's really no need to stagger the JBOD power-up at the PDU. As for the head, yes, it should have a delayed power-on, which you can typically set in the BIOS.

--Tim
Re: [zfs-discuss] ZFS Distro Advice
On Wed, Feb 27, 2013 at 2:57 AM, Dan Swartzendruber <dswa...@druber.com> wrote:

> I've been using it since rc13. It's been stable for me as long as you don't get into things like zvols and such...

Then it definitely isn't at the level of FreeBSD, and personally I would not consider that production ready.

--Tim
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 10:33 PM, Tiernan OToole <lsmart...@gmail.com> wrote:

> Thanks all! I will check out FreeNAS and see what it can do... I will also check my RAID card and see if it can work with JBOD... fingers crossed... The machine has a couple of internal SATA ports (I think there are 2, could be 4), so I was thinking of using those for boot disks and SSDs later...
>
> As a follow-up question on data deduplication: the machine, to start, will have about 5GB RAM. I read somewhere that 20TB of storage would require about 8GB RAM, depending on block size... Since I don't know block sizes yet (I store a mix of VMs, TV shows, movies and backups on the NAS), I am not sure how much memory I will need (my estimate is 10TB raw (8TB usable?) in a RAIDZ1 pool, and then 3TB raw in a striped pool). If I don't have enough memory now, can I enable dedup at a later stage when I add memory?
>
> Also, if I pick FreeBSD now and want to move to, say, Nexenta, is that possible? Assuming the drives are just JBOD drives (to be confirmed), could they just get imported? Thanks.

Yes, you can move between FreeBSD and illumos-based distros as long as you are at a compatible zpool version (which they currently are). I'd avoid deduplication unless you absolutely need it... it's still a bit of a kludge. Stick to compression and your world will be a much happier place.

--Tim
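[Editor's note: for anyone sizing this, a hedged sketch of estimating dedup's memory cost before enabling it, and of the pool move itself. It assumes a pool named "tank"; zdb -S only simulates deduplication against existing data and changes nothing on disk.]

# zdb -S tank          # prints a simulated dedup table histogram and estimated dedup ratio
# zpool export tank    # on the FreeBSD box, before moving the disks
# zpool import tank    # on the Nexenta/illumos box; works while both sides speak the same zpool version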
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 4:57 AM, Tiernan OToole <lsmart...@gmail.com> wrote:

> Good morning all. My home NAS died over the weekend, and it leaves me with a lot of spare drives (5 x 2TB and 3 x 1TB disks). I have a Dell PowerEdge 2900 server sitting in the house, which has not been doing much lately (I bought it a few years back with the intent of using it as a storage box, since it has 8 hot-swap drive bays), and I am now looking at building the NAS using ZFS... But now I am confused as to which OS to use... OpenIndiana? Nexenta? FreeNAS/FreeBSD? I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, I would like something I can manage easily and something that works with the Dell... Any recommendations? Any comparisons between them? Thanks.

All of them should provide the basic functionality you're looking for. None of them will provide SMB3 (at all) or AFP (without a third-party package).

--Tim
Re: [zfs-discuss] ZFS Distro Advice
On Mon, Feb 25, 2013 at 7:57 AM, Volker A. Brandt <v...@bb-c.de> wrote:

> Tim Cook writes:
>>> I need something that will allow me to share files over SMB (3 if possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, i would like something i can manage easily and something that works with the Dell...
>> All of them should provide the basic functionality you're looking for. None of them will provide SMB3 (at all) or AFP (without a third party package).
>
> FreeNAS has AFP built in, including a Time Machine discovery method. The latest FreeNAS is still based on Samba 3.x, but they are aware of 4.x and will probably integrate it at some point in the future. Then you should have SMB3. I don't know how far along they are...
> Best regards -- Volker

FreeNAS comes with a package pre-installed to add AFP support. There is no native AFP support in FreeBSD, and by association FreeNAS.

--Tim
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Thu, Feb 21, 2013 at 8:34 AM, Jan Owoc <jso...@gmail.com> wrote:

> Hi Markus,
>
> On Thu, Feb 21, 2013 at 6:44 AM, Markus Grundmann <mar...@freebsduser.eu> wrote:
>> I think the zfs allow|deny feature is only for filesystems. I wish for a feature to protect the complete pool; the property would be respected by zpool commands too. On my notebook I have created a pool with simulated drives (gpt/drive1..n) and, without any warnings or "are you sure (y/n)", I can destroy them after one second. [SNIP] For my personal reasons I will try to rewrite some pieces of the current source code in FreeBSD to get the functionality I want. Please wish me good luck *g*
>
> I think Mike's solution is exactly what you are looking for. You can make a snapshot, hold it, and then zfs destroy (and even zfs destroy -r) will fail. The only thing you can do is run the command(s) to un-hold the snapshot.
>
> On Wed, Feb 20, 2013 at 4:08 PM, Mike Gerdts <mger...@gmail.com> wrote:
>> # zfs create a/1
>> # zfs create a/1/hold
>> # zfs snapshot a/1/hold@hold
>> # zfs hold 'saveme!' a/1/hold@hold
>> # zfs holds a/1/hold@hold
>> NAME           TAG      TIMESTAMP
>> a/1/hold@hold  saveme!  Wed Feb 20 15:06:29 2013
>> # zfs destroy -r a/1
>> cannot destroy 'a/1/hold@hold': snapshot is busy
>
> Does this do what you want? (zpool destroy is already undo-able)
> Jan

That suggestion makes the very bold assumption that you want a long-standing snapshot of the dataset. If it's a rapidly changing dataset, the snapshot will become an issue very quickly.

--Tim
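[Editor's note: to complete the picture, a minimal sketch of the un-hold step Jan mentions, continuing Mike's transcript. zfs release drops the named hold, after which the destroy goes through:]

# zfs release 'saveme!' a/1/hold@hold
# zfs destroy -r a/1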
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Wed, Feb 20, 2013 at 4:49 PM, Markus Grundmann <mar...@freebsduser.eu> wrote:

> Hi! My name is Markus and I live in Germany. I'm new to this list and I have a simple question related to ZFS. My favorite operating system is FreeBSD and I'm very happy to use ZFS on it. Is it possible to enhance the properties in the current source tree with an entry like "protected"? It seems not to be difficult, but I'm not a professional C programmer. For more information please take a little bit of time and read my short post at http://forums.freebsd.org/showthread.php?t=37895
>
> I have reviewed some pieces of the source code in FreeBSD 9.1 to find out how difficult it would be to add a pool/filesystem property as an additional security layer for administrators. Whenever I modify zfs pools or filesystems it's possible to destroy [on a bad day :-)] my data. A new property protected=on|off on the pool and/or filesystem could help the administrator prevent data loss (e.g. a "zpool destroy tank" or "zfs destroy tank/filesystem" command would be rejected while the protected=on property is set). Is this list the right place to discuss/forward this feature request? I hope you have understood my post ;-)
> Thanks and best regards, Markus

I think you're underestimating your English, it's quite good :) In any case, I think the proposal is a good one. With the default behavior being off, it won't break anything for existing datasets, and it can absolutely help prevent a fat finger or a lack of caffeine ruining someone's day. If the feature is already there somewhere, I'm sure someone will chime in.

--Tim
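[Editor's note: for illustration only, here is how Markus's proposed property might look in use. The protected= property does not exist in any shipping ZFS; this is a sketch of the requested behavior, not current syntax.]

# zfs set protected=on tank/important       # hypothetical property from this thread
# zfs destroy tank/important
cannot destroy 'tank/important': dataset is protected   # hypothetical error message
# zfs set protected=off tank/important      # deliberate two-step removal
# zfs destroy tank/important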
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Wed, Feb 20, 2013 at 5:09 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> On Feb 20, 2013, at 2:49 PM, Markus Grundmann <mar...@freebsduser.eu> wrote:
>> [SNIP - Markus's protected=on|off proposal, quoted in full above]
>
> Look at the delegable properties (zfs allow). For example, you can delegate a user to have specific privileges and then not allow them to destroy. Note: I'm only 99% sure this is implemented in FreeBSD, hopefully someone can verify.
> -- richard

With the version of allow I'm looking at, unless I'm missing a setting, it looks like it'd be a complete nightmare. I see no concept of deny, so you either have to give *everyone* all permissions besides destroy, or you have to go through every user/group on the box and grant specific permissions that stop short of destroy. And then if you change your mind later, you have to go back through and give everyone you want to have that feature access to it. That seems like a complete PITA to me.

--Tim
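[Editor's note: for reference, a hedged sketch of the delegation Richard describes, assuming a user "alice" and a dataset tank/data; the permission names come from zfs allow's documented set. The point of Tim's objection is that the grant is additive - there is no single "everything except destroy" rule.]

# zfs allow alice create,mount,snapshot,send,receive tank/data
# zfs allow tank/data      # prints the delegations now in effect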
Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool
On Wed, Feb 20, 2013 at 5:46 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Thu, 21 Feb 2013, Sašo Kiselkov wrote:
>> On 02/21/2013 12:27 AM, Peter Wood wrote:
>>> Will adding another vdev hurt the performance?
>>
>> In general, the answer is: no. ZFS will try to balance writes to top-level vdevs in a fashion that assures even data distribution. If your data is equally likely to be hit in all places, then you will not incur any performance penalties. If, OTOH, newer data is more likely to be hit than old data, then yes, newer data will be served from fewer spindles. In that case it is possible to do a send/receive of the affected datasets into new locations and then rename them.
>
> You have this reversed. The older data is served from fewer spindles than data written after the new vdev is added. Performance with the newer data should be improved.
> Bob

That depends entirely on how full the pool is when the new vdev is added, and how frequently the older data changes, snapshots, etc.

--Tim
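[Editor's note: for anyone wanting to act on Sašo's rebalance suggestion, a hedged sketch. It assumes a pool "tank", a new raidz2 vdev on disks c2t0d0-c2t5d0, and a dataset tank/data; the send/receive rewrites the data so it spreads across all vdevs, at the cost of temporarily holding two copies.]

# zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0
# zfs snapshot -r tank/data@rebalance
# zfs send -R tank/data@rebalance | zfs receive tank/data.new
# zfs rename tank/data tank/data.old
# zfs rename tank/data.new tank/data
# zfs destroy -r tank/data.old      # once the copy is verified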
Re: [zfs-discuss] Feature Request for zfs pool/filesystem protection?
On Wed, Feb 20, 2013 at 6:47 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> On Feb 20, 2013, at 3:27 PM, Tim Cook <t...@cook.ms> wrote:
>> [SNIP - the zfs allow discussion, quoted in full above]
>
> :-) they don't call it idiot-proofing for nothing! :-)
> But seriously, one of the first great zfs-discuss wars was over the request for a -f flag for destroy. The research showed that if you typed "destroy" then you meant it, and adding a -f flag just teaches you to type "destroy -f" instead. See also "kill -9".
> -- richard

I hear you, but in his scenario of using scripts for management, there isn't necessarily human interaction to confirm the operation (appropriately or not). Having a pool property seems like an easy way to prevent a mis-parsed or outright incorrect script from causing havoc on the system.

--Tim
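[Editor's note: as a concrete illustration of Tim's point, a minimal shell guard of the sort a management script might use. A pure sketch; the whitelist stands in for the protected= property that doesn't exist yet.]

#!/bin/sh
# Only pools explicitly listed as expendable may ever be destroyed by this script.
pool="$1"
case "$pool" in
  scratch|testpool)
    zpool destroy "$pool"
    ;;
  *)
    echo "refusing to destroy pool '$pool'" >&2
    exit 1
    ;;
esac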
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sun, Feb 17, 2013 at 8:58 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: Tim Cook [mailto:t...@cook.ms]
>> Why would I spend all that time and energy participating in ANOTHER list controlled by Oracle, when they have shown they have no qualms about eliminating it with basically 0 warning, at their whim?
>
> From an open source, community perspective, I understand and agree with this sentiment. If OSS projects behave this way, they die. The purpose of an Oracle-hosted mailing list is not for the sake of being open in any way. It's for the sake of allowing public discussions about their product. While a certain amount of knowledge will exist with or without the list (people can still download Solaris 11 for evaluation purposes and test it out on the honor system), there will be less Oracle-specific knowledge in existence without the list. For anyone who's 100% dedicated to OSS and/or illumos and doesn't care about Oracle-specific stuff, there's no reason to use that list. But for those of us who are sysadmins, developers using eval-licensed Solaris, or in any way not completely closed to the possibility of using Oracle ZFS / Solaris... for those of us, it makes sense.
>
> Guess what, I formerly subscribed to netapp-toasters as well. Until ZFS came along and I was able to happily put NetApp in my past. Perhaps someday I'll leave ZFS behind in favor of btrfs. But not yet. Guess what also, there is a very active, thriving Microsoft forum out there too. And they don't even let you download MS Office or Windows for evaluation purposes - they're even more closed than Oracle in this regard. They learned their lesson about piracy and the honor system. ;-)

We can agree to disagree. I think you're still operating under the assumption that Oracle wants to have an open discussion. This is patently false. There's a reason why, any time someone has an issue, the response from the Oracle team that posts here is almost always "open a support ticket and give me the number" - and then we never hear about it again, or get the fix, unless the end user happens to come back and update us. If you think that Oracle is going to change that stance with a list hosted on java.net, you're sadly mistaken. Their (collectively - I'm not speaking of any individual) only goal is to help paying customers. Period. The way they've decided to go about that is by hoarding knowledge. I've dealt with the company for over a decade; there will be no open discussions.

NetApp has historically been open with their user community (although at times in recent history they have made the mistake of turtling up), which is why the toasters mailing list did as well as it did. Hell, Dave Hitz used to be a regular poster. MS forums are active and thriving because they've got a massive user base full of extremely experienced admins. If there were an open and free version of the MS products, I'm willing to bet you'd find the closed-source version a ghost town. For all the bashing MS has taken throughout history, they're a very open company, relatively speaking. I can both browse their knowledge base and download hotfixes without any support contract. If you're going to have to open a support ticket to get help with issues anyway, why bother with a mailing list/forum? Just go straight to support. The reason THESE lists have done so well is that the guys who wrote the code actively participate and give detailed help in the open.

If the only responses that ever came here were the Oracle responses to open a ticket for anything beyond basic problems, this place would've died a long time ago. I think the saddest part of the whole situation is that Oracle is so backwards and broken they don't even allow their employees to tell us what they aren't allowed to talk to us about. THAT is f-ed.

--Tim
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sat, Feb 16, 2013 at 10:47 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> In the absence of any official response, I guess we just have to assume this list will be shut down, right? So I guess we just have to move to the illumos mailing list, as Deirdre suggests?
>
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
> Sent: Friday, February 15, 2013 11:00 AM
> To: zfs-discuss@opensolaris.org
> Subject: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
>
>> So, I hear, in a couple weeks' time, opensolaris.org is shutting down. What does that mean for this mailing list? Should we all be moving over to something at illumos or something? I'm going to encourage somebody in an official capacity at opensolaris to respond... I'm going to discourage unofficial responses, like, illumos enthusiasts etc. simply trying to get people to jump this list. Thanks for any info...

That would be the logical decision, yes. Not to poke fun, but did you really expect an official response after YEARS of nothing from Oracle? This is the same company that refused to release any Java patches until the DHS issued a national warning suggesting that everyone uninstall Java.

--Tim
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sat, Feb 16, 2013 at 11:21 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: Tim Cook [mailto:t...@cook.ms]
>> That would be the logical decision, yes. Not to poke fun, but did you really expect an official response after YEARS of nothing from Oracle? This is the same company that refused to release any Java patches until the DHS issued a national warning suggesting that everyone uninstall Java.
>
> Well, yes. We do have Oracle employees who contribute to this mailing list. It is not accurate or fair to stereotype the whole company. Oracle by itself is as large as some cities or countries. I can understand a company policy of secrecy about development direction and stuff like that. I would think somebody would be able to officially confirm or deny that this mailing list is going to stop. At least one of their system administrators lurks here...

We've got Oracle employees on the mailing list who, while helpful, in no way have the authority to speak for company policy. They've made that clear on numerous occasions. And that doesn't change the fact that we have literally heard NOTHING from Oracle since the closing of OpenSolaris: zero official statements. So I once again ask: what did you think you were going to get in response to your questions? The reason you hear nothing official from them is that speaking officially is a good way to lose your job.

--Tim
Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL
On Sat, Feb 16, 2013 at 3:42 PM, cindy swearingen <cindy.swearin...@gmail.com> wrote:

> Hey Ned and Everyone,
>
> This was news to us too - we were just talking over some options yesterday afternoon, so please give us a chance to regroup and provide some alternatives. This list will be shut down, but we can start a new one on java.net. There is a huge ecosystem around Solaris and ZFS, particularly within Oracle. Many of us are still here because we are passionate about ZFS, Solaris 11 and even Solaris 10. I think we have a great product and a lot of info to share. If you're interested in a rejuvenated ZFS discuss list on java.net, then drop me a note: cindy.swearin...@oracle.com
>
> We are also considering a new ZFS page in that community as well. Oracle is very committed to Solaris and ZFS, but they want to consolidate their community efforts on java.net, retire some old hardware, and so on. If you are an Oracle customer with a support contract and you are using Solaris and ZFS, and you want to discuss support issues, you should consider that list as well: https://communities.oracle.com/portal/server.pt/community/oracle_solaris_zfs_file_system/526
>
> Thanks, Cindy

While I'm sure many appreciate the offer as I do, I can tell you for me personally: never going to happen. Why would I spend all that time and energy participating in ANOTHER list controlled by Oracle, when they have shown they have no qualms about eliminating it with basically 0 warning, at their whim? I'll be sticking to the illumos lists, which I'm confident will be turned over to someone else should the current maintainers decide they no longer wish to contribute to the project. On the flip side, I think we welcome all Oracle employees to participate in that list should corporate policy allow you to.

--Tim
Re: [zfs-discuss] maczfs / ZEVO
On Fri, Feb 15, 2013 at 10:08 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> Anybody using maczfs / ZEVO? Have good or bad things to say, in terms of reliability, performance, features?
>
> My main reason for asking is this: I have a Mac, I use Time Machine, and I have VMs inside. Time Machine, while great in general, has the limitation of being unable to intelligently identify changed bits inside a VM file. So you have to exclude the VM from Time Machine, and you have to run backup software inside the VM. I would greatly prefer, if it's reliable, to let the VMs reside on ZFS and use zfs send to back up my guest VMs.
>
> I am not looking to replace HFS+ as the primary filesystem of the Mac; although that would be cool, there's often a reliability benefit to staying on the supported, beaten-path, standard configuration. But if ZFS can be used to hold the guest VM storage reliably, I would benefit from that. Thanks...

I have a few coworkers using it. No horror stories, and it's been in use about 6 months now. If there were any showstoppers I'm sure I'd have heard loud complaints by now :)

--Tim
Re: [zfs-discuss] ZFS monitoring
On Mon, Feb 11, 2013 at 9:53 AM, Borja Marcos <bor...@sarenet.es> wrote:

> Hello,
>
> I'm updating Devilator, the performance data collector for Orca and FreeBSD, to include ZFS monitoring. So far I am graphing the ARC and L2ARC size, L2ARC writes and reads, and several hit/miss data pairs. Any suggestions to improve it? What other variables can be interesting? An example of the current state of the program is here: http://devilator.frobula.com
>
> Thanks, Borja.

The zpool iostat output has all sorts of statistics I think would be useful/interesting to record over time.

--Tim
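[Editor's note: for instance, a hedged sketch of sampling those counters, assuming a pool named "tank". -v breaks the numbers out per vdev, and the trailing 5 re-samples every five seconds, which suits a collector like Devilator; the alloc/free, read/write ops, and read/write bandwidth columns map naturally onto time-series graphs.]

# zpool iostat -v tank 5
               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
...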
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
On Sun, Jan 20, 2013 at 6:19 PM, Richard Elling <richard.ell...@gmail.com> wrote:

> On Jan 20, 2013, at 8:16 AM, Edward Harvey <imaginat...@nedharvey.com> wrote:
>> But, by talking about it, we're just smoking pipe dreams. Cuz we all know zfs is developmentally challenged now. But one can dream...
>
> I disagree that ZFS is developmentally challenged. There is more development now than ever in every way: # of developers, companies, OSes, KLOCs, features. Perhaps the level of maturity makes progress appear to be moving slower than it did early in its life?
> -- richard

Well, perhaps a part of it is marketing. Maturity isn't really an excuse for not having a long-term feature roadmap. It seems as though maturity in this case equals stagnation. What are the features being worked on that we aren't aware of? The big ones that come to mind - the ones everyone else is talking about, for not just ZFS but OpenIndiana as a whole and other storage platforms - would be:

1. SMB3 - Hyper-V WILL be gaining market share over the next couple of years; not supporting it means giving up a sizeable portion of the market. Not to mention finally being able to run SQL (again) and Exchange on a file share.
2. VAAI support.
3. The long-sought bp_rewrite.
4. Full-drive encryption support.
5. Tiering (although I'd argue caching is superior, it's still a checkbox).

There's obviously more, but those are just the ones off the top of my head that others are supporting/working on. Again, it just feels like all the work is going into fixing bugs and refining what is there, not adding new features. Obviously Sašo personally added features, but overall there don't seem to be a ton of announcements to the list about features that have been added or are being actively worked on. It feels like all these companies are just adding niche functionality they need, which may or may not be getting pushed back to mainline.

/debbie-downer
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 08.01.2013 20:43, Krunal Desai wrote:

> On Mon, Jan 7, 2013 at 4:16 PM, Sašo Kiselkov <skiselkov...@gmail.com> wrote:
>> PERC H200 are well-behaved cards that are easy to reflash and work well (even in JBOD mode) on illumos - they are essentially an LSI SAS 9211. If you can get them, they're one heck of a reliable beast, and cheap too!
>
> That method that was linked seemed very specific to Dell servers; from my experience with reflashing various LSI cards, can't I just USB-boot to a FreeDOS environment on any system, and then run sasflash/sas2flsh with the appropriate IT-mode firmware?

It is indeed very specific to Dell cards. I've actually tried to use the generic instructions for the M1015, and it failed because the card didn't match the firmware version. The normal method with an M1015 is to wipe the RAID firmware with megarec (the MegaRAID recovery tool), reboot, and flash on the IT firmware. The Dell method is more involved, but it's the only way I've managed to get a Dell H200 cross-flashed. It seems the M1015 has spiked in price again on eBay (US), whilst the H200 is still under $100.

-- Tim Fletcher t...@night-shade.org.uk
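[Editor's note: for orientation, a heavily hedged outline of the generic M1015 procedure Tim describes, run from a FreeDOS boot environment. The firmware file names are examples, the exact flags vary by board and tool version, and a wrong flash can brick the controller - follow a complete guide for your exact card before trying this.]

megarec -writesbr 0 sbrempty.bin      REM wipe the SBR (sbrempty.bin is an all-zero example file)
megarec -cleanflash 0                 REM erase the existing RAID firmware, then reboot
sas2flsh -o -f 2118it.bin             REM flash the IT-mode firmware image
sas2flsh -o -sasadd 500605bXXXXXXXXX  REM restore the SAS address from the card's sticker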
Re: [zfs-discuss] HP Proliant DL360 G7
On 08.01.2013 18:30, Edmund White wrote:

> The D2600 and D2700 enclosures are fully supported as Nexenta JBODs. I run them in multiple production environments. I *could* use an HP-branded LSI controller (SC08Ge), but I prefer the higher performance of the LSI 9211 and 9205e HBAs.

The HP H221 is the newer SAS2008-based HBA that replaces the SC08Ge. It's definitely a pure HBA, as I have one, but I don't have any external disk shelves to test with currently. http://h18004.www1.hp.com/products/quickspecs/14222_div/14222_div.html

-- Tim Fletcher t...@night-shade.org.uk
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 07/01/13 14:01, Andrzej Sochon wrote:

> Hello Sašo! I found you here: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-May/051546.html
>
>> "How about reflashing LSI firmware to the card? I read on Dell's spec sheets that the card runs an LSISAS2008 chip, so chances are that standard LSI firmware will work on it. I can send you all the required bits to do the reflash, if you like."
>
> I got a Dell PERC H310 controller for do-it-yourself experiments, and I tried to run it on non-Dell PC platforms like the Asus P5Q and Foxconn G31MX, without success. I would very much appreciate any hint on how to get the LSI firmware and reflash the Dell H310.

I've successfully cross-flashed Dell H200 cards with this method: http://forums.servethehome.com/showthread.php?467-DELL-H200-Flash-to-IT-firmware-Procedure-for-DELL-servers

-- Tim Fletcher t...@night-shade.org.uk
Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?
On 07/01/13 21:16, Sašo Kiselkov wrote:

> On 01/07/2013 09:32 PM, Tim Fletcher wrote:
>> [SNIP - Andrzej's H310 reflash question and the servethehome H200 link, quoted in full above]
>
> PERC H200 are well-behaved cards that are easy to reflash and work well (even in JBOD mode) on illumos - they are essentially an LSI SAS 9211. If you can get them, they're one heck of a reliable beast, and cheap too!

The modular version of the card is often cheaper and takes 2 minutes with a Torx driver to take off the black plastic L. I've bought one of these before and it worked well: http://www.ebay.co.uk/itm/170888398081

-- Tim Fletcher t...@night-shade.org.uk
Re: [zfs-discuss] dm-crypt + ZFS on Linux
On Fri, Nov 23, 2012 at 9:49 AM, John Baxter <johnleebax...@gmail.com> wrote:

> We have the need to encrypt our data, approximately 30TB on three ZFS volumes under Solaris 10. The volumes currently reside on iSCSI SANs connected via 10Gb/s Ethernet. We have tested Solaris 11 with ZFS encrypted volumes and found the performance to be very poor, and we have an open bug report with Oracle. We are a Linux shop, and since performance is so poor and there is still no resolution, we are considering ZFS on Linux with dm-crypt. I have read once or twice that if we implemented ZFS + dm-crypt we would lose features, but which features is never specified. We currently mirror the volumes across identical iSCSI SANs with ZFS, and we use hourly ZFS snapshots to update our DR site. Which features of ZFS are lost if we use dm-crypt? My guess would be that they are related to raidz, but I'm unsure.

Why don't you just use a SAN that supports full-drive encryption? There should be basically zero performance overhead.

--Tim
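[Editor's note: for context, a hedged sketch of the layering John is considering, on Linux, assuming two iSCSI-backed disks /dev/sdb and /dev/sdc. Each device gets a LUKS container and the pool is built on the plaintext mappings. dm-crypt encrypts whole block devices below ZFS, so unlike Solaris 11's native ZFS crypto there is no per-dataset key management; raidz and mirroring themselves work fine on dm-crypt devices.]

# cryptsetup luksFormat /dev/sdb
# cryptsetup luksOpen /dev/sdb crypt0
# cryptsetup luksFormat /dev/sdc
# cryptsetup luksOpen /dev/sdc crypt1
# zpool create tank mirror /dev/mapper/crypt0 /dev/mapper/crypt1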
Re: [zfs-discuss] ZFS Appliance as a general-purpose server question
On Thu, Nov 22, 2012 at 10:50 AM, Jim Klimov <jimkli...@cos.ru> wrote:

> On 2012-11-22 17:31, Darren J Moffat wrote:
>>> Is it possible to use the ZFS Storage Appliances in a similar way, and fire up a Solaris zone (or a few) directly on the box for general-purpose software; or to shell-script administrative tasks such as the backup archive management in the global zone (if that concept still applies) as is done on their current Solaris-based box?
>>
>> No, it is a true appliance. It might look like it has Solaris underneath, but it is just based on Solaris. You can script administrative tasks, but not using bash/ksh-style scripting - you use the ZFSSA's own scripting language.
>
> So the only supported (or even possible) way is indeed to use it as a NAS for file or block IO from another head running the database or application servers?.. In the datasheet I read that Cloning and Remote Replication are separately licensed features; does this mean that the capability for zfs send | zfs recv backups from remote Solaris systems must be purchased separately? :( I wonder if it would make weird sense to get the boxes, forfeit the cool-looking Fishworks, and install Solaris/OI/Nexenta/whatever to get the most flexibility and bang for a buck from the owned hardware... Or, rather, shop for the equivalent non-appliance servers...
> //Jim

You'd be paying a massive premium to buy them and then install some other OS on them. You'd be far better off buying equivalent servers.

--Tim
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
On Mon, Nov 12, 2012 at 10:39 AM, Trond Michelsen <tron...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 5:00 PM, Tim Cook <t...@cook.ms> wrote:
>> [SNIP - the "devices have different sector alignment" exchange, quoted in full below]
>>
>> Not happening with anything that exists today. The only way this would be possible is with bp_rewrite, which would allow you to evacuate a vdev (whether it be for a situation like this, or just to shrink a pool). What you're trying to do is write a block-for-block copy to a disk that's made up of a different block structure. Not happening.
>
> That is disappointing. I'll probably manage to find a used 2TB drive with 512b blocksize, so I'm sure I'll be able to keep the pool alive, but I had planned to swap all 2TB drives for 4TB drives within a year or so. This is apparently not an option anymore. I'm also a bit annoyed, because I cannot remember seeing any warnings (other than performance-wise) about mixing 512b and 4kB blocksize discs in a pool, or any warnings that you'll be severely restricted if you use 512b blocksize discs at all.
>
>> *insert everyone saying they want bp_rewrite and the guys who have the skills to do so saying their enterprise customers have other needs*
>
> bp_rewrite is what's needed to remove vdevs, right? If so, yes, being able to remove (or replace) a vdev would've solved my problem. However, I don't see how this could not be desirable for enterprise customers. 512b blocksize discs are rapidly disappearing from the market. Enterprise discs fail occasionally too, and if 512b blocksize discs can't be replaced by 4kB blocksize discs, then that effectively means you can't replace failed drives on ZFS. I would think that this is a desirable feature of an enterprise storage solution.

Enterprise customers are guaranteed equivalent replacement drives for the life of the system, generally 3-5 years. At the end of that cycle, they buy all new hardware and simply migrate the data. It's generally a non-issue due to the way gear is written off.

--Tim
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
On Sat, Nov 10, 2012 at 9:48 AM, Jan Owoc <jso...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 8:14 AM, Trond Michelsen <tron...@gmail.com> wrote:
>> When I try to replace the old drive, I get this error:
>> # zpool replace tank c4t5000C5002AA2F8D6d0 c4t5000C5004DE863F2d0
>> cannot replace c4t5000C5002AA2F8D6d0 with c4t5000C5004DE863F2d0: devices have different sector alignment
>> How can I replace the drive without migrating all the data to a different pool? It is possible, I hope?
>
> I had the same problem. I tried copying the partition layout and some other stuff, but without success. I ended up having to recreate the pool and now have a non-mirrored root fs. If anyone has figured out how to mirror drives after getting the message about sector alignment, please let the list know :-).
> Jan

Not happening with anything that exists today. The only way this would be possible is with bp_rewrite, which would allow you to evacuate a vdev (whether it be for a situation like this, or just to shrink a pool). What you're trying to do is write a block-for-block copy to a disk that's made up of a different block structure. Not happening.

*insert everyone saying they want bp_rewrite and the guys who have the skills to do so saying their enterprise customers have other needs*

--Tim
Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment
On Sat, Nov 10, 2012 at 9:59 AM, Jan Owoc <jso...@gmail.com> wrote:

> On Sat, Nov 10, 2012 at 8:48 AM, Jan Owoc <jso...@gmail.com> wrote:
>> [SNIP - the "devices have different sector alignment" exchange, quoted in full above]
>
> Sorry... my question was partly answered by Jim Klimov on this list: http://openindiana.org/pipermail/openindiana-discuss/2012-June/008546.html
>
> Apparently the currently-suggested way (at least in OpenIndiana) is to:
> 1) create a zpool on the 4k-native drive
> 2) zfs send | zfs receive the data
> 3) mirror back onto the non-4k drive
>
> I can't test it at the moment on my setup - has anyone tested this to work?
> Jan

That would absolutely work, but it's not really a fix for this situation. For the OP to do this, he'd need 42 new drives (or at least enough drives to provide the same capacity as what he's using) to mirror to and then mirror back. The only way this is happening for most people is if they only have a very small pool and have the ability to add an equal amount of storage to dump to. Probably not a big deal if you've only got a handful of drives, or if the drives you have are small and you can take downtime. Likely impossible for the OP with 42 large drives.

--Tim
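[Editor's note: for small pools where Jim's workaround is viable, a hedged sketch. It assumes an old 512b-sector pool "tank" and a new drive c5t0d0 that reports 4k sectors, so the new pool is created with 4k alignment (ashift=12). A vdev's ashift is fixed at creation, which is why the new pool must be created on the 4k drive first; a 512b drive can then be attached into the 4k vdev, but not the other way around.]

# zpool create newtank c5t0d0
# zfs snapshot -r tank@migrate
# zfs send -R tank@migrate | zfs receive -F newtank
# zpool destroy tank                                  # once the copy is verified
# zpool attach newtank c5t0d0 c4t5000C5002AA2F8D6d0   # mirror back onto the old drive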
Re: [zfs-discuss] FC HBA for openindiana
On Sun, Oct 21, 2012 at 1:41 PM, Erik Trimble <tr...@netdemons.com> wrote:

> Do make sure you're getting one that has the proper firmware. Those with a BIOS don't work in SPARC boxes, and those with OpenBoot don't work in x64 stuff. A quick "Sun FC HBA" search on eBay turns up a whole list of stuff that's official Sun HBAs, which will give you an idea of the (max) pricing you'll be paying. There's currently a *huge* price difference between the 4Gb and 2Gb adapters. Also, keep in mind that PCI-X adapters are far more common at the 1/2Gb range, while PCI-E starts to be the most common choice at 4Gb+.
>
> Here's a list of all the old Sun FC HBAs (which can help you sort out which are for x64 systems, and which were for SPARC systems): http://www.oracle.com/technetwork/documentation/oracle-storage-networking-190061.html
>
> As Tim said, these should all have built-in drivers in the illumos codebase.
> -Erik

The only ones that have that limitation are the Sun OEM cards. If you buy a QLogic retail card you can use it in either system.

--Tim
Re: [zfs-discuss] [zfs] portable zfs send streams (preview webrev)
On Sat, Oct 20, 2012 at 2:54 AM, Arne Jansen <sensi...@gmx.net> wrote:

> On 10/20/2012 01:10 AM, Tim Cook wrote:
>> [SNIP - the FITS naming discussion, quoted in full below]
>> I'm sure we can come up with something. Are you planning on this being solely for ZFS, or a larger architecture for replication in both directions in the future?
>
> We have senders for zfs and btrfs. The planned receiver will be mostly filesystem-agnostic and can work on a much broader range. It basically only needs to know how to create snapshots and where to store a bit of metadata. It would be great if more filesystems would join on the sending side, but I have no involvement there. I see no basic problem in choosing a name that's already in use. Especially with file extensions, most will be already taken. How about something with 'portable' and 'backup', like pib or pibs? 'i' for incremental.
> -Arne

Re-using names generally isn't a big deal, but in this case the existing name is a technology that's extremely similar to what you're doing - which WILL cause a ton of confusion in the userbase, and make troubleshooting far more difficult when searching Google etc. for links to documents that are applicable. Maybe something like "far" - filesystem-agnostic replication?
Re: [zfs-discuss] FC HBA for openindiana
The built-in drivers support MPxIO, so you're good to go.

On Friday, October 19, 2012, Christof Haemmerle wrote:

> Yep, I need 4Gb with multipathing if possible.
>
> On Oct 19, 2012, at 10:34 PM, Tim Cook <t...@cook.ms> wrote:
>> How old? If it's 1Gbit you'll need a 4Gb or slower HBA. QLogic would be my preference. You should be able to find a 2340 for cheap on eBay. Or a 2460 if you want 4Gb.
Re: [zfs-discuss] [zfs] portable zfs send streams (preview webrev)
On Fri, Oct 19, 2012 at 3:46 PM, Arne Jansen <sensi...@gmx.net> wrote:

> On 10/19/2012 09:58 PM, Matthew Ahrens wrote:
>> On Wed, Oct 17, 2012 at 5:29 AM, Arne Jansen <sensi...@gmx.net> wrote:
>>> We have finished a beta version of the feature. A webrev for it can be found here: http://cr.illumos.org/~webrev/sensille/fits-send/ It adds a command 'zfs fits-send'. The resulting streams can currently only be received on btrfs, but more receivers will follow. It would be great if anyone interested could give it some testing and/or review. If there are no objections, I'll send a formal webrev soon.
>>
>> Please don't bother changing libzfs (and proliferating the copypasta there) -- do it like lzc_send().
>
> ok. It would be easier, though, if zfs_send would also already use the new style. Is it in the pipeline already?
>
>> Likewise, zfs_ioc_fits_send should use the new-style API. See the comment at the beginning of zfs_ioctl.c. I'm not a fan of the name FITS, but I suppose somebody else already named the format. If we are going to follow someone else's format though, it at least needs to be well-documented. Where can we find the documentation? FYI, #1 Google hit for FITS: http://en.wikipedia.org/wiki/FITS #3 hit: http://code.google.com/p/fits/ Both have to do with file formats. The entire first page of Google results for "FITS format" and "FITS file format" is related to these two formats. "FITS btrfs" didn't return anything specific to the file format, either.
>> --matt
>
> It's not too late to change it, but I have a hard time coming up with a better name. Also, the format is still very new and I'm sure it'll need some adjustments.
> -arne

I'm sure we can come up with something. Are you planning on this being solely for ZFS, or a larger architecture for replication in both directions in the future?

--Tim
Re: [zfs-discuss] FC HBA for openindiana
On Friday, October 19, 2012, Christof Haemmerle wrote:

> hi there, i need to connect some old raid subsystems to a opensolaris box via fibre channel. can you recommend any FC HBA? thanx

How old? If it's 1Gbit you'll need a 4Gb or slower HBA. QLogic would be my preference. You should be able to find a 2340 for cheap on eBay. Or a 2460 if you want 4Gb.
Re: [zfs-discuss] Best way to measure performance of ZIL
On 10/01/2012 09:09 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

> Just perform a bunch of writes and time it. Then set sync=disabled, perform the same set of writes, and time it. Then re-enable sync, add a ZIL device, and time it. The third option will be somewhere in between the first two.

To perform a bunch of writes, vdbench is a very useful tool.
https://blogs.oracle.com/henk/entry/vdbench_a_disk_and_tape
http://sourceforge.net/projects/vdbench/files/vdbench503beta/
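[Editor's note: a hedged sketch of that three-way comparison, assuming a test dataset tank/test and a spare SSD c3t0d0 for the log device. The workload must issue synchronous writes (fsync/O_SYNC), or all three runs will measure the same async path.]

# zfs set sync=standard tank/test     # run the vdbench profile, record the result (baseline)
# zfs set sync=disabled tank/test     # same run; this is the upper bound, bought by dropping sync semantics
# zfs set sync=standard tank/test
# zpool add tank log c3t0d0           # same run again; expect a result between the two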
Re: [zfs-discuss] vm server storage mirror
On Thu, Sep 27, 2012 at 12:48 PM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: Tim Cook [mailto:t...@cook.ms]
>> Sent: Wednesday, September 26, 2012 3:45 PM
>> I would suggest if you're doing a crossover between systems, you use InfiniBand rather than Ethernet. You can eBay a 40Gb IB card for under $300. Quite frankly the performance issues should become almost a non-factor at that point.
>
> I like that idea too - but I thought IB couldn't do crossover. I thought a switch is required?

Crossover should be fine as long as you have a subnet manager on one of the hosts. Now you're going to ask me where you can get a subnet manager for illumos/Solaris/whatever, and I'm going to have to plead the fifth because I haven't looked into it.

--Tim
Re: [zfs-discuss] vm server storage mirror
On Wed, Sep 26, 2012 at 12:54 PM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> Here's another one.
>
> Two identical servers are sitting side by side. They could be connected to each other via anything (presently using a crossover Ethernet cable), and obviously they both connect to the regular LAN. You want to serve VMs from at least one of them, and even if the VMs aren't fault-tolerant, you want at least the storage to be live-synced.
>
> The first obvious thing to do is simply cron a zfs send | zfs receive at a very frequent interval. But there are a lot of downsides to that - besides the fact that you have to settle for some granularity, you also have a script on one system that will clobber the other system. So in the event of a failure, you might promote the backup into production, and you have to be careful not to let it get clobbered when the main server comes up again.
>
> I like much better the idea of using a ZFS mirror between the two systems, even if it comes with a performance penalty as a result of bottlenecking the storage onto Ethernet. But there are several ways to possibly do that, and I'm wondering which will be best.
>
> Option 1: Each system creates a big zpool of the local storage. Then create a zvol within the zpool and export it over iSCSI to the other system. Now both systems can see a local zvol and a remote zvol, which they can use to create a zpool mirror. The reason I don't like this idea is that it's a zpool within a zpool, including the double checksumming and everything. But the double checksumming isn't such a concern to me - I'm mostly afraid some horrible performance or reliability problem might result. Naturally, you would only zpool import the nested zpool on one system. The other system would basically just ignore it. But in the event of a primary failure, you could force-import the nested zpool on the secondary system.
>
> Option 2: At present, both systems are using local mirroring - 3 mirror pairs of 6 disks. I could break these mirrors and export one side over to the other system... and vice versa. So neither server will be doing local mirroring; they will both be mirroring across iSCSI to targets on the other host. Once again, each zpool will only be imported on one host, but in the event of a failure you could force-import it on the other host.
>
> Can anybody think of a reason why Option 2 would be stupid, or can you think of a better solution?

I would suggest if you're doing a crossover between systems, you use InfiniBand rather than Ethernet. You can eBay a 40Gb IB card for under $300. Quite frankly the performance issues should become almost a non-factor at that point.

--Tim
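[Editor's note: for reference, a hedged sketch of the iSCSI export half of Option 2 on an illumos box with COMSTAR. It assumes the raw disk c1t3d0 is the mirror half being donated and that the stmf and iSCSI target services are enabled; device names and GUIDs are illustrative.]

# stmfadm create-lu /dev/rdsk/c1t3d0s2     # prints a LU GUID, e.g. 600144f0...
# stmfadm add-view 600144f0...             # expose the LU to all hosts (restrict this in production)
# itadm create-target                      # create an iSCSI target to serve it

On the peer, once the initiator sees the LUN as a local disk (say c0t600144F0...d0):

# zpool create vmpool mirror c2t0d0 c0t600144F0...d0   # local half + remote iSCSI half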
Re: [zfs-discuss] Question about ZFS snapshots
On Fri, Sep 21, 2012 at 12:05 AM, Stefan Ring <stefan...@gmail.com> wrote:

> On Fri, Sep 21, 2012 at 6:31 AM, andy thomas <a...@time-domain.co.uk> wrote:
>> I have a ZFS filesystem and create weekly snapshots over a period of 5 weeks, called week01, week02, week03, week04 and week05 respectively. My question is: how do the snapshots relate to each other - does week03 contain the changes made since week02, or does it contain all the changes made since the first snapshot, week01, and therefore include those in week02?
>
> Every snapshot is based on the previous one and stores only what is needed to capture the differences.
>
>> To roll back to week03, it's necessary to delete snapshots week04 and week05 first - but what if week01 and week02 have also been deleted? Will the rollback still work, or is it necessary to keep earlier snapshots?
>
> No, it's not necessary. You can roll back to any snapshot. I almost never use rollback, though, in normal use. If I've accidentally deleted or overwritten something, I just rsync it over from the corresponding /.zfs/snapshot directory. Only if what I want to restore is huge might rollback be a better option.

I wasn't going to jump into this quagmire, but I will. To the second question: if you've got snaps 1-5 and you roll back to snap 3, you will lose snaps 4 and 5. As part of the rollback, they will be discarded, as will any other changes made since snap 3. If you delete snap 1 or snap 2, any blocks they have in common with snap 3 will be retained; you will simply see snap 3 grow, because those blocks will now be accounted for under snap 3 instead of snap 1 or snap 2. Any blocks that were not shared with snap 3 will be discarded.

Another point, since you seem to be new to snapshots, that I'll illustrate with an example. Say you've got snap 1, and in snap 1 you've got file 1, made up of 20 blocks. If you overwrite blocks 1-10 of file 1 fifty times before you take snapshot 2, snapshot 2 will only capture the final state of the file. You will not get 50 revisions of the file. This is not continuous data protection; it's a point-in-time copy.

--Tim
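[Editor's note: a hedged transcript of the rollback behavior Tim describes. The dataset name is illustrative and the error text is paraphrased from memory, but the -r requirement is real: zfs rollback refuses to discard week04 and week05 unless told to.]

# zfs rollback tank/fs@week03
cannot rollback to 'tank/fs@week03': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
tank/fs@week04
tank/fs@week05
# zfs rollback -r tank/fs@week03     # succeeds; week04 and week05 are destroyed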
Re: [zfs-discuss] all in one server
On Tue, Sep 18, 2012 at 2:02 PM, Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:

> On Tue, 18 Sep 2012, Erik Ableson wrote:
>> The bigger issue you'll run into will be data sizing, as a year's worth of snapshots basically means that you're keeping a journal of every single write that's occurred over the year. If you are running [...]
>
> The above is not a correct statement. The snapshot only preserves the file-level differences between the points in time. A snapshot does not preserve every single write. ZFS does not even send every single write to the underlying disk. In some usage models, the same file may be re-written 100 times between snapshots, or might not ever appear in any snapshot.

Depending on how frequently you're taking snapshots, your change rate, and how long you keep the snapshots around, it may very well be true. It's not universally true, but it's also not universally false.

--Tim
Re: [zfs-discuss] ZIL devices and fragmentation
On Mon, Jul 30, 2012 at 12:44 PM, Richard Elling richard.ell...@gmail.com wrote: On Jul 30, 2012, at 10:20 AM, Roy Sigurd Karlsbakk wrote: - Original message - On Mon, Jul 30, 2012 at 9:38 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote: Also keep in mind that if you have an SLOG (ZIL on a separate device), and then lose this SLOG (disk crash etc), you will probably lose the pool. So if you want/need SLOG, you probably want two of them in a mirror… That's only true on older versions of ZFS. ZFSv19 (or 20?) includes the ability to import a pool with a failed/missing log device. You lose any data that is in the log and not in the pool, but the pool is importable. Are you sure? I booted this v28 pool a couple of months back, and found it didn't recognize its pool, apparently because of a missing SLOG. It turned out the cache shelf was disconnected; after re-connecting it, things worked as planned. I didn't try to force a new import, though, but it didn't boot up normally, and told me it couldn't import its pool due to lack of SLOG devices. Positive. :) I tested it with ZFSv28 on FreeBSD 9-STABLE a month or two ago. See the updated man page for zpool, especially the bit about import -m. :) On 151a2, the man page just says 'use this or that mountpoint' with import -m, but the fact was zpool refused to import the pool at boot when 2 SLOG devices (mirrored) and 10 L2ARC devices were offline. Should OI/Illumos be able to boot cleanly without manual action with the SLOG devices gone? No. Missing slogs is a potential data-loss condition. Importing the pool without slogs requires acceptance of the data loss -- human interaction. -- richard -- ZFS Performance and Training richard.ell...@richardelling.com +1-760-896-4422 I would think a flag to allow you to automatically continue with a disclaimer might be warranted (default behavior obviously requiring human input). --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
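For pools at version 19 or later, the manual-acceptance import Richard describes is spelled like this (pool name hypothetical):

  zpool import -m tank            # import despite a missing log device
  zpool status tank               # the slog shows as UNAVAIL
  zpool remove tank <slog-device> # optionally drop it from the config

The -m flag is the explicit human acknowledgement that whatever was in the log and not yet in the pool is gone.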
Re: [zfs-discuss] New fast hash algorithm - is it needed?
On Thu, Jul 12, 2012 at 9:14 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: Jim Klimov [mailto:jimkli...@cos.ru] Sent: Thursday, July 12, 2012 8:42 AM To: Edward Ned Harvey Subject: Re: [zfs-discuss] New fast hash algorithm - is it needed? 2012-07-11 18:03, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Sašo Kiselkov As your dedup ratio grows, so does the performance hit from dedup=verify. At, say, dedupratio=10.0x, on average, every write results in 10 reads. Why? If you intend to write a block, and you discover it has a matching checksum, you only need to verify it matches one block. You don't need to verify it matches all the other blocks that have previously been verified to match each other. As Saso explained, if you wrote the same block 10 times and detected it was already deduped, then by verifying this detection 10 times you get about 10 extra reads. (Jim, you wrote me off-list and I replied on-list; in this case I thought it would be ok because this message doesn't look private or exclusive to me. I apologize if I was wrong.) I get the miscommunication now - when you write the duplicate block for the 10th time, we all understand you're not going to go back and verify 10 blocks. (It seemed, at least to me, that's what Saso was saying. Which is why I told him, No you don't.) You're saying that when you wrote the duplicate block the 2nd time, you verified... And then when you wrote it the 3rd time, you verified... And the 4th time, and the 5th time... By the time you write the 10th time, you have already verified 9 previous times, but you're still going to verify again. Normally you would expect writing dedup'd duplicate blocks to be faster than writing the non-dedup'd data, because you get to skip the actual write. (This idealistic performance gain might be pure vapor due to the need to update metadata, but ignoring that technicality, continue hypothetically...) When you verify, supposedly you're giving up that hypothetical performance gain, because you have to do a read instead of a write. So at first blush, it seems like no net gain for performance. But if the duplicate block gets written frequently (for example a block of all zeros) then there's a high probability the duplicate block is already in ARC, so you can actually skip reading from disk and just read from RAM. So, the 10 extra reads will sometimes be true - if the duplicate block doesn't already exist in ARC. And the 10 extra reads will sometimes be false - if the duplicate block is already in ARC. Saso: yes, it's absolutely worth implementing a higher performing hashing algorithm. I'd suggest simply ignoring the people that aren't willing to acknowledge basic mathematics rather than lashing out. No point in feeding the trolls. The PETABYTES of data Quantum and DataDomain have out there are proof positive that complex hashes get the job done without verify, even if you don't want to acknowledge the math behind it. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
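For anyone following the thread, verification is a per-dataset property; a sketch with a hypothetical pool name, where sha256 stands in for whichever strong hash is in use:

  zfs set dedup=on tank             # trust the checksum alone
  zfs set dedup=verify tank         # byte-compare each match before dedup'ing
  zfs set dedup=sha256,verify tank  # explicit hash plus verification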
Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed
On Tue, Apr 24, 2012 at 12:16 AM, Matt Breitbach matth...@flash.shanje.com wrote: So this is a point of debate that probably deserves being brought to the floor (probably for the umpteenth time, but indulge me). I've heard from several people that I'd consider experts that once-per-year scrubbing is sufficient, once per quarter is _possibly_ excessive, and once a week is downright overkill. Since scrub thrashes your disks, I'd like to avoid it if at all possible. My opinion is that it depends on the data. If it's all data at rest, ZFS can't correct bit-rot if it's not read out on a regular interval. My biggest question on this? How often does bit-rot occur on media that isn't read or written to excessively, but just spins most of the day and only has 10-20GB physically read from the spindles daily? We all know that as data ages, it gets accessed less and less frequently. At what point should you be scrubbing that old data every few weeks to make sure a bit or two hasn't flipped? FYI - I personally scrub once per month. Probably overkill for my data, but I'm paranoid like that. -Matt -Original Message- How often do you normally run a scrub, before this happened? It's possible they were accumulating for a while but went undetected for lack of read attempts to the disk. Scrub more often! -- Dan. Personally, unless the dataset is huge and you're using z3, I'd be scrubbing once a week. Even if it's z3, just do a window on Sundays or something so that you at least make it through the whole dataset once a month. There's no reason NOT to scrub that I can think of other than the overhead - which shouldn't matter if you're doing it during off hours. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
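A weekly scrub window like the one described is a one-line cron job; a minimal sketch, assuming a pool named tank:

  # root crontab: start a scrub every Sunday at 02:00
  0 2 * * 0 /usr/sbin/zpool scrub tank

  # check on it later
  zpool status tank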
Re: [zfs-discuss] Aaron Toponce: Install ZFS on Debian GNU/Linux
Oracle never promised anything. A leaked internal memo does not signify an official company policy or statement. On Apr 18, 2012 11:13 AM, Freddie Cash fjwc...@gmail.com wrote: On Wed, Apr 18, 2012 at 7:54 AM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hmmm, how come they have encryption and we don't? As in Solaris releases, or some other we? I would guess he means Illumos, since it's mentioned in the very next sentence. :) Hmmm, how come they have encryption and we don't? Can it be backported to illumos ... It's too bad Oracle hasn't followed through (yet?) with their promise to open-source the ZFS (and other CDDL-licensed?) code in Solaris 11. :( -- Freddie Cash fjwc...@gmail.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Drive upgrades
On Fri, Apr 13, 2012 at 9:35 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Michael Armstrong Is there a way to quickly ascertain if my seagate/hitachi drives are as large as the 2.0tb samsungs? I'd like to avoid the situation of replacing all drives and then not being able to grow the pool... It doesn't matter. If you have a bunch of drives that are all approximately the same size but vary slightly, and you make (for example) a raidz out of them, then the raidz will only be limited by the size of the smallest one. So you will only be wasting 1% of the drives that are slightly larger. Also, given that you have a pool currently made up of 13x2T and 5x1T ... I presume these are separate vdevs. You don't have one huge 18-disk raidz3, do you? That would be bad. And it would also mean that you're currently wasting 13x1T. I assume the 5x1T are a single raidzN. You can increase the size of these disks without any cares about the size of the other 13. Just make sure you have the autoexpand property set. But most of all, make sure you do a scrub first, and make sure you complete the resilver in between each disk swap. Do not pull out more than one disk (or whatever your redundancy level is) while it's still resilvering from the previously replaced disk. If you're very thorough, you would also do a scrub in between each disk swap, but if it's just a bunch of home movies that are replaceable, you will probably skip that step. You will however have an issue replacing them if one should fail. You need to have the same block count to replace a device, which is why I asked for right-sizing years ago. Deaf ears :/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
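The procedure Edward describes, sketched with a hypothetical pool and device name; the essentials are setting autoexpand and letting each resilver complete before the next swap:

  zpool set autoexpand=on tank
  zpool scrub tank            # confirm the pool is clean before starting
  zpool replace tank c2t0d0   # after physically swapping in the larger drive
  zpool status tank           # wait for 'resilver completed', then do the next disk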
Re: [zfs-discuss] Drive upgrades
On Fri, Apr 13, 2012 at 11:46 AM, Freddie Cash fjwc...@gmail.com wrote: On Fri, Apr 13, 2012 at 9:30 AM, Tim Cook t...@cook.ms wrote: You will however have an issue replacing them if one should fail. You need to have the same block count to replace a device, which is why I asked for right-sizing years ago. Deaf ears :/ I thought ZFSv20-something added an "if the blockcount is within 10%, then allow the replace to succeed" feature, to work around this issue? -- Freddie Cash fjwc...@gmail.com That would be news to me. I'd love to hear it's true, though. When I made the request there was excuse after excuse about how it would be difficult, and how Sun always provided replacement drives of identical size, etc. (although there were people who responded who had in fact received different-sized drives from Sun in RMA). I was hoping that now that the braintrust had moved on from Sun they'd embrace what I consider a common-sense decision, but I don't think it's happened. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?
of spread spares and/or declustered RAID would go into just making another write-block allocator in the same league raidz or mirror are nowadays... BTW, are such allocators pluggable (as software modules)? What do you think - can and should such ideas find their way into ZFS? Or why not? Perhaps from theoretical or real-life experience with such storage approaches? //Jim Klimov ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- ZFS and performance consulting http://www.RichardElling.com illumos meetup, Jan 10, 2012, Menlo Park, CA http://www.meetup.com/illumos-User-Group/events/41665962/ As always, feel free to tell me why my rant is completely off base ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] S11 vs illumos zfs compatiblity
On Thu, Jan 5, 2012 at 9:32 AM, Richard Elling richard.ell...@gmail.com wrote: On Jan 5, 2012, at 6:53 AM, sol wrote: if a bug fixed in Illumos is never reported to Oracle by a customer, it would likely never get fixed in Solaris either :-( I would have liked to think that there was some good-will between the ex- and current-members of the zfs team, in the sense that the people who created zfs but then left Oracle still care about it enough to want the Oracle version to be as bug-free as possible. There is good-will between the developers. And the ZFS working group has representatives currently employed by Oracle. However, Oracle is a lawnmower. http://www.youtube.com/watch?v=-zRN7XLCRhc (Obviously I don't expect this to be the case for developers of all software but I think filesystem developers are a special breed!) They are! And there are a lot of really cool things happening in the wild as well as behind Oracle's closed doors. -- richard -- ZFS and performance consulting http://www.RichardElling.com illumos meetup, Jan 10, 2012, Menlo Park, CA http://www.meetup.com/illumos-User-Group/events/41665962/ Speaking of illumos, what exactly is the deal with the zfs discuss mailing list? There are all of 3 posts that show up for all of 2011. Am I missing something, or is there just that little traction currently? http://www.listbox.com/member/archive/182191/sort/time_rev/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Recovery: What do I try next?
On Thu, Dec 22, 2011 at 10:00 AM, Myers Carpenter my...@maski.org wrote: On Sat, Nov 5, 2011 at 2:35 PM, Myers Carpenter my...@maski.org wrote: I would like to pick the brains of the ZFS experts on this list: What would you do next to try and recover this zfs pool? I hate running across threads that ask a question where the person that asked never comes back to say what they eventually did, so... To summarize: In late October I had two drives fail in a raidz1 pool. I was able to recover all the data from one drive, but the other could not be seen by the controller. Trying to zpool import was not working. I had 3 of the 4 drives; why couldn't I import this? I read about every option in zdb and tried the ones that might tell me something more about what was on this recovered drive. I eventually hit on

zdb -p devs -e -lu /bank4/hd/devs/loop0

where /bank4/hd/devs/loop0 was a symlink back to /dev/loop0, where I had set up the disk image of the recovered drive. This showed the uberblocks, which looked like this:

Uberblock[1]
    magic = 00bab10c
    version = 26
    txg = 23128193
    guid_sum = 13396147021153418877
    timestamp = 1316987376 UTC = Sun Sep 25 17:49:36 2011
    rootbp = DVA[0]=0:2981f336c00:400 DVA[1]=0:1e8dcc01400:400 DVA[2]=0:3b16a3dd400:400 [L0 DMU objset] fletcher4 lzjb LE contiguous unique triple size=800L/200P birth=23128193L/23128193P fill=255 cksum=136175e0a4:79b27ae49c7:1857d594ca833:34ec76b965ae40

Then it all became clear: this drive had encountered errors one month before the other drive failed, and zfs had stopped writing to it. So the lesson here: don't be a dumbass like me. Set up Nagios or some other system to alert you when a pool has become degraded. ZFS works very well with one drive out of the array; you probably aren't going to notice problems unless you are proactively looking for them. myers Or, if you aren't scrubbing on a regular basis, just change your zpool failmode property. Had you set it to wait or panic, it would've been very clear, very quickly, that something was wrong. http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
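The failmode knob Tim mentions is a single property (pool name hypothetical):

  zpool get failmode tank
  zpool set failmode=panic tank   # fail loudly instead of limping along; 'wait' blocks I/O until the device returns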
Re: [zfs-discuss] Can I create a mirror for a root rpool?
Do you still need to do the grub install? On Dec 15, 2011 5:40 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Hi Anon, The disk that you attach to the root pool will need an SMI label and a slice 0. The syntax to attach a disk to create a mirrored root pool is like this, for example: # zpool attach rpool c1t0d0s0 c1t1d0s0 Thanks, Cindy On 12/15/11 16:20, Anonymous Remailer (austria) wrote: On Solaris 10, if I install using ZFS root on only one drive, is there a way to add another drive as a mirror later? Sorry if this was discussed already. I searched the archives and couldn't find the answer. Thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
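On Solaris 10, I believe the answer is yes - zpool attach does not lay down boot blocks on the new half of the mirror, so once the resilver finishes you still need something like the following (disk name hypothetical):

  # x86
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

  # SPARC
  installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0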
Re: [zfs-discuss] slow zfs send/recv speed
On Tue, Nov 15, 2011 at 5:17 PM, Andrew Gabriel andrew.gabr...@oracle.com wrote: On 11/15/11 23:05, Anatoly wrote: Good day, The speed of send/recv is around 30-60 MBytes/s for initial send and 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk to 100+ disks in the pool, but the speed doesn't vary to any degree. As I understand it, 'zfs send' is the limiting factor. I did tests by sending to /dev/null. It worked out too slow and absolutely not scalable. None of cpu/memory/disk activity were at peak load, so there is room for improvement. Is there any bug report or article that addresses this problem? Any workaround or solution? I found these guys have the same result - around 7 MBytes/s for 'send' and 70 MBytes/s for 'recv': http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk mirror, the send runs at almost 100Mbytes/sec, so it's pretty much limited by the ethernet. Since you have provided none of the diagnostic data you collected, it's difficult to guess what the limiting factor is for you. -- Andrew Gabriel So all the bugs have been fixed? I seem to recall people on this mailing list using mbuffer to speed it up because it was so bursty and slow at one point. IE: http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
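The mbuffer trick from that link amounts to putting a large buffer on both ends so the bursty producer and consumer stop stalling each other; a sketch with hypothetical host and dataset names:

  zfs send tank/fs@snap | mbuffer -q -s 128k -m 1G | \
    ssh recvhost 'mbuffer -q -s 128k -m 1G | zfs receive -F tank/fs'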
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 11:46 AM, Mark Sandrock mark.sandr...@oracle.com wrote: On Oct 18, 2011, at 11:09 AM, Nico Williams wrote: On Tue, Oct 18, 2011 at 9:35 AM, Brian Wilson wrote: I just wanted to add something on fsck on ZFS - because for me that used to make ZFS 'not ready for prime-time' in 24x7, 5+ 9s uptime environments. Where ZFS doesn't have an fsck command - and that really used to bug me - it does now have a -F option on zpool import. To me it's the same functionality for my environment - the ability to try to roll back to a 'hopefully' good state and get the filesystem mounted up, leaving the corrupted data objects corrupted. [...] Yes, that's exactly what it is. There's no point calling it fsck because fsck fixes individual filesystems, while ZFS fixups need to happen at the volume level (at volume import time). It's true that this should have been in ZFS from the word go. But it's there now, and that's what matters, IMO. Doesn't a scrub do more than what 'fsck' does? Not really. fsck will work on an offline filesystem to correct errors and bring it back online. Scrub won't even work until the filesystem is already imported and online. If it's corrupted you can't even import it, hence the -F flag addition. Plus, IIRC, scrub won't actually correct any errors, it will only flag them. Manually fixing what scrub finds can be a giant pain. It's also true that this was never necessary with hardware that doesn't lie, but it's good to have it anyway, and it is critical for personal systems such as laptops. IIRC, fsck was seldom needed at my former site once UFS journalling became available. Sweet update. Mark We all hope to never have to run fsck, but not having it at all is a bit of a non-starter in most environments. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
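For completeness, the recovery import being discussed looks like this (pool name hypothetical); -n first does a dry run, and -F rewinds to the last importable transaction group:

  zpool import -nF tank   # report whether a rewind would work, changing nothing
  zpool import -F tank    # rewind and import, discarding the last few txgs
  zpool scrub tank        # then verify what survived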
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 2:41 PM, Kees Nuyt k.n...@zonnet.nl wrote: On Tue, 18 Oct 2011 12:05:29 -0500, Tim Cook t...@cook.ms wrote: Doesn't a scrub do more than what 'fsck' does? Not really. fsck will work on an offline filesystem to correct errors and bring it back online. Scrub won't even work until the filesystem is already imported and online. If it's corrupted you can't even import it, hence the -F flag addition. Plus, IIRC, scrub won't actually correct any errors, it will only flag them. Manually fixing what scrub finds can be a giant pain. IIRC Scrub will correct errors if the pool has sufficient redundancy. So will any read of a corrupted block. http://hub.opensolaris.org/bin/view/Community+Group+zfs/selfheal -- ( Kees Nuyt ) c[_] Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble peter.trib...@gmail.com wrote: On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook t...@cook.ms wrote: Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in. You have, of course, reported each such failure, because if that was indeed the case then it's a clear and obvious bug? For what it's worth, I've had ZFS repair data corruption on several occasions - both during normal operation and as a result of a scrub - and I've never had to intervene manually. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ Given that there are guides on how to manually fix the corruption, I don't see any need to report it. It's considered acceptable and expected behavior by everyone I've talked to at Sun... http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] about btrfs and zfs
On Tue, Oct 18, 2011 at 3:27 PM, Peter Tribble peter.trib...@gmail.com wrote: On Tue, Oct 18, 2011 at 9:12 PM, Tim Cook t...@cook.ms wrote: On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble peter.trib...@gmail.com wrote: On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook t...@cook.ms wrote: Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in. You have, of course, reported each such failure, because if that was indeed the case then it's a clear and obvious bug? For what it's worth, I've had ZFS repair data corruption on several occasions - both during normal operation and as a result of a scrub - and I've never had to intervene manually. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ Given that there are guides on how to manually fix the corruption, I don't see any need to report it. It's considered acceptable and expected behavior by everyone I've talked to at Sun... http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html If you have adequate redundancy, ZFS will - and does - repair errors. The document you quote is for the case where you don't actually have adequate redundancy: ZFS will refuse to make up data for you, and report back where the problem was. Exactly as designed. (And yes, I've come across systems without redundant storage, or with multiple simultaneous failures. The original statement was that if you have redundant copies of the data or, in the case of raidz, enough information to reconstruct it, then ZFS will repair it for you. Which has been exactly in accord with my experience.) -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ I had and have redundant storage; it has *NEVER* automatically fixed it. You're the first person I've heard of who has had it automatically fix things. Per the page, "or an unlikely series of events conspired to corrupt multiple copies of a piece of data." Their "unlikely series of events," which goes unnamed, is not that unlikely in my experience. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Wanted: sanity check for a clustered ZFS idea
into understanding my idea better. And yes, I do also think that channeling disk over ethernet via one of the servers is a bad thing bound to degrade performance as opposed to what can be had anyway with direct disk access. Ethernet has *always* been faster than a HDD. Even back when we had 3/180s 10Mbps Ethernet it was faster than the 30ms average access time for the disks of the day. I tested a simple server the other day and round-trip for 4KB of data on a busy 1GbE switch was 0.2ms. Can you show a HDD as fast? Indeed many SSDs have trouble reaching that rate under load. As noted by other posters, access times are not bandwidth. So these are two different faster's ;) Besides, (1Gbps) Ethernet is faster than a single HDD stream. But it is not quite faster than an array of 14HDDs... And if Ethernet is utilized by its direct tasks - whatever they be, say video streaming off this server to 5000 viewers or whatever is needed to saturate the network, disk access over the same ethernet link would have to compete. And whatever the QoS settings, viewers would lose - either the real-time multimedia signal would lag, or the disk data to feed it. Moreover, usage of an external NAS (a dedicated server with Ethernet connection to the blade chassis) would make an external box dedicated and perhaps optimized to storage tasks (i.e. with ZIL/L2ARC), and would free up a blade for VM farming needs, but it would consume much of the LAN bandwidth of the blades using its storage services. Today, HDDs aren't fast, and are not getting faster. -- richard Well, typical consumer disks did get about 2-3 times faster for linear RW speeds over the past decade; but for random access they do still lag a lot. So, agreed ;) //Jim Quite frankly your choice in blade chassis was a horrible design decision. From your description of its limitations it should never be the building block for a vmware cluster in the first place. I would start by rethinking that decision instead of trying to pound a round ZFS peg into a square hole. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Wanted: sanity check for a clustered ZFS idea
every drive that isn't being used to boot an existing server to this solaris host as individual disks, and let that server take care of RAID and presenting out the storage to the rests of the vmware hosts. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Kernel panic on zpool import. 200G of data inaccessible!
On Fri, Aug 19, 2011 at 4:43 AM, Stu Whitefish swhitef...@yahoo.com wrote: It seems that obtaining an Oracle support contract or a contract renewal is equally frustrating. I don't have any axe to grind with Oracle. I'm new to the Solaris thing and wanted to see if it was for me. If I was using this box to make money then sure I wouldn't have any problem paying for support. I don't expect handouts and I don't mind paying. I trusted ZFS because I heard it's for enterprise use and now I have 200G of data offline and not a peep from Oracle. Looking on the net I found another guy who had the same exact failure. To my way of thinking somebody needs to standup and get this fixed for us and make sure it doesn't happen to anybody else. If that happens I have no grudge against Oracle or Solaris. If it doesn't that's a pretty sour experience for someone to go through and it will definitely make me look at this whole thing in another light. I still believe somebody over there will do the right thing. I don't believe Oracle needs to hold people's data hostage to make money. I am sure they have enough good products and services to make money honestly. Jim You digitally signed a license agreement stating the following: *No Technical Support* Our technical support organization will not provide technical support, phone support, or updates to you for the Programs licensed under this agreement. To turn around and keep repeating that they're holding your data hostage is disingenuous at best. Nobody is holding your data hostage. You voluntarily put it on an operating system that explicitly states doesn't offer support from the parent company. Nobody from Oracle is going to show up with a patch for you on this mailing list because none of the Oracle employees want to lose their job and subsequently be subjected to a lawsuit. If that's what you're planning on waiting for, I'd suggest you take a new approach. Sorry to be a downer, but that's reality. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance question over NFS
What are the specs on the client? On Aug 18, 2011 10:28 AM, Thomas Nau thomas@uni-ulm.de wrote: Dear all, We finally got all the parts for our new fileserver, following several recommendations we got over this list. We use:

Dell R715, 96GB RAM, dual 8-core Opterons
1 x 10GE Intel dual-port NIC
2 x LSI 9205-8e SAS controllers
2 x DataON DNS-1600 JBOD chassis
46 x Seagate Constellation SAS drives
2 x STEC ZeusRAM

The base zpool config utilizes 42 drives plus the STECs as mirrored log devices. The Seagates are set up as a stripe of seven 6-drive RAIDZ2 vdevs, plus, as said, a dedicated ZIL made of the mirrored STECs. As a quick'n'dirty check we ran filebench with the fileserver workload. Running locally we get:

statfile1 5476ops/s 0.0mb/s 0.6ms/op 179us/op-cpu
deletefile1 5476ops/s 0.0mb/s 1.0ms/op 454us/op-cpu
closefile3 5476ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
readfile1 5476ops/s 729.5mb/s 0.2ms/op 128us/op-cpu
openfile2 5477ops/s 0.0mb/s 0.8ms/op 204us/op-cpu
closefile2 5477ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
appendfilerand1 5477ops/s 42.8mb/s 0.3ms/op 184us/op-cpu
openfile1 5477ops/s 0.0mb/s 0.9ms/op 209us/op-cpu
closefile1 5477ops/s 0.0mb/s 0.0ms/op 6us/op-cpu
wrtfile1 5477ops/s 688.4mb/s 0.4ms/op 220us/op-cpu
createfile1 5477ops/s 0.0mb/s 2.7ms/op 1068us/op-cpu

With a single remote client (a similar Dell system) using NFS:

statfile1 90ops/s 0.0mb/s 27.6ms/op 145us/op-cpu
deletefile1 90ops/s 0.0mb/s 64.5ms/op 401us/op-cpu
closefile3 90ops/s 0.0mb/s 25.8ms/op 40us/op-cpu
readfile1 90ops/s 11.4mb/s 3.1ms/op 363us/op-cpu
openfile2 90ops/s 0.0mb/s 66.0ms/op 263us/op-cpu
closefile2 90ops/s 0.0mb/s 22.6ms/op 124us/op-cpu
appendfilerand1 90ops/s 0.7mb/s 0.5ms/op 101us/op-cpu
openfile1 90ops/s 0.0mb/s 72.6ms/op 269us/op-cpu
closefile1 90ops/s 0.0mb/s 43.6ms/op 189us/op-cpu
wrtfile1 90ops/s 11.2mb/s 0.2ms/op 211us/op-cpu
createfile1 90ops/s 0.0mb/s 226.5ms/op 709us/op-cpu

The same remote client with sync disabled on the server pool:

statfile1 479ops/s 0.0mb/s 6.2ms/op 130us/op-cpu
deletefile1 479ops/s 0.0mb/s 13.0ms/op 351us/op-cpu
closefile3 480ops/s 0.0mb/s 3.0ms/op 37us/op-cpu
readfile1 480ops/s 62.7mb/s 0.8ms/op 174us/op-cpu
openfile2 480ops/s 0.0mb/s 14.1ms/op 235us/op-cpu
closefile2 480ops/s 0.0mb/s 6.0ms/op 123us/op-cpu
appendfilerand1 480ops/s 3.7mb/s 0.2ms/op 53us/op-cpu
openfile1 480ops/s 0.0mb/s 13.7ms/op 235us/op-cpu
closefile1 480ops/s 0.0mb/s 11.1ms/op 190us/op-cpu
wrtfile1 480ops/s 60.3mb/s 0.2ms/op 233us/op-cpu
createfile1 480ops/s 0.0mb/s 35.6ms/op 683us/op-cpu

Disabling the ZIL is not an option, but I expected much better performance; the ZeusRAM only gets us a speed-up of about 1.8x. Is this test realistic for a typical fileserver scenario, or does it require many more clients to push the limits? Thanks Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
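For anyone wanting to reproduce the numbers, the workload is one of filebench's bundled personalities; a minimal sketch, assuming a scratch dataset mounted at /tank/bench:

  # filebench
  filebench> load fileserver
  filebench> set $dir=/tank/bench
  filebench> run 60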
Re: [zfs-discuss] zpool import hangs any zfs-related programs, eats all RAM and dies in swapping hell
On Tue, Jun 14, 2011 at 3:16 PM, Frank Van Damme frank.vanda...@gmail.com wrote: 2011/6/10 Tim Cook t...@cook.ms: While your memory may be sufficient, that cpu is sorely lacking. Is it even 64-bit? There's a reason intel couldn't give those things away in the early 2000s and amd was eating their lunch. A Pentium 4 is 32-bit. http://mail.opensolaris.org/mailman/listinfo/zfs-discuss EM64T was added to the Pentium 4 architecture with the D nomenclature, which is what he has. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
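Easy enough to settle on the box itself; on Solaris:

  isainfo -kv   # prints '64-bit amd64 kernel modules' when running a 64-bit kernel
  psrinfo -pv   # shows the CPU model string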
Re: [zfs-discuss] Q: pool didn't expand. why? can I force it?
On Sun, Jun 12, 2011 at 3:54 AM, Johan Eliasson johan.eliasson.j...@gmail.com wrote: I replaced a smaller disk in my tank2, so now they're all 2TB. But look, zfs still thinks it's a pool of 1.5 TB disks:

nebol@filez:~# zpool list tank2
NAME    SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
tank2   5.44T  4.20T  1.24T  77%  1.00x  ONLINE  -

nebol@filez:~# zpool status tank2
  pool: tank2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE   READ WRITE CKSUM
        tank2       ONLINE     0     0     0
          raidz1-0  ONLINE     0     0     0
            c8t0d0  ONLINE     0     0     0
            c8t1d0  ONLINE     0     0     0
            c8t2d0  ONLINE     0     0     0
            c8t3d0  ONLINE     0     0     0

errors: No known data errors

and:

6. c8t0d0 ATA-ST2000DL003-9VT1-CC32-1.82TB /pci@0,0/pci8086,29f1@1/pci8086,32c@0/pci11ab,11ab@1/disk@0,0
7. c8t1d0 ATA-ST2000DL003-9VT1-CC32-1.82TB /pci@0,0/pci8086,29f1@1/pci8086,32c@0/pci11ab,11ab@1/disk@1,0
8. c8t2d0 ATA-ST2000DL003-9VT1-CC32-1.82TB /pci@0,0/pci8086,29f1@1/pci8086,32c@0/pci11ab,11ab@1/disk@2,0
9. c8t3d0 ATA-ST2000DL003-9VT1-CC32-1.82TB

So the question is, why didn't it expand? And can I fix it? Autoexpand is likely turned off. http://download.oracle.com/docs/cd/E19253-01/819-5461/githb/index.html --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
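A sketch of the fix; note that autoexpand only applies to disks replaced after it is set, so drives already swapped in may need an explicit online -e:

  zpool set autoexpand=on tank2
  zpool online -e tank2 c8t0d0 c8t1d0 c8t2d0 c8t3d0   # expand each disk in place
  zpool list tank2                                    # SIZE should now reflect the 2TB drives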
Re: [zfs-discuss] ZFS Hard link space savings
On Sun, Jun 12, 2011 at 5:28 PM, Nico Williams n...@cryptonector.com wrote: On Sun, Jun 12, 2011 at 4:14 PM, Scott Lawson scott.law...@manukau.ac.nz wrote: I have an interesting question that may or may not be answerable from some internal ZFS semantics. This is really standard Unix filesystem semantics. [...] So total storage used is around ~7.5MB due to the hard linking taking place on each store. If hard linking capability had been turned off, this same message would have used 1500 x 2MB = 3GB worth of storage. My question is: is there any simple way of determining the space savings on each of the stores from the usage of hard links? [...] But... you just did! :) It's: number of hard links * (file size + sum(size of link names and/or directory slot size)). For sufficiently large files (say, larger than one disk block) you could approximate that as: number of hard links * file size. The key is the number of hard links, which will typically vary, but for e-mails that go to all users, well, you know the number of links then is the number of users. You could write a script to do this -- just look at the size and hard-link count of every file in the store, apply the above formula, add up the inflated sizes, and you're done. Nico PS: Is it really the case that Exchange still doesn't deduplicate e-mails? Really? It's much simpler to implement dedup in a mail store than in a filesystem... MS has had SIS since Exchange 4.0. They dumped it in 2010 because it was a huge source of their small random I/Os. In an effort to allow Exchange to be more storage-friendly (IE: more of a large sequential I/O profile), they've done away with SIS. The defense for it is that you can buy more cheap storage for less money than you'd save with SIS and 15k rpm disks. Whether that's factual I suppose is for the reader to decide. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
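A rough version of that script, assuming a hypothetical store path; it counts each inode once and ignores directory-entry overhead, so it is the large-file approximation from above:

  find /mailstore -type f -links +1 -exec ls -li {} + | awk '
    !seen[$1]++ { saved += ($3 - 1) * $6 }   # fields: $1 inode, $3 link count, $6 size
    END { printf "approx. space saved by hard links: %.1f MB\n", saved / 1048576 }'

The (links - 1) * size term is the storage those extra directory entries would have consumed as full copies.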
Re: [zfs-discuss] zpool import hangs any zfs-related programs, eats all RAM and dies in swapping hell
it nearly occurs, I have only a few seconds of uptime left, and since each run boot-to-crash takes roughly 2-3 hours now, I am unlikely to be active at the console at these critical few seconds. And the sync would likely never return in this case, too. Delete's, and dataset / snapshot deletes are not managed correctly in a deduped environment in ZFS. This is a known problem although it should not be anywhere nearly as bad as what you are describing in the current tip. Well, it is, on a not lowest-end hardware (at least in terms of what OpenSolaris developers can expect from a general enthusiast community which is supposed to help by testing, deploying and co-developing the best OS). The part where such deletes are slow are understandable and explainable - I don't have any big performance expectations for the box, 10Mbyte/sec is quite fine with me here. The part where it leads to crashes and hangs system programs (zfs, zpool, etc) is unacceptable. The startup delay you are seeing is another feature of ZFS, if you reboot in the middle of a large file delete or dataset destroy, ZFS ( and the OS) will not come up until it finishes the delete or dataset destroy first. Why can't it be an intensive, but background, operation? Import the pool, let it be used, and go on deleting... like it was supposed to be in that lifetime when the box began deleting these blocks ;) Well, it took me a worrysome while to figure this out the first time, a couple of months ago. Now I am just rather annoyed about absence of access to my box and data, but I hope that it will come around after several retries. Apparently, this unpredictability (and slowness and crashes) is a show-stopper for any enterprise use. I have made workarounds for the OS to come up okay, though. Since the root pool is separate, I removed pool and dcpool from zpool.cache file, and now the OS milestones do not depend on them to be available. Instead, importing the pool (with cachefile=none), starting the iscsi target and initiator, creating and removing the LUN with sbdadm, and importing the dcpool are all wrapped in several SMF services so I can relatively easily control the presence of these pools (I can disable them from autostart by touching a file in /etc directory). Steve - Jim Klimov jimkli...@cos.ru wrote: I've captured an illustration for this today, with my watchdog as well as vmstat, top and other tools. Half a gigabyte in under one second - the watchdog never saw it coming :( While your memory may be sufficient, that cpu is sorely lacking. Is it even 64bit? There's a reason intel couldn't give those things away in the early 2000s and amd was eating their lunch. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Metadata (DDT) Cache Bias
On Sun, Jun 5, 2011 at 9:56 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: Richard Elling [mailto:richard.ell...@gmail.com] Sent: Saturday, June 04, 2011 9:10 PM Instant Poll : Yes/No ? No. Methinks the MRU/MFU balance algorithm adjustment is more fruitful. Operating under the assumption that cache hits can be predicted, I agree with RE. However, that's not always the case, and if you have a random work load with enough ram to hold the whole DDT, but you don't have enough ram to hold your whole storage pool, then dedup hurts your performance dramatically. Your only option is to set primarycache=metadata, and simply give up hope that you could *ever* have a userdata cache hit. The purpose for starting this thread is to suggest it might be worthwhile (particularly with dedup enabled) to at least have the *option* of always keeping the metadata in cache, but still allow userdata to be cached too, up to the size of c_max. Just in case you might ever see a userdata cache hit. ;-) And as long as we're taking a moment to think outside the box, it might as well be suggested that this doesn't have to be a binary decision, all-or-nothing. One way to implement such an idea would be to assign a relative weight to metadata versus userdata. Dan and Roch suggested a value of 128x seems appropriate. I'm sure some people would suggest infinite metadata weight (which is synonymous to the aforementioned primarycache=metadata, plus the ability to cache userdata in the remaining unused ARC space.) I'd go with the option of allowing both a weighted and a forced option. I agree though, if you do primarycache=metadata, the system should still attempt to cache userdata if there is additional space remaining. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
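For reference, the setting as it exists today is binary (the weighted metadata bias discussed above would be new); dataset name hypothetical:

  zfs get primarycache tank            # all | metadata | none
  zfs set primarycache=metadata tank   # cache only metadata, including the DDT, in ARC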
Re: [zfs-discuss] JBOD recommendation for ZFS usage
On Mon, May 30, 2011 at 1:35 PM, Jim Klimov j...@cos.ru wrote: Thanks, now I have someone to interrogate, who seems to have seen these boxes live - if you don't mind ;) - Original Message - From: Richard Elling richard.ell...@gmail.com Date: Monday, May 30, 2011 22:04 We also commonly see the dual-expander backplanes. According to the docs, each chip addresses all disks on its backplane, and it seems implied (but not expressly stated) that either one chip and path works, or the other. For SAS targets, both paths work simultaneously. Does this mean that if the J0 uplinks of the backplanes are connected to HBAs in two different servers, both of these servers can address individual disks (and the unit of failover is not a backplane but a disk after all)? And if both HBAs are in a single server, does this double the SAS link throughput by having two paths - and can ZFS somehow balance among them? So if your application can live with the unit of failover being a bunch of 21 or 24 disks - that might be a way to go. However each head would only have one connection to each backplane, and I'm not sure if you can STONITH the non-leading head to enforce failovers (and enable the specific PRI/SEC chip of the backplane). The NexentaStor HA-Cluster plugin manages STONITH and reservations. I do not believe programming expanders or switches for clustering is the best approach. It is better to let the higher layers manage this. Makes sense. Since I originally thought that only one path works at a given time, it may be needed to somehow shut down the competitor HBA/link ;) I am not sure if this requirement also implies dual SAS data connectors - pictures of HCL HDDs all have one connector... These are dual ported. Does this mean mechanically two 7-pin SATA data ports and a wide power port, for a total of 3 connectors on the back of the HDD, as well as on the backplane sockets? Or does it mean something else? Because I've looked up half a dozen of the SuperMicro-supported drives (bold SAS in the list for E2-series chassis), and in the online shops' images they all have the standard 2 connectors (wide and 7-pin): http://www.supermicro.com.tr/SAS-1-CompList.pdf The HCL is rather small, and other components may work but are not supported by SuperMicro. And to be more specific, do you know if the Hitachi 7K3000 series SAS models HUS723020ALS640 (2Tb) or HUS723030ALS640 (3Tb) are suitable for these boxes? Does it make sense to keep the OS/swap on faster, smaller drives like a mirror of HUS156030VLS600 (300Gb SAS 15kRPM) - or is it a waste of money? (And are they known to work in these boxes?) Hint: Nexenta people seem to be good OEM friends with Supermicro, so they might know ;) Yes :-) -- richard Thanks! //Jim Klimov SAS drives are SAS drives; they aren't like SCSI. There aren't 20 different versions with different pinouts. Multipathing is handled by mpxio. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
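On Solaris/illumos the multipathing side is a switch and a reboot; a minimal sketch:

  stmsboot -e       # enable MPxIO on supported HBAs (prompts for a reboot)
  mpathadm list lu  # afterwards, each dual-ported SAS disk should show two paths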
Re: [zfs-discuss] ZFS, Oracle and Nexenta
On Wed, May 25, 2011 at 8:53 AM, Frank Van Damme frank.vanda...@gmail.com wrote: On 25-05-11 14:27, joerg.moellenk...@sun.com wrote: Well, at first ZFS development is no standard body and at the end everything has to be measured in compatibility to the Oracle ZFS implementation Why? Given that ZFS is Solaris ZFS just as well as Nexenta ZFS just as well as illumos ZFS, by what reason is Oracle ZFS being declared the standard or reference? Because they write the first so-many lines, or because they make the biggest sales on it (kinda hard to sell licenses to an open source product)? Because they OWN the code, and the patents that protect the code. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, Oracle and Nexenta
On Wed, May 25, 2011 at 10:01 AM, Paul Kraus p...@kraus-haus.org wrote: On Wed, May 25, 2011 at 10:27 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: The method the IETF uses seems to be particularly immune to vendor interference. Vendors who want to participate in defining an interoperable standard can achieve substantial success. Vendors who only want their own way encounter deafening silence and isolation. There have been a number of RFC's effectively written by one vendor in order to be able to claim open standards compliance, the biggest corporate offender in this regard, but clearly not the only one, is Microsoft. The next time I run across one of these RFC's I'll make sure to forward you a copy. The only one that comes to mind immediately was the change to the specification of what characters were permissible in DNS records to include underscore _. This was specifically to support Microsoft's existing naming convention. I am NOT saying that was a bad change, but that it was a change driven by ONE vendor. Except it wasn't just Microsoft at all. There were three vendors on the original RFC, and one of the authors was Paul Vixie... the author of BIND. http://www.ietf.org/rfc/rfc2782.txt You should probably do a bit of research before throwing out claims like that to try to shoot someone down. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris vs FreeBSD question
On Wed, May 18, 2011 at 7:47 AM, Paul Kraus p...@kraus-haus.org wrote: Over the past few months I have seen mention of FreeBSD a couple of times in regards to ZFS. My question is how stable (reliable) is ZFS on this platform? This is for a home server and the reason I am asking is that about a year ago I bought some hardware based on its inclusion on the Solaris 10 HCL, as follows:

SuperMicro 7045A-WTB (although I would have preferred the server version, but it wasn't on the HCL)
Two quad-core 2.0 GHz Xeon CPUs
8 GB RAM (I am NOT planning on using DeDupe)
2 x Seagate ES-2 250 GB SATA drives for the OS
4 x Seagate ES-2 1 TB SATA drives for data
Nvidia Geforce 8400 (cheapest video card I could get locally)

I could not get the current production Solaris or OpenSolaris to load. The miniroot would GPF while loading the kernel. I could not get the problem resolved and needed to get the server up and running, as my old server was dying (dual 550 MHz P3 with 1 GB RAM) and I needed to get my data (about 600 GB) off of it before I lost anything. That old server was running Solaris 10 and the data was in a zpool with mirrored vdevs of different sized drives. I had lost one drive in each vdev and zfs saved my data. So I loaded OpenSuSE and moved the data to a mirrored pair of 1 TB drives. I still want to move my data to ZFS, and push has come to shove, as I am about to overflow the 1 TB mirror and I really, really hate the Linux options for multiple disk device management (I'm spoiled by SVM and ZFS). So now I really need to get that hardware loaded with an OS that supports ZFS. I have tried every variation of Solaris that I can get my hands on, including Solaris 11 Express and Nexenta 3, and they all GPF loading the kernel to run the installer. My last hope is that I have a very plain vanilla (ancient S540) video card to swap in for the Nvidia on the very long shot chance that is the problem. But I need a backup plan if that does not work. I have tested the hardware with FreeBSD 8 and it boots to the installer. So my question is whether the FreeBSD ZFS port is up to production use? Is there anyone here using FreeBSD in production with good results (this list tends to only hear about serious problems and not success stories)? P.S. If anyone here has a suggestion as to how to get Solaris to load I would love to hear it. I even tried disabling multi-cores (which makes the CPUs look like dual core instead of quad) with no change. I have not been able to get serial console redirect to work, so I do not have a good log of the failures. -- {1-2-3-4-5-6-7-} Paul Kraus - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) - Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) - Technical Advisor, RPI Players I've heard nothing but good things about it. FreeNAS uses it: http://freenas.org/ and iXsystems sells a commercial product based on the FreeNAS/FreeBSD code. I don't think they have a full-blown implementation of CIFS (just Samba), but other than that, I don't think you'll have too many issues. I actually considered moving over to it, but I made the unfortunate mistake of upgrading to Solaris 11 Express, which means my zpool version is now too new to run anything else (AFAIK). --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Mon, May 9, 2011 at 2:11 AM, Evaldas Auryla evaldas.aur...@edqm.eu wrote: On 05/ 6/11 07:21 PM, Brandon High wrote: On Fri, May 6, 2011 at 9:15 AM, Ray Van Dolson rvandol...@esri.com wrote: We use dedupe on our VMware datastores and typically see 50% savings, often times more. We do of course keep like VMs on the same volume. I think NetApp uses 4k blocks by default, so the block size and alignment should match up for most filesystems and yield better savings. Assuming that the VMware datastores are on NFS? Otherwise the VMware filesystem VMFS uses its own block sizes from 1M to 8M, so the important point is to align the guest OS partition to 1M, and Windows guests starting from Vista/2008 do that by default now. Regards, The VMFS filesystem itself is aligned by NetApp at LUN creation time. You still align to a 4K block on a filer because there is no way to automatically align an encapsulated guest, especially when you could have different guest OS types on a LUN. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 6:36 PM, Erik Trimble erik.trim...@oracle.com wrote: On 5/4/2011 4:14 PM, Ray Van Dolson wrote: On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote: On Wed, May 4, 2011 at 12:29 PM, Erik Trimble erik.trim...@oracle.com wrote: I suspect that NetApp does the following to limit their resource usage: they presume the presence of some sort of cache that can be dedicated to the DDT (and, since they also control the hardware, they can make sure there is always one present). Thus, they can make their code AFAIK, NetApp has more restrictive requirements about how much data can be dedup'd on each type of hardware. See page 29 of http://media.netapp.com/documents/tr-3505.pdf - Smaller pieces of hardware can only dedup 1TB volumes, and even the big-daddy filers will only dedup up to 16TB per volume, even if the volume size is 32TB (the largest volume available for dedup). NetApp solves the problem by putting rigid constraints around the problem, whereas ZFS lets you enable dedup for any size dataset. Both approaches have limitations, and it sucks when you hit them. -B That is very true, although it's worth mentioning you can have quite a few of the dedupe/SIS-enabled FlexVols on even the lower-end filers (our FAS2050 has a bunch of 2TB SIS-enabled FlexVols). Stupid question - can you hit all the various SIS volumes at once, and not get horrid performance penalties? If so, I'm almost certain NetApp is doing post-write dedup. That way, the strictly controlled max FlexVol size helps with keeping the resource limits down, as it will be able to round-robin the post-write dedup to each FlexVol in turn. ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended) On a 2050? Probably not. It's got a single-core mobile Celeron CPU and 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup. Can you do it on a model that isn't 4 years old without tanking performance? Absolutely. Outside of those two 2000 series, the reason there are dedup limits isn't performance. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 6:51 PM, Erik Trimble erik.trim...@oracle.com wrote: On 5/4/2011 4:44 PM, Tim Cook wrote: On Wed, May 4, 2011 at 6:36 PM, Erik Trimble erik.trim...@oracle.com wrote: On 5/4/2011 4:14 PM, Ray Van Dolson wrote: On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote: On Wed, May 4, 2011 at 12:29 PM, Erik Trimble erik.trim...@oracle.com wrote: I suspect that NetApp does the following to limit their resource usage: they presume the presence of some sort of cache that can be dedicated to the DDT (and, since they also control the hardware, they can make sure there is always one present). Thus, they can make their code AFAIK, NetApp has more restrictive requirements about how much data can be dedup'd on each type of hardware. See page 29 of http://media.netapp.com/documents/tr-3505.pdf - Smaller pieces of hardware can only dedup 1TB volumes, and even the big-daddy filers will only dedup up to 16TB per volume, even if the volume size is 32TB (the largest volume available for dedup). NetApp solves the problem by putting rigid constraints around the problem, whereas ZFS lets you enable dedup for any size dataset. Both approaches have limitations, and it sucks when you hit them. -B That is very true, although it's worth mentioning you can have quite a few of the dedupe/SIS-enabled FlexVols on even the lower-end filers (our FAS2050 has a bunch of 2TB SIS-enabled FlexVols). Stupid question - can you hit all the various SIS volumes at once, and not get horrid performance penalties? If so, I'm almost certain NetApp is doing post-write dedup. That way, the strictly controlled max FlexVol size helps with keeping the resource limits down, as it will be able to round-robin the post-write dedup to each FlexVol in turn. ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended) On a 2050? Probably not. It's got a single-core mobile Celeron CPU and 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup. Can you do it on a model that isn't 4 years old without tanking performance? Absolutely. Outside of those two 2000 series, the reason there are dedup limits isn't performance. --Tim Indirectly, yes, it's performance, since NetApp has plainly chosen post-write dedup as a method to restrict the required hardware capabilities. The dedup limits on volsize are almost certainly driven by the local RAM requirements for post-write dedup. It also looks like NetApp isn't providing for a dedicated DDT cache, which means that when the NetApp is doing dedup, it's consuming the normal filesystem cache (i.e. chewing through RAM). Frankly, I'd be very surprised if you didn't see a noticeable performance hit during the period that the NetApp appliance is performing the dedup scans. Again, it depends on the model/load/etc. The smallest models will see performance hits for sure. If the vol size limits are strictly a matter of RAM, why exactly would they jump from 4TB to 16TB on a 3140 by simply upgrading ONTAP? If the limits haven't gone up on, at the very least, every one of the x2xx systems 12 months from now, feel free to dig up the thread and give an I-told-you-so. I'm quite confident that won't be the case. The 16TB limit SCREAMS to me that it's a holdover from the same 32-bit limit that causes 32-bit volumes to have a 16TB limit. I'm quite confident they're just taking the cautious approach on moving to 64-bit dedup code.
--Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
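For anyone trying to size the DDT on the ZFS side of this comparison, zdb can report or simulate the dedup table for a pool. A minimal sketch, with tank standing in for your pool name; the per-entry RAM figures quoted around this list are rough (a few hundred bytes each), so treat any number derived this way as a ballpark:

  # zdb -S tank     (simulates dedup on existing data and prints a DDT histogram, without enabling it)
  # zdb -DD tank    (prints actual DDT statistics for a pool already running dedup)

Multiplying the histogram's total entry count by a few hundred bytes gives a rough in-core DDT size.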
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 10:15 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Erik Trimble ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended) That's true, but on the flipside, if you don't have adequate resources dedicated all the time, it means performance is unsustainable. Anything which is going to do post-write dedup will necessarily have degraded performance on a periodic basis. This is in *addition* to all your scrubs and backups and so on. AGAIN, you're assuming that all system resources are used all the time and can't possibly go anywhere else. This is absolutely false. If someone is running a system at 99% capacity 24/7, perhaps that might be a factual statement. I'd argue if someone is running the system at 99% all of the time, the system is grossly undersized for the workload. How can you EVER expect a highly available system to run at 99% on both nodes (all nodes in a VMAX/VSP scenario) and ever be able to fail over? Either a home-brew OpenSolaris cluster, Oracle 7000 cluster, or NetApp? I'm gathering that this list in general has a lack of understanding of how NetApp does things. If you don't know for a fact how it works, stop jumping to conclusions on how you think it works. I know for a fact that short of the guys currently/previously writing the code at NetApp, there's a handful of people in the entire world who know (factually) how they're allocating resources from soup to nuts. As far as this discussion is concerned, there are only two points that matter: They've got dedup on primary storage, and it works in the field. The rest is just static that doesn't matter. Let's focus on how to make ZFS better instead of trying to guess how others are making it work, especially when they've got a completely different implementation. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Deduplication Memory Requirements
On Wed, May 4, 2011 at 10:23 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Ray Van Dolson Are any of you out there using dedupe ZFS file systems to store VMware VMDK (or any VM tech. really)? Curious what recordsize you use and what your hardware specs / experiences have been. Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS or NetApp or anything else.) Because the VM images are all going to have their own filesystems internally with whatever blocksize is relevant to the guest OS. If the virtual blocks in the VM don't align with the ZFS (or whatever FS) host blocks... Then even when you write duplicated data inside the guest, the host won't see it as a duplicated block. There are some situations where dedup may help on VM images... For example if you're not using sparse files and you have a zero-filled disk... But in that case, you should probably just use a sparse file instead... Or ... If you have a golden image that you're copying all over the place ... but in that case, you should probably just use clones instead... Or if you're intimately familiar with both the guest and host filesystems, and you choose blocksizes carefully to make them align. But that seems complicated and likely to fail. That's patently false. VM images are the absolute best use-case for dedup outside of backup workloads. I'm not sure who told you/where you got the idea that VM images are not ripe for dedup, but it's wrong. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
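The alignment experiment is cheap to run, for what it's worth. A sketch with a hypothetical dataset tank/vmstore; recordsize=4k here is an assumption matched to a 4KiB guest filesystem block size, not a general recommendation, and note that a small recordsize inflates the DDT accordingly:

  # zfs create -o recordsize=4k -o dedup=on tank/vmstore
  # zpool get dedupratio tank     (check the realized ratio after loading a few guests)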
Re: [zfs-discuss] ZFS Going forward after Oracle - Let's get organized, let's get started.
On Sat, Apr 9, 2011 at 4:25 PM, Garrett D'Amore garr...@nexenta.com wrote: On Sun, 2011-04-10 at 08:56 +1200, Ian Collins wrote: On 04/10/11 05:41 AM, Chris Forgeron wrote: I see your point, but you also have to understand that sometimes too many helpers/opinions are a bad thing. There is a set core of ZFS developers who make a lot of this move forward, and they are the key right now. The rest of us will just muddy the waters with conflicting/divergent opinions on direction and goals. In the real world we would be called customers, you know, the people who actually use the product. Right. And in the real world, customers are generally not involved with architectural discussions of products. Their input is collected and fed into the process, but they don't get to sit at the whiteboard with developers as they work on the designs. What real world? The real world of enterprise storage development, or the real world of an open-source project? It sounds to me like you want to have your cake and eat it too. Developers, no matter how good, shouldn't work in a vacuum. Agreed, and we don't. Except for the secret mailing list, and the fact you've stated repeatedly the code will be behind a wall until you feel it's ready for the public to see, right? How exactly are the developers not working in a vacuum? If you want to see a good example of how things should be done in the open, follow the caiman-discuss list. Caiman-discuss may be an excellent example of a model that can work, but it might not be the best model for ZFS. There are many more contentious issues, and more contentious personalities, and other considerations that I don't want to get into. Ultimately, our model is like an IEEE working group. The members have decided to run this list in this fashion, without any significant dissension. Of course, if you don't like this, and want to start your own group, I encourage you to do so. I'll also point at zfs-discuss@opensolaris.org, which is monitored by a number of the members of this cabal. That's a great way to give feedback. - Garrett That's mature. If you don't like it, fork it yourself. With responses like that, I can only imagine how quickly you're going to build up steam behind your project outside of the four or so entities that have a vested interest. I've always said, the best way to build a community is by telling anyone who suggests perhaps they might be able to give feedback that they should be happy you're giving them any scraps at all (or in your case, not even that). --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dual protocol on one file system?
On Sat, Mar 12, 2011 at 7:42 PM, Fred Liu fred_...@issi.com wrote: Hi, Is it possible to run both CIFS and NFS on one file system over ZFS? Thanks. Fred Yes, but managing permissions in that scenario is generally a nightmare. If you're using NFSv4 with AD integration, it's a bit more manageable, but it's still definitely a work in progress. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
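Mechanically, dual-protocol sharing is just two properties on the same dataset. A minimal sketch, assuming the in-kernel SMB server (not Samba) and a hypothetical dataset tank/shared:

  # svcadm enable -r smb/server
  # zfs set sharenfs=on tank/shared
  # zfs set sharesmb=name=shared tank/shared

The sharing itself is the easy part; the nightmare referred to above is keeping NFSv4/ZFS ACLs and POSIX modes coherent once both sets of clients start writing.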
Re: [zfs-discuss] dual protocol on one file system?
2011/3/12 Fred Liu fred_...@issi.com Tim, Thanks. Is there a mapping mechanism like what Data ONTAP does to map the permission/ACL between NIS/LDAP and AD? Thanks. Fred *From:* Tim Cook [mailto:t...@cook.ms] *Sent:* Sunday, March 13, 2011 9:53 *To:* Fred Liu *Cc:* zfs-discuss@opensolaris.org *Subject:* Re: [zfs-discuss] dual protocol on one file system? On Sat, Mar 12, 2011 at 7:42 PM, Fred Liu fred_...@issi.com wrote: Hi, Is it possible to run both CIFS and NFS on one file system over ZFS? Thanks. Fred Yes, but managing permissions in that scenario is generally a nightmare. If you're using NFSv4 with AD integration, it's a bit more manageable, but it's still definitely a work in progress. --Tim Yes. http://www.unix.com/man-page/OpenSolaris/1m/idmap/ --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
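A hypothetical rule to make the idmap pointer concrete (user name and domain invented for illustration):

  # idmap add winname:fred@corp.example.com unixuser:fred
  # idmap list     (shows the name-based mapping rules currently in effect)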
Re: [zfs-discuss] cannot replace c10t0d0 with c10t0d0: device is too small
On Fri, Mar 4, 2011 at 10:22 AM, Robert Hartzell b...@rwhartzell.net wrote: In 2007 I bought 6 WD1600JS 160GB SATA disks and used 4 to create a raidz storage pool, and then shelved the other two for spares. One of the disks failed last night, so I shut down the server and replaced it with a spare. When I tried to zpool replace the disk I get:

zpool replace tank c10t0d0
cannot replace c10t0d0 with c10t0d0: device is too small

The 4 original disk partition tables look like this:

Current partition table (original):
Total disk sectors available: 312560317 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34      149.04GB        312560350
  1 unassigned    wm                 0           0                  0
  2 unassigned    wm                 0           0                  0
  3 unassigned    wm                 0           0                  0
  4 unassigned    wm                 0           0                  0
  5 unassigned    wm                 0           0                  0
  6 unassigned    wm                 0           0                  0
  8   reserved    wm         312560351        8.00MB        312576734

The spare disk partition table looks like this:

Current partition table (original):
Total disk sectors available: 312483549 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34      149.00GB        312483582
  1 unassigned    wm                 0           0                  0
  2 unassigned    wm                 0           0                  0
  3 unassigned    wm                 0           0                  0
  4 unassigned    wm                 0           0                  0
  5 unassigned    wm                 0           0                  0
  6 unassigned    wm                 0           0                  0
  8   reserved    wm         312483583        8.00MB        312499966

So it seems that two of the disks are slightly different models and are about 40MB smaller than the original disks. I know I can just add a larger disk, but I would rather use the hardware I have if possible. 1) Is there any way to replace the failed disk with one of the spares? 2) Can I recreate the zpool using 3 of the original disks and one of the slightly smaller spares? Will zpool/zfs adjust its size to the smaller disk? 3) If #2 is possible, would I still be able to use the last still-shelved disk as a spare? If #2 is possible I would probably recreate the zpool as raidz2 instead of the current raidz1. Any info/comments would be greatly appreciated. Robert

You cannot. That's why I suggested two years ago that they chop off 1% from the end of the disk at install time to equalize drive sizes. That way you wouldn't run into this problem trying to replace disks from a different vendor or different batch. The response was that Sun makes sure all drives are exactly the same size (although I do recall someone on this forum having this issue with Sun OEM disks as well). It's ridiculous they don't take into account the slight differences in drive sizes from vendor to vendor. Forcing you to single-source your disks is a bad habit to get into IMO. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
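One defensive variant of option 2, sketched with the thread's device names: label an s0 slice on every disk somewhat smaller than the smallest drive you ever expect to buy, and build the pool on slices rather than whole disks. The trade-off is that ZFS only manages the disk write cache automatically when given whole disks, so this swaps a little performance for painless replacement:

  # prtvtoc /dev/rdsk/c10t0d0s2     (compare usable sector counts across the drives)
  # zpool create tank raidz2 c10t0d0s0 c10t1d0s0 c10t2d0s0 c10t3d0s0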
Re: [zfs-discuss] A few questions
On Mon, Jan 3, 2011 at 5:56 AM, Garrett D'Amore garr...@nexenta.com wrote: On 01/ 3/11 05:08 AM, Robert Milkowski wrote: On 12/26/10 05:40 AM, Tim Cook wrote: On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling richard.ell...@gmail.com wrote: There are more people outside of Oracle developing for ZFS than inside Oracle. This has been true for some time now. Pardon my skepticism, but where is the proof of this claim (I'm quite certain you know I mean no disrespect)? Solaris 11 Express was a massive leap in functionality and bugfixes to ZFS. I've seen exactly nothing from outside of Oracle in the time since it went closed. We used to see updates bi-weekly out of Sun. Nexenta spending hundreds of man-hours on a GUI and userland apps isn't work on ZFS. Exactly my observation as well. I haven't seen any ZFS-related development happening at Illumos or Nexenta, at least not yet. Just because you've not seen it yet doesn't imply it isn't happening. Please be patient. - Garrett Or, conversely, don't make claims of all this code contribution prior to having anything to show for your claimed efforts. Duke Nukem Forever was going to be the greatest video game ever created... we were told to be patient... we're still waiting for that too. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A few questions
On Tue, Jan 4, 2011 at 8:21 PM, Garrett D'Amore garr...@nexenta.com wrote: On 01/ 4/11 09:15 PM, Tim Cook wrote: On Mon, Jan 3, 2011 at 5:56 AM, Garrett D'Amore garr...@nexenta.com wrote: On 01/ 3/11 05:08 AM, Robert Milkowski wrote: On 12/26/10 05:40 AM, Tim Cook wrote: On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling richard.ell...@gmail.com wrote: There are more people outside of Oracle developing for ZFS than inside Oracle. This has been true for some time now. Pardon my skepticism, but where is the proof of this claim (I'm quite certain you know I mean no disrespect)? Solaris 11 Express was a massive leap in functionality and bugfixes to ZFS. I've seen exactly nothing from outside of Oracle in the time since it went closed. We used to see updates bi-weekly out of Sun. Nexenta spending hundreds of man-hours on a GUI and userland apps isn't work on ZFS. Exactly my observation as well. I haven't seen any ZFS-related development happening at Illumos or Nexenta, at least not yet. Just because you've not seen it yet doesn't imply it isn't happening. Please be patient. - Garrett Or, conversely, don't make claims of all this code contribution prior to having anything to show for your claimed efforts. Duke Nukem Forever was going to be the greatest video game ever created... we were told to be patient... we're still waiting for that too. Um, have you not been paying attention? I've delivered quite a lot of contribution to illumos already, just not in ZFS. Take a close look -- there almost certainly wouldn't *be* an open source version of OS/Net had I not done the work to enable this in libc, kernel crypto, and other bits. This work is still higher priority than ZFS innovation for a variety of reasons -- mostly because we need a viable and supportable illumos upon which to build those ZFS innovations. That said, much of the ZFS work I hope to contribute to illumos needs more baking, but some of it is already open source in NexentaStor. (You can for a start look at zfs-monitor, the WORM support, and support for hardware GZIP acceleration all as things that Nexenta has innovated in ZFS, and which are open source today if not part of illumos. Check out http://www.nexenta.org for source code access.) So there, money placed where mouth is. You? - Garrett The claim was that there are more people contributing code from outside of Oracle than inside to ZFS. Your contributions to illumos do absolutely nothing to back up that claim. zfs-monitor is not ZFS code (it's an FMA module), WORM also isn't ZFS code, it's an OS-level operation, and GZIP hardware acceleration is produced by Indra Networks, and has absolutely nothing to do with ZFS. Does it help ZFS? Sure, but that's hardly a code contribution to ZFS when it's simply a hardware acceleration card that accelerates ALL gzip code. So, great job picking three projects that are not proof of developers working on ZFS. And great job not providing any proof of the claim that there are more developers working on ZFS outside of Oracle than within. You're going to need a hell of a lot bigger bank account to cash the check than what you've got. As for me, I don't recall making any claims on this list that I can't back up, so I'm not really sure what you're getting at. I can only assume the defensive tone of your email is because you've been called out and can't back up the claims either. So again: if you've got code in the works, great. Talk about it when it's ready. 
Stop throwing out baseless claims that you have no proof of and then fall back on just be patient, it's coming. We've heard that enough from Oracle and Sun already. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 25, 2010 at 8:25 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Joerg Schilling And people should note that Netapp filed their patents starting from 1993. This is 5 years after I started to develop WOFS, which is copy on write. This still In any case, this is 20 year old technology. Aren't patents something to protect new ideas? Boy, those guys must be really dumb to waste their time filing billion dollar lawsuits, protecting 20-year old technology, when it's so obvious that you and other people clearly invented it before them, and all the money they waste on lawyers can never achieve anything. They should all fire themselves. And anybody who defends against it can safely hire a law student for $20/hr to represent them, and just pull out your documents as defense, because that's so easy. Plus, as you said, the technology is so old, it should be worthless by now. Why are we all wasting our time in this list talking about irrelevant old technology, anyway? Indeed. Isn't the Oracle database itself at least 20 years old? And Windows? And Solaris itself? All the employees of those companies should probably just start donating their time for free instead of collecting a paycheck since it's quite obvious they should no longer be able to charge for their product. What I find most entertaining is all the armchair lawyers on this mailing list that think they've got prior art when THEY'VE NEVER EVEN SEEN THE CODE IN QUESTION! --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 25, 2010 at 1:10 PM, Erik Trimble erik.trim...@oracle.comwrote: On 12/25/2010 6:25 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Joerg Schilling And people should note that Netapp filed their patents starting from 1993. This is 5 years after I started to develop WOFS, which is copy on write. This still In any case, this is 20 year old technology. Aren't patents something to protect new ideas? Boy, those guys must be really dumb to waste their time filing billion dollar lawsuits, protecting 20-year old technology, when it's so obvious that you and other people clearly invented it before them, and all the money they waste on lawyers can never achieve anything. They should all fire themselves. And anybody who defends against it can safely hire a law student for $20/hr to represent them, and just pull out your documents as defense, because that's so easy. Plus, as you said, the technology is so old, it should be worthless by now. Why are we all wasting our time in this list talking about irrelevant old technology, anyway? While that's a bit sarcastic there Ned, it *should* be the literal truth. But, as the SCO/Linux suit showed, having no realistic basis for a lawsuit doesn't prevent one from being dragged through the (U.S.) courts for the better part of a decade. sigh Why can't we have a loser-pays civil system like every other civilized country? -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) If you've got enough money, we do. You just have to make it to the end of the trial, and have a judge who feels similar. They often award monetary settlements for the cost of legal defense to the victor. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] A few questions
On Sat, Dec 25, 2010 at 11:23 PM, Richard Elling richard.ell...@gmail.com wrote: On Dec 21, 2010, at 5:05 AM, Deano wrote: The question therefore is, is there room in the software implementation to achieve performance and reliability numbers similar to expensive drives whilst using relatively cheap drives? For some definition of similar, yes. But using relatively cheap drives does not mean the overall system cost will be cheap. For example, $250 will buy 8.6K random IOPS @ 4KB in an SSD[1], but to do that with cheap disks might require eighty 7,200 rpm SATA disks. ZFS is good but IMHO it is easy to see how it can be improved to better meet this situation. I can’t currently say when this line of thinking and code will move from research to production-level use (tho I have a pretty good idea ;) ) but I wouldn’t bet on the status quo lasting much longer. In some ways the removal of OpenSolaris may actually be a good thing, as it's catalyzed a number of developers from the view that zfs is Oracle led, to thinking “what can we do with zfs code as a base”? There are more people outside of Oracle developing for ZFS than inside Oracle. This has been true for some time now. Pardon my skepticism, but where is the proof of this claim (I'm quite certain you know I mean no disrespect)? Solaris 11 Express was a massive leap in functionality and bugfixes to ZFS. I've seen exactly nothing from outside of Oracle in the time since it went closed. We used to see updates bi-weekly out of Sun. Nexenta spending hundreds of man-hours on a GUI and userland apps isn't work on ZFS. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Disk failed, System not booting
Just boot off a live CD, import the pool, and swap it that way. I'm guessing you haven't changed your failmode to continue? On Dec 20, 2010 10:48 AM, Albert Frenz y...@zockbar.de wrote: Hi there, I got FreeNAS installed with a raidz1 pool of 3 disks. One of them has now failed and it gives me errors like Unrecovered read errors: auto reallocate failed or MEDIUM ERROR asc:11,4, and the system won't even boot up. So I bought a replacement drive, but I am a bit concerned, since normally you should detach the drive via the terminal. I can't do it, since it won't boot up. So am I safe if I just shut down the machine, replace the drive with the new one, and resilver? Thanks in advance, Adrian -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
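In command form, the live-CD route looks roughly like this; the pool and device names are placeholders, and zpool status will show the actual failed disk:

  # zpool import -f tank
  # zpool status -x
  # zpool replace tank c0t1d0     (one-argument form: resilvers onto the new disk in the same bay)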
Re: [zfs-discuss] OT: anyone aware how to obtain 1.8.0 for X2100M2?
You have to have a support contract to download BIOS and firmware now. On Dec 19, 2010 12:29 PM, Eugen Leitl eu...@leitl.org wrote: I realize this is off-topic, but Oracle has completely screwed up the support site from Sun. I figured someone here would know how to obtain Sun Fire X2100 M2 Server Software 1.8.0 Image contents: * BIOS is version 3A21 * SP is updated to version 3.24 (ELOM) * Chipset driver is updated to 9.27 from http://www.sun.com/servers/entry/x2100/downloads.jsp I've been trying for an hour, and I'm at the end of my rope. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mixing different disk sizes in a pool?
On Sat, Dec 18, 2010 at 7:26 AM, Ian D rewar...@hotmail.com wrote: Another question: all those disks are on Dell MD1000 JBODs (11 of them) and we have 12 SAS ports on three LSI 9200-16e HBAs. Is there any point connecting each JBOD on a separate port or is it ok cascading them in groups of three? Is there a bandwidth limit we'll be hitting doing that? Thanks It's fine to cascade them. SAS is all point-to-point. I strongly doubt you'll hit a bandwidth constraint on the backend, especially if you have the shelves multipathed, but if that's a concern you will get more peak bandwidth putting them on separate ports. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
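If the shelves are dual-pathed, it's worth confirming that MPxIO is actually coalescing the paths rather than presenting each LUN twice. A quick check, assuming the stock Solaris multipathing stack:

  # stmsboot -e        (enables MPxIO; requires a reboot)
  # mpathadm list lu   (each LUN should then show two operational paths)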
Re: [zfs-discuss] Mixing different disk sizes in a pool?
On Sat, Dec 18, 2010 at 4:24 PM, Ian D rewar...@hotmail.com wrote: The answer really depends on what you want to do with the pool(s). You'll have to provide more information. Get the maximum number of random IOPS I can get out of those drives for database usage. -- Random IOPS won't max out the SAS link. You'll be fine stacking them. But again, if you have the ports available, and already have the cables, it won't hurt anything to use them. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Thu, Dec 16, 2010 at 8:11 AM, Linder, Doug doug.lin...@merchantlink.comwrote: Joerg Schilling wrote: The reason for not being able to use ZFS under Linux is not the license used by ZFS but the missing will for integration. Several lawyers explained already why adding ZFS to the Linux would just create a collective work that is permitted by the GPL. Folks, I very much did not intend to start, nor do I want to participate in or perpetuate, any religious flame wars. This list is for ZFS discussion. There are plenty of other places for License Wars and IP discussion. The only thing I'll add is that I, as I said, I really don't care at all about licenses. When it comes to licenses, to me (and, I suspect, the vast majority of other OSS users), GPL is synonymous with open source. Is that correct? No. Am I aware that plenty of other licenses exist? Yes. Is the issue important? Sure. Do I have time or interest to worry about niggly little details? No. All I want is to be able to use the best technology in the ways that are most useful to me without artificial restrictions. Anything that advances that, I'm for. This is one of those geek things where the topic you're personally very geeky about seems *hugely* important and you can't understand why others don't see that. Maybe it bugs you when people use GPL to mean open source, but the fact is that lots and lots of people do. It bugs me when Stallman tries to get everyone to use the ridiculous GNU/Linux, as if anyone would ever say that. It bugs me when people say I *could* care less. But I live with these things. People talk the way they talk. If you're into IP issues and OSS licensing, that's great. But don't be surprised if other people aren't as fascinated with the dirty details of IP law as you are. Most people find the law unutterably boring. So, feel free to discuss this as much as you want, but leave me out of it. I regret and apologize for my callous disregard in casually tossing around a clearly incendiary term like GPL. Everyone have a great day! :) The problem is, what you're saying amounts to: I want Oracle to port ZFS to linux because I don't want to pay for it. I don't want to pay Oracle for it, and I want to be able to use it any way I see fit. What is in it for Oracle? Goodwill doesn't pay the bills. Claiming you'd start paying for Solaris if they gave you ZFS for free in Linux is absolutely ridiculous. If the best response you can come up with is goodwill, I suggest wishing in one hand and shitting in the other because there's no way Oracle is going to give away such a valuable piece of code for no monetary compensation. *AT BEST* I could see them releasing a binary for OEL only that they won't be sharing with anyone else. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Guide to COMSTAR iSCSI?
On Mon, Dec 13, 2010 at 5:30 PM, Chris Mosetick cmoset...@gmail.com wrote: I have found this post from Mike La Spina to be very detailed covering this topic, yet I could not seem to get it to work right on my first hasty attempt a while back. Let me know if you have success, or adjustments that get this to work. http://blog.laspina.ca/ubiquitous/securing-comstar-and-vmware-iscsi-connections -Chris On Sun, Dec 12, 2010 at 12:47 AM, Martin Mundschenk m.mundsch...@mundschenk.de wrote: Hi! I have configured two LUs following this guide: http://thegreyblog.blogspot.com/2010/02/setting-up-solaris-comstar-and.html Now I want each LU to be available to only one distinct client in the network. I found no easy guide on how to accomplish this anywhere on the internet. Any hint? Martin Looking at that, the one comment I'd make is that I'd strongly suggest avoiding CHAP. It really provides nothing in the way of security, and simply adds more complexity. If you're doing iSCSI across a WAN (I really hope you aren't), you'd be better served using a VPN. If you're doing it on a LAN and you're concerned about security, use VLANs. It's generally a good idea to dedicate a VLAN to VMware storage traffic anyway (whether it be iSCSI or NFS) if your infrastructure can handle VLANs. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
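Back to Martin's original question: restricting each LU to a single client is done in COMSTAR with host groups and views. A sketch with invented names and a placeholder GUID (the real one comes from stmfadm list-lu):

  # stmfadm create-hg host1-hg
  # stmfadm add-hg-member -g host1-hg iqn.1998-01.com.vmware:host1
  # stmfadm add-view -h host1-hg 600144f0aabbccdd...

Repeat with a second host group for the second LU, and each initiator only ever sees its own LU.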
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 11, 2010 at 3:08 PM, Joerg Schilling joerg.schill...@fokus.fraunhofer.de wrote: Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: Problem is... Oracle is now the only company in the world who's immune to netapp lawsuit over ZFS. Even if IBM and Dell and HP wanted to band together and fund the open-source development of ZFS and openindiana... It's a real risk. I don't believe that there is a significant risk as the NetApp patents are invalid because of prior art. You are not a court of law, and that statement has not been tested. It is your opinion and nothing more. I'd appreciate if every time you repeated that statement, you'd preface it with in my opinion so you don't have people running around believing what they're doing is safe. I'd hope they'd be smart enough to consult with a lawyer, but it's probably better to just not spread unsubstantiated rumor in the first place. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Sat, Dec 11, 2010 at 5:17 PM, Joerg Schilling joerg.schill...@fokus.fraunhofer.de wrote: Tim Cook t...@cook.ms wrote: I don't believe that there is a significant risk as the NetApp patents are invalid because of prior art. You are not a court of law, and that statement has not been tested. It is your opinion and nothing more. I'd appreciate it if every time you repeated that statement, you'd preface it with in my opinion so you don't have people running around believing what they're doing is safe. I'd hope they'd be smart enough to consult with a lawyer, but it's probably better to just not spread unsubstantiated rumor in the first place. If you have substantial information on why NetApp may rightfully own a patent that is essential for ZFS, I would be interested to get this information. Jörg The initial filing was public record. It has been posted on this mailing list already, and you responded to those posts. I'm not sure why you're acting like you're oblivious to the case. Regardless, I'll answer your rhetorical question: http://www.groklaw.net/articlebasic.php?story=20080529163415471 You BELIEVING they are wrong doesn't make it so, sorry. Until it is settled in a court of law, or the patent office invalidates their patents, you are making unsubstantiated claims. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ... open source moving forward?
On Fri, Dec 10, 2010 at 8:54 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 10 Dec 2010, Edward Ned Harvey wrote: It's been a while since I last heard anybody say anything about this. What's the latest version of publicly released ZFS? Has oracle made it closed-source moving forward? Nice troll. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ I'm not sure how it's trolling. There have been 0 public statements I've seen from Oracle on their future plans for what was opensolaris. A leaked internal memo is NOT official company policy. Until I see source or an official statement, I'm not holding my breath. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 3TB HDD in ZFS
It's based on a jumper on most new drives. On Dec 6, 2010 8:41 PM, taemun tae...@gmail.com wrote: On 7 December 2010 13:25, Brandon High bh...@freaks.com wrote: There shouldn't be any problems using a 3TB drive with Solaris, so long as you're using a 64-bit kernel. Recent versions of zfs should properly recognize the 4k sector size as well. I think you'll find that these 3TB, 4KiB-physical-sector drives are still exporting logical sectors of 512B (this is what Anandtech has indicated, anyway). ZFS assumes that the drive's logical sectors are directly mapped to physical sectors, and will create an ashift=9 vdev for the drives. Hence why enthusiasts are making their own zpool binaries with a hardcoded ashift=12 so they can create pools that actually function beyond 20 random writes per second with these drives: http://digitaldj.net/2010/11/03/zfs-zpool-v28-openindiana-b147-4k-drives-and-you/ Cheers, ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
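To see which sector-size assumption a pool was actually built with, the ashift is visible through zdb; a quick check against the default cache file:

  # zdb | grep ashift     (9 means 512-byte sectors assumed; 12 means 4KiB)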
Re: [zfs-discuss] Zfs ignoring spares?
          raidz2-8   ONLINE       0     0     0
            c4t22d0  ONLINE       0     0     0
            c4t23d0  ONLINE       0     0     0
            c4t24d0  ONLINE       0     0     0
            c4t25d0  ONLINE       0     0     0
            c4t26d0  ONLINE       0     0     0
            c4t27d0  ONLINE       0     0     0
            c4t28d0  ONLINE       0     0     0
          raidz2-9   ONLINE       0     0     0
            c4t29d0  ONLINE       0     0     0
            c4t30d0  ONLINE       0     0     0
            c4t31d0  ONLINE       0     0     0
            c4t32d0  ONLINE       0     0     0
            c4t33d0  ONLINE       0     0     0
            c4t34d0  ONLINE       0     0     0
            c4t35d0  ONLINE       0     0     0
          raidz2-10  ONLINE       0     0     0
            c4t36d0  ONLINE       0     0     0
            c4t37d0  ONLINE       0     0     0
            c4t38d0  ONLINE       0     0     0
            c4t39d0  ONLINE       0     0     0
            c4t40d0  ONLINE       0     0     0
            c4t41d0  ONLINE       0     0     0
            c4t42d0  ONLINE       0     0     0
        cache
          c8t0d0     ONLINE       0     0     0
          c8t1d0     ONLINE       0     0     0
        spares
          c4t43d0    INUSE     currently in use
          c4t44d0    INUSE     currently in use

errors: No known data errors
r...@prv-backup:~#

Hot spares are dedicated spares in the ZFS world. Until you replace the actual bad drives, you will be running in a degraded state. The idea is that spares are only used in an emergency. You are degraded until your spares are no longer in use. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
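The way out of the INUSE state, roughly, with placeholder names for the pool and the failed disks (the posted output doesn't show them):

  # zpool replace tank c4t12d0     (new disk in the failed bay; the spare detaches itself once the resilver completes)
  # zpool detach tank c4t12d0      (alternatively, detach the failed disk and the spare is promoted permanently)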
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sun, Nov 28, 2010 at 5:18 PM, Krunal Desai mov...@gmail.com wrote: There are problems with Sandforce controllers, according to forum posts. Buggy firmware. And in practice, Sandforce is far below its theoretical values. I expect Intel to have fewer problems. I believe it's more the firmware (and pace of firmware updates) from companies making Sandforce-based drives than it is the controller. Enthusiasts can tolerate OCZ and others releasing alphas/betas in forum posts. While the G2 Intel drives may not be the performance kings anymore (or the most price-effective), I'd argue they're certainly the most stable when it comes to firmware. Have my eye on a G3 Intel drive for my laptop, where I can't really afford beta firmware updates biting me on the road. --khd Again, this is news to me. Do you have examples? There were plenty of revisions when they first dropped 6-8 months ago, but I haven't heard of anything similar in quite some time. As for Intel, they've had their share of issues as well. I assume you remember the data-loss inducing BIOS password bug? --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sun, Nov 28, 2010 at 1:41 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: There are problems with Sandforce controllers, according to forum posts. Buggy firmware. And in practice, Sandforce is far below its theoretical values. I expect Intel to have fewer problems. According to what forum posts? There were issues when Crucial and a few others released alpha firmware into production... Anandtech has put those drives through the wringer without issue. Several people on this list are running them as well. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sun, Nov 28, 2010 at 10:42 AM, David Magda dma...@ee.ryerson.ca wrote: On Nov 27, 2010, at 16:14, Tim Cook wrote: You don't need drivers for any SATA based SSD. It shows up as a standard hard drive and plugs into a standard SATA port. By the time the G3 Intel drive is out, the next gen Sandforce should be out as well. Unless Intel does something revolutionary, they will still be behind the Sandforce drives. Are you referring to the SF-2000 chips? http://www.sandforce.com/index.php?id=133 http://www.legitreviews.com/article/1429/1/ http://www.google.com/search?q=sandforce+sf-2000 Yup. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
On Sat, Nov 27, 2010 at 9:34 AM, Christopher George cgeo...@ddrdrive.com wrote: I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and I'd be interested if anyone else has. I recently presented at the OpenStorage Summit 2010 and compared exactly the three devices you mention in your post (Vertex 2 EX, Vertex 2 Pro, and the DDRdrive X1) as ZIL Accelerators. Jump to slide 37 for the write IOPS benchmarks: http://www.ddrdrive.com/zil_accelerator.pdf and you *really* want to make sure you get the 4k alignment right Excellent point; starting on slide 66, the performance impact of partition misalignment is illustrated. Considering the results, longevity might be an even greater concern than decreased IOPS performance, as ZIL acceleration is a worst-case scenario for a Flash-based SSD. The DDRdrive is still the way to go for the ultimate ZIL acceleration, but it's pricey as hell. In addition to product cost, I believe IOPS/$ is a relevant point of comparison. Google products gives the price range for the OCZ 50GB SSDs:

  Vertex 2 EX (OCZSSD2-2VTXEX50G: $870 - $1,011 USD)
  Vertex 2 Pro (OCZSSD2-2VTXP50G: $399 - $525 USD)

4KB Sustained and Aligned Mixed Write IOPS results (see pdf above):

  Vertex 2 EX (6325 IOPS)
  Vertex 2 Pro (3252 IOPS)
  DDRdrive X1 (38701 IOPS)

Using the lowest online price for both the Vertex 2 EX and Vertex 2 Pro, and the full list price (SRP) of the DDRdrive X1. IOPS/Dollar($):

  Vertex 2 EX (6325 IOPS / $870) = 7.27
  Vertex 2 Pro (3252 IOPS / $399) = 8.15
  DDRdrive X1 (38701 IOPS / $1,995) = 19.40

Best regards, Why would you disable TRIM on an SSD benchmark? I can't imagine anyone intentionally crippling their drive in the real world. Furthermore, I don't think 1 hour sustained is a very accurate benchmark. Most workloads are bursty in nature. If you're doing sustained high-IOPS workloads like that, the back-end is going to fall over and die long before the hour time-limit. Your 38k IOPS would need nearly 500 drives to sustain that workload with any kind of decent latency. If you've got 500 drives, you're going to want a hell of a lot more ZIL space than the DDRdrive currently provides. I'm all for benchmarks, but try doing something a bit more realistic. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
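For context, putting any of these devices to work as a ZIL accelerator is the same one-liner regardless of vendor; a sketch with hypothetical device names, mirrored since losing an unmirrored slog on older pool versions was painful:

  # zpool add tank log mirror c9t0d0 c9t1d0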
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sat, Nov 27, 2010 at 8:10 AM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: A noob question: These drives that people talk about, can you use them as a system disc too? Install Solaris 11 Express on them? Or can you only use them as an L2ARC or ZIL? -- They're a standard SATA hard drive. You can use them for whatever you'd like. For the price though, they aren't really worth the money to buy just to put your OS on. Your system drive on a Solaris system generally doesn't see enough I/O activity to require the kind of IOPS you can get out of most modern SSDs. If you were using the system as a workstation, it'd definitely help, as applications tend to feel more responsive with an SSD. That's all I run in my laptops now. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sat, Nov 27, 2010 at 2:16 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: Your system drive on a Solaris system generally doesn't see enough I/O activity to require the kind of IOPS you can get out of most modern SSDs. My system drive sees a lot of activity, to the degree everything is going slow. I have a SunRay that my girlfriend uses, and I have 5-10 torrents going on, and surf the web - often my system crawls. Very often my girlfriend gets irritated because everything lags and she frequently asks me if she can do some task, or if she should wait until I have finished copying my files. Unbearable. I have a quad-core Intel 9450 at 2.66GHz, and 8GB RAM. I am planning to use an SSD and really hope it will be faster.

$ iostat -xcnXCTdz 1
     cpu
 us sy wt id
 25  7  0 68
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0,0    0,0    0,0     0,0  0,0  0,0    0,0    0,0   0   0 c8
    0,0    0,0    0,0     0,0  0,0  0,0    0,0    0,0   0   0 c8t0d0
   37,0  442,1 4489,6 51326,1  7,5  2,0   15,7    4,1  98 100 c7d0

Desktop usage is a different beast as I alluded to. A dedicated server typically doesn't have any issues. I'd strongly suggest getting one of the Sandforce-controller-based SSDs. They're the best on the market right now by far. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
On Sat, Nov 27, 2010 at 2:24 PM, Christopher George cgeo...@ddrdrive.com wrote: Why would you disable TRIM on an SSD benchmark? Because ZFS does *not* support TRIM, so the benchmarks are configured to replicate actual ZIL Accelerator workloads. If you're doing sustained high-IOPS workloads like that, the back-end is going to fall over and die long before the hour time-limit. The reason the graphs are done in a timeline fashion is so you can look at any point in the 1 hour series to see how each device performs. Best regards, TRIM was putback in July... You're telling me it didn't make it into S11 Express? http://mail.opensolaris.org/pipermail/onnv-notify/2010-July/012674.html --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OCZ RevoDrive ZFS support
On Sat, Nov 27, 2010 at 3:12 PM, Orvar Korvar knatte_fnatte_tja...@yahoo.com wrote: I am waiting for the next gen Intel SSD drives, G3. They are arriving very soon. And from what I can infer by reading here, I can use it without issues. Solaris will recognize the Intel SDD drive without any drivers needed, or whatever? Intel new SSD should work with Solaris 11 Express, yes? You don't need drivers for any SATA based SSD. It shows up as a standard hard drive and plugs into a standard SATA port. By the time the G3 Intel drive is out, the next gen Sandforce should be out as well. Unless Intel does something revolutionary, they will still be behind the Sandforce drives. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?
On Sat, Nov 27, 2010 at 9:29 PM, Erik Trimble erik.trim...@oracle.com wrote: On 11/27/2010 6:50 PM, Christopher George wrote: Furthermore, I don't think 1 hour sustained is a very accurate benchmark. Most workloads are bursty in nature. The IOPS degradation is additive; the length of the first and second one-hour sustained periods is completely arbitrary. The takeaway from slides 1 and 2 is that drive inactivity has no effect on the eventual outcome. So with either a bursty or sustained workload the end result is always the same: dramatic write IOPS degradation after unpackaging or secure erase of the tested Flash-based SSDs. Best regards, Christopher George Founder/CTO www.ddrdrive.com Without commenting on other threads, I often see sustained IO in my setups for extended periods of time - particularly, small IO which eats up my IOPS. At this moment, I run with ZIL turned off for that pool, as it's a scratch pool and I don't care if it gets corrupted. I suspect that a DDRdrive or one of the STEC Zeus drives might help me, but I can overwhelm any other SSD quickly. I'm doing compiles of the JDK, with a single ZFS-backed system handling the files for 20-30 clients, each trying to compile a 15 million-line JDK at the same time. Lots and lots of small I/O. :-) Sounds like you need lots and lots of 15krpm drives instead of 7200rpm SATA ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
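On builds new enough to have the per-dataset sync property (it replaced the old global zil_disable tunable), the scratch-pool arrangement Erik describes is one setting, shown here with a hypothetical dataset:

  # zfs set sync=disabled tank/scratch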
Re: [zfs-discuss] zpool import is this safe to use -f option in this case ?
On Wed, Nov 17, 2010 at 2:56 PM, Jim Dunham james.dun...@oracle.com wrote: Tim, On Wed, Nov 17, 2010 at 10:12 AM, Jim Dunham james.dun...@oracle.com wrote: sridhar, I have done the following (which is required for my case) Created a zpool (smpool) on a device/LUN from an array (IBM 6K) on host1 created an array-level snapshot of the device using dscli to another device which is successful. Now I make the snapshot device visible to another host (host2) Even though the array is capable of taking device/LUN snapshots, this is a non-standard mode of operation regarding the use of ZFS. It raises concerns that if one had a problem using ZFS in this manner, there would be few Oracle or community users of ZFS that could assist. Even if the alleged problem was not related to using ZFS with array-based snapshots, usage would always create a level of uncertainty. I would suggest using ZFS send / recv instead. That's what we call FUD. It might be a problem if you use someone else's feature that we duplicate. If Oracle isn't going to support array-based snapshots, come right out and say it. You might as well pack up the cart now though, there isn't an enterprise array on the market that doesn't have snapshots, and you will be the ONLY OS I've ever heard of even suggesting that array-based snapshots aren't allowed. That's not what I said... Non-standard mode of operation is *not* the same thing as not supported. Using ZFS's standard mode of operation based on its built-in support for snapshots is well-proven, well-documented technology. How is using an array-based snapshot to create a copy of a filesystem non-standard? Non-standard to who? Array-based snapshots were around long before ZFS was created. It was proven and documented long before ZFS was around as well. Given your history in the industry, I know you aren't so new to this game you didn't already know that, so I'm not really sure what the purpose of proven and documented was, other than to try to insinuate that other technologies are not. would there be any issues? Prior to taking the next snapshot, one must be assured that the device/LUN on host2 is returned to the zpool exported state. Failure to do this could cause zpool corruption, ZFS I/O failures, or even the possibility of a system panic on host2. Really? And how did you come to that conclusion? As a prior developer and project lead of host-based snapshot and replication software on Solaris, I have first-hand experience using ZFS with snapshots. If, while ZFS on node2 is accessing an instance of snapshot data, the array updates the snapshot data, ZFS will see newly created CRCs created by node1. These CRCs will be considered metadata corruption, and depending on exactly what ZFS was doing at the time the corruption was detected, the software will attempt some form of error recovery. The array doesn't update the snapshot data. That's the whole point of the snapshot. It's point-in-time. Either the snapshot exists as it was taken, or it's deleted. What array on the market changes blocks in a snapshot that are being presented out as a live filesystem to a host? I've never heard of any such behavior, and that sort of behavior would be absolutely brain-dead. OP: Yes, you do need to use a -f. The zpool has a signature that is there when the pool is imported (this is to keep an admin from accidentally importing the pool to two different systems at the same time). The only way to clear it is to do a zpool export before taking the initial snapshot, or doing the -f on import. 
Jim here is doing a great job of spreading FUD, and none of it is true. What you're doing should absolutely work, just make sure there is no I/O in flight when you take the original snapshot. Either export the pool first (I would recommend this approach), shut the system down, or just make sure you aren't doing any writes when taking the array-based snapshot. These last two statements need clarification. ZFS is always on-disk consistent, even in the context of using snapshots. Therefore as far as ZFS is concerned, there is no need to assure that there are no I/Os in flight, or that the storage pool is exported, or that the system is shut down, or that one is doing any writes. Except when it isn't. Which is why zpool import -F was added to ZFS. In theory ZFS doesn't need a checkdisk and it didn't need an import -F because it's always consistent on disk. In reality, that's utterly false as well. Although ZFS is always on-disk consistent, many applications are not filesystem consistent. To be filesystem consistent, an application by design must issue careful writes and/or synchronized filesystem operations. Not knowing this fact, or lacking this functionality, a system admin will need to deploy some of the work-arounds suggested above. The most important one not listed is to stop or pause those applications which are known
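As a command sequence, the clean-export approach looks like this, using the thread's pool name; the dscli step stands in for whatever your array's snapshot mechanism is:

  host1# zpool export smpool
         (take the array-level snapshot of the LUN here)
  host1# zpool import smpool
  host2# zpool import smpool     (the snapshot copy; cleanly exported, so -f shouldn't be needed)

If the snapshot is instead taken while the pool stays imported on host1, host2 will need zpool import -f, and the imported state will be whatever the last committed txg happened to be.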
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On Wed, Nov 17, 2010 at 7:34 PM, Richard Elling richard.ell...@gmail.comwrote: On Nov 16, 2010, at 2:03 PM, Rthoreau r7h0...@att.net wrote: I just think that some people might need that little extra nudge that a few graphs and test would provide. If it happens to also come with a few good practices you could save a lot of people some time and heart ache as I am sure people are desirous to see the results. I think people are putting encryption in their apps directly (eg Oracle's Transparent Data Encryption feature) -- richard I know there are far more apps without support for encryption than with it. And given the ever more stringent government regulations in the US, there are plenty of customers chomping at the bit for encryption at the storage array. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
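For anyone who wants to poke at the feature this thread is named after: Solaris 11 Express exposes encryption per-dataset, and it can only be set at creation time. A minimal sketch, dataset name invented; by default it prompts for a passphrase:

  # zfs create -o encryption=on tank/secure
  # zfs get encryption tank/secure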