Re: [zfs-discuss] x4500 vs AVS ?
On 09.09.08 19:32, Richard Elling wrote: Ralf Ramge wrote: Richard Elling wrote: Yes, you're right. But sadly, in the mentioned scenario of having replaced an entire drive, the entire disk is rewritten by ZFS. No, this is not true. ZFS only resilvers data. Okay, I see we have a communication problem here. Probably my fault; I should have written "the entire data and metadata". I made the assumption that a 1 TB drive in an X4500 may have up to 1 TB of data on it, simply because nobody buys the 1 TB X4500 just to use 10% of the disk space; he would have bought the 250 GB, 500 GB or 750 GB model then. Actually, they do :-) Some storage vendors insist on it, to keep performance up -- short-stroking. I've done several large-scale surveys of this and the average usage is 50%. This is still a large difference in resilver times between ZFS and SVM. There is RFE 6722786, "resilver on mirror could reduce window of vulnerability", which aims to reduce this difference for mirrors. See here: http://bugs.opensolaris.org/view_bug.do?bug_id=6722786 Wbr, Victor
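To make the resilver point concrete, a minimal sketch (pool and device names are placeholders, not taken from the thread): after physically swapping a disk, ZFS copies only allocated blocks, which is why a lightly used drive resilvers far faster than an SVM-style full-disk resync.

    # hypothetical pool/device names; run after the failed disk is swapped
    zpool replace tank c5t2d0
    # progress is reported against live data, not raw capacity, so a
    # half-full 1 TB drive finishes in roughly half the time of a full one
    zpool status tank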
Re: [zfs-discuss] ZFS over multiple iSCSI targets
Tuomas Leikola wrote: On Mon, Sep 8, 2008 at 8:35 PM, Miles Nordin [EMAIL PROTECTED] wrote: PS: iSCSI with respect to write barriers? +1. Does anyone even know of a good way to actually test it? So far it seems the only way to know if your OS is breaking write barriers is to trade gossip and guess. Write a program that writes backwards (every other block, to avoid write merges) with and without O_DSYNC, and measure the speed. I think you can also deduce driver and drive cache flush correctness by calculating the best theoretical correct speed (which should be really slow, one write per disc spin). This has been on my TODO list for ages.. :( Does the perl script at http://brad.livejournal.com/2116715.html do what you want? -- James Andrewartha
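For anyone searching the archives, here is a crude shell variant of Tuomas's idea, sketched under the assumption of a scratch raw device you can destroy (the DEV path is hypothetical). Run it under time(1) and compare the resulting writes per second against one write per platter revolution (about 120/s on a 7200 rpm drive); a rate far above that means some layer is acknowledging writes out of a cache.

    #!/bin/ksh
    # DESTRUCTIVE sketch: single-block writes, walking backwards over
    # every other block to defeat write merging, straight to a raw device.
    DEV=/dev/rdsk/c9t9d0s0   # hypothetical scratch slice -- will be overwritten
    COUNT=500

    i=$COUNT
    while [ $i -gt 0 ]; do
        dd if=/dev/zero of=$DEV bs=512 oseek=$((i * 2)) count=1 2>/dev/null
        i=$((i - 1))
    done

Running the same loop against an iSCSI-exported device and comparing with the local result would show whether the initiator/target path preserves the sync semantics.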
Re: [zfs-discuss] The best motherboard for a home ZFS fileserver
I've been a fan of ZFS since I read about it last year. Now I'm on the way to building a home fileserver, and I'm thinking of going with OpenSolaris and eventually ZFS!! This seems to be a good candidate to build a home ZFS server: http://tinyurl.com/msi-so It's cheap, low power, fan-less; the only concern is the Realtek 8111C NIC. According to a Sun blogger, there is no Solaris driver: http://blogs.sun.com/roberth/entry/msi_wind_as_a_low (Thanks for the info)
Re: [zfs-discuss] The best motherboard for a home ZFS fileserver
On Wed, Sep 10, 2008 at 03:57:13AM -0700, W. Wayne Liauh wrote: This seems to be a good candidate to build a home ZFS server: http://tinyurl.com/msi-so It's cheap, low power, fan-less; the only concern is the Realtek 8111C NIC. According to a Sun Blogger, there is no Solaris driver: Looking at the pictures, there may not be a cpu fan but there's still a case fan. One could also argue that the case really isn't optimal for multiple disks. vh Mads Toftum -- http://soulfood.dk
[zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
Hello list, I hope someone can help me on this topic. I'd like to know where the *real* advantages of Nexenta/ZFS (i.e. ZFS/StorageTek) over DRBD/Heartbeat are. I'm pretty new to this topic and hence do not have enough experience to judge their respective advantages/disadvantages reasonably. Any suggestion would be appreciated. -- Best regards Axel Schmalowsky Platform Engineer ___ domainfactory GmbH Oskar-Messter-Str. 33 85737 Ismaning Germany Telefon: +49 (0)89 / 55266-356 Telefax: +49 (0)89 / 55266-222 E-Mail: [EMAIL PROTECTED] Internet: www.df.eu Registergericht: Amtsgericht München HRB 150294, Geschäftsführer Tobias Marburg, Jochen Tuchbreiter
Re: [zfs-discuss] The best motherboard for a home ZFS fileserver
On Wed, Sep 10, 2008 at 5:57 AM, W. Wayne Liauh [EMAIL PROTECTED] wrote: I've been a fan of ZFS since I read about it last year. Now I'm on the way to building a home fileserver, and I'm thinking of going with OpenSolaris and eventually ZFS!! This seems to be a good candidate to build a home ZFS server: http://tinyurl.com/msi-so It's cheap, low power, fan-less; the only concern is the Realtek 8111C NIC. According to a Sun blogger, there is no Solaris driver: http://blogs.sun.com/roberth/entry/msi_wind_as_a_low (Thanks for the info) From the other reviews I've read on the Atom 230 and 270, I don't think this box has enough CPU horsepower for a ZFS-based fileserver -- or maybe I have different performance expectations than the OP. To each his own. I would like to give the list a heads-up on a mini-ITX board that is already available based on the Atom 330 -- the dual-core version of the chip. Here you'll find a couple of pictures of the board: http://www.mp3car.com/vbulletin/general-hardware-discussion/123966-intel-d945gclf2-dual-core-atom.html NB: the 2 at the end of the part # is the Atom 330 based part; no 2 indicates the board with the single-core Atom. Also: the 330 has twice the cache of the single-core Atom. This board is already available for around $85. Bear in mind that the chipset used on this board dissipates around 45 Watts -- so don't just look at the power dissipation numbers for the CPU. I'm not specifically recommending this board for use as a ZFS-based fileserver -- but it might provide a solution for someone on this list. PS: Since the Atom supports hyperthreading, the Atom 330 will appear to Solaris as 4 CPUs. Regards, -- Al Hopper Logical Approach Inc, Plano, TX [EMAIL PROTECTED] Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
[zfs-discuss] Intel M-series SSD
Interesting flash technology overview and SSD review here: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403 and another review here: http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012.html Regards, -- Al Hopper Logical Approach Inc, Plano, TX [EMAIL PROTECTED] Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Re: [zfs-discuss] Sun samba - ZFS ACLs
On Sat, 6 Sep 2008, Sean McGrath wrote: The sfw project's bit has what's needed here, the libsunwrap.a src etc, http://www.opensolaris.org/os/project/sfwnv/ Thanks for the pointer, I was able to pull out the libsunwrap.a source code and use it to compile the bundled samba source from S10U5. A couple of functions in vfs_zfsacl.c returned NTSTATUS instead of BOOL, which caused a compilation error until I fixed it; I don't think the source code shipped with Solaris is the same source code actually used to make the binary packages :(. For the benefit of anyone with a similar problem that finds this thread via a search, it turns out the issue was actually with the nfs4acl module, which vfs_zfsacl.c uses. From README.nfs4acls.txt: mode = [simple|special] - simple: don't use OWNER@ and GROUP@ special IDs in ACEs (the default) - special: use OWNER@ and GROUP@ special IDs in ACEs instead of simple user/group ids. So the default for the NFS4 ACL mapper subsystem is to remove special ACEs and replace them with specific user/group ACEs. Kind of seems like a dumb default to me. If you add "nfs4: mode = special" to your smb.conf, things work as expected. Thanks again for the help... -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | [EMAIL PROTECTED] California State Polytechnic University | Pomona CA 91768
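For anyone assembling this from the archive, a minimal smb.conf fragment along these lines might look like the following. The share name and path are placeholders; zfsacl is the bundled VFS module that pulls in the nfs4acl mapper, and acedup is shown with one common setting -- check README.nfs4acls.txt for your build:

    [export]
        path = /tank/export
        vfs objects = zfsacl
        nfs4: mode = special
        nfs4: acedup = merge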
Re: [zfs-discuss] Intel M-series SSD
On Wed, 10 Sep 2008, Al Hopper wrote: Interesting flash technology overview and SSD review here: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403 and another review here: http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012.html These seem like regurgitations of the same marketing drivel that you notified us about before. These Intel products are assembled in China based on non-Intel FLASH components (from Micron). There is little reason to believe that Intel will corner the market due to having an aggressive marketing department. There are other companies in the business who may seem oddly silent compared with Intel/Micron, but enjoy a vastly larger share of the FLASH market. These reviews continue their apples/oranges comparison by comparing cheap lowest-grade desktop/laptop drives with the expensive Intel SSD drives. The hard drive performance specified is for low-grade consumer drives rather than enterprise drives. The hard drive reliability specified is for low-grade consumer drives rather than enterprise drives. The table at Tom's Hardware talks about 160GB SSD drives which are not even announced. The SLC storage sizes are still quite tiny. The wear leveling algorithm ensures that the drive starts losing its memory in all locations at about the same time. RAID does not really help much here for reliability, since RAID systems are usually comprised of the same devices installed at the same time and seeing identical write activity. RAID works due to failures being random. If the failures are not random (i.e. all drives start reporting read errors at once) then RAID does not really help. Hopefully the integration with the OS is sufficient that the user knows it is time to change out the drive before it is too late to salvage the data. Write performance to SSDs is not all it is cracked up to be. Buried in the AnandTech writeup, there is mention that while 4K can be written at once, 512KB needs to be erased at once. This means that write performance to an empty device will seem initially pretty good, but then it will start to suffer as 512KB regions need to be erased to make space for more writes. ZFS's COW scheme will initially be fast, but then the writes will slow after all blocks on the device have been written to before. Since writing to a used drive incurs additional latency, the device will need to buffer writes in RAM so that it returns to the user faster. This may increase the chance of data loss due to power failure. Bob -- Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
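For scale, the mismatch Bob describes is easy to work out from the numbers in the reviews (a back-of-envelope check, not a measured figure):

    # 512 KB erase unit vs. 4 KB program unit
    echo $(( (512 * 1024) / (4 * 1024) ))   # 128 pages per erase block

So in the worst case a single 4 KB rewrite can force a 512 KB region to be relocated and erased, up to a 128:1 penalty that the wear-leveling and write-batching logic has to hide.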
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
Well, obviously - it's a Linux vs. OpenSolaris question. The most serious advantage of OpenSolaris is ZFS and its enterprise-level storage stack. Linux is just not there yet.. On Wed, 2008-09-10 at 14:51 +0200, Axel Schmalowsky wrote: Hello list, I hope someone can help me on this topic. I'd like to know where the *real* advantages of Nexenta/ZFS (i.e. ZFS/StorageTek) over DRBD/Heartbeat are. I'm pretty new to this topic and hence do not have enough experience to judge their respective advantages/disadvantages reasonably. Any suggestion would be appreciated.
Re: [zfs-discuss] Intel M-series SSD
On Sep 10, 2008, at 11:40 AM, Bob Friesenhahn wrote: Write performance to SSDs is not all it is cracked up to be. Buried in the AnandTech writeup, there is mention that while 4K can be written at once, 512KB needs to be erased at once. This means that write performance to an empty device will seem initially pretty good, but then it will start to suffer as 512KB regions need to be erased to make space for more writes. That assumes that one doesn't code up the system to batch up erases prior to writes. ... returns to the user faster. This may increase the chance of data loss due to power failure. Presumably anyone deft enough to design such an enterprise-grade device will be able to provide enough super-capacitor (or equivalent) to ensure that DRAM is flushed to SSD before anything bad happens. Clever use of such devices in L2ARC and slog ZFS configurations (or moral equivalents in other environments) is pretty much the only affordable way (vs. huge numbers of spindles) to bridge the gap between rotating rust and massively parallel CPUs. One imagines that Intel will go back to fabbing their own at some point; that is closer to their usual business model than OEMing other people's parts ;-) -- Keith H. Bierman [EMAIL PROTECTED] | AIM kbiermank 5430 Nassau Circle East | Cherry Hills Village, CO 80113 | 303-997-2749 *speaking for myself* Copyright 2008
Re: [zfs-discuss] Intel M-series SSD
Bob Friesenhahn wrote: On Wed, 10 Sep 2008, Al Hopper wrote: Interesting flash technology overview and SSD review here: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403 and another review here: http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012.html These seem like regurgitations of the same marketing drivel that you notified us about before. These Intel products are assembled in China based on non-Intel FLASH components (from Micron). There is little reason to believe that Intel will corner the market due to having an aggressive marketing department. There are other companies in the business who may seem oddly silent compared with Intel/Micron, but enjoy a vastly larger share of the FLASH market. Intel and Micron have a joint venture for doing the flash SSDs. For some reason, Intel's usually excellent marketing team wasn't involved in naming the JV, so it is called IM Flash Technologies... boring :-) http://www.imftech.com/ Samsung is another major vendor, rumored to be trying to buy Sandisk, but it ain't over 'til it's over... might be a JV opportunity, too. http://www.ft.com/cms/s/2/eb1f748e-7f34-11dd-a3da-77b07658.html These reviews continue their apples/oranges comparison by comparing cheap lowest-grade desktop/laptop drives with the expensive Intel SSD drives. The hard drive performance specified is for low-grade consumer drives rather than enterprise drives. The hard drive reliability specified is for low-grade consumer drives rather than enterprise drives. The table at Tom's Hardware talks about 160GB SSD drives which are not even announced. The SLC storage sizes are still quite tiny. The wear leveling algorithm ensures that the drive starts losing its memory in all locations at about the same time. RAID does not really help much here for reliability, since RAID systems are usually comprised of the same devices installed at the same time and seeing identical write activity. RAID works due to failures being random. If the failures are not random (i.e. all drives start reporting read errors at once) then RAID does not really help. Hopefully the integration with the OS is sufficient that the user knows it is time to change out the drive before it is too late to salvage the data. I think the market segments are becoming more solidified. There is clearly a low-cost consumer market. But there is also a large, unsatisfied demand for enterprise-class SSDs. Intel has already announced an SLC-based extreme product line. Brian's blog seems to be one of the best distilled descriptions I've seen: http://www.edn.com/blog/40040/post/360032036.html -- richard Write performance to SSDs is not all it is cracked up to be. Buried in the AnandTech writeup, there is mention that while 4K can be written at once, 512KB needs to be erased at once. This means that write performance to an empty device will seem initially pretty good, but then it will start to suffer as 512KB regions need to be erased to make space for more writes. ZFS's COW scheme will initially be fast, but then the writes will slow after all blocks on the device have been written to before. Since writing to a used drive incurs additional latency, the device will need to buffer writes in RAM so that it returns to the user faster. This may increase the chance of data loss due to power failure.
Bob -- Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Intel M-series SSD
On Sep 10, 2008, at 12:37 PM, Bob Friesenhahn wrote: On Wed, 10 Sep 2008, Keith Bierman wrote: written at once, 512KB needs to be erased at once. This means that write performance to an empty device will seem initially pretty good, but then it will start to suffer as 512KB regions need to be erased to make space for more writes. That assumes that one doesn't code up the system to batch up erases prior to writes. Is the notion of block erase even exposed via SATA/SCSI protocols? Maybe it is for CD/DVD type devices. This is something that only the device itself would be aware of. Only the device knows if the block has been used before. A conspiracy between the device and a savvy host is sure to emerge ;-) ... That is reasonable. It adds to product cost and size though. Super-capacitors are not super-small. True, but for enterprise-class devices they are sufficiently small. Laptops will have a largish battery and won't need the caps ;-) Desktops will be on their own. -- Keith H. Bierman [EMAIL PROTECTED] | AIM kbiermank 5430 Nassau Circle East | Cherry Hills Village, CO 80113 | 303-997-2749 *speaking for myself* Copyright 2008
Re: [zfs-discuss] Intel M-series SSD
On Wed, 10 Sep 2008, Keith Bierman wrote: ... That is reasonable. It adds to product cost and size though. Super-capacitors are not super-small. True, but for enterprise-class devices they are sufficiently small. Laptops will have a largish battery and won't need the caps ;-) Desktops will be on their own. The Intel SSDs are still not advertised as enterprise-class devices. Bob -- Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. Well, AVS actually does reverse synchronization, and does it very well.
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. -- Maurice Volaski, [EMAIL PROTECTED] Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University
Re: [zfs-discuss] Intel M-series SSD
On Wed, 10 Sep 2008, Keith Bierman wrote: written at once, 512KB needs to be erased at once. This means that write performance to an empty device will seem initially pretty good, but then it will start to suffer as 512KB regions need to be erased to make space for more writes. That assumes that one doesn't code up the system to batch up erases prior to writes. Is the notion of block erase even exposed via SATA/SCSI protocols? Maybe it is for CD/DVD type devices. This is something that only the device itself would be aware of. Only the device knows if the block has been used before. Only the device knows the block of physical storage which will be used for the write. The device does not know what can be erased before it sees an (over)write request, and if the write request is for a smaller size, then existing data needs to be moved (for leveling) or buffered and written back to the same locations. This means that 512KB needs to be erased and re-written. Presumably anyone deft enough to design such an enterprise-grade device will be able to provide enough super-capacitor (or equivalent) to ensure that DRAM is flushed to SSD before anything bad happens. That is reasonable. It adds to product cost and size though. Super-capacitors are not super-small. Bob -- Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Your secondary did the job and switched back to its secondary role. This HA fail-back cycle can be repeated as many times as you need using the reverse sync command.
Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror
Haiou Fu (Kevin) wrote: I wonder if there are any equivalent commands in zfs to dump all its associated snapshots at maximum efficiency (only the changed data blocks among all snapshots)? I know you can just zfs send all snapshots, but each one is like a full dump, and if you use zfs send -i it is hard to maintain the relationship of the snapshots. In a NetApp filer, I can do snapmirror store /vol/vol01 ... and then everything in /vol/vol01 and all of its snapshots will be mirrored to the destination, and it is block level, which means only the changes in snapshots are sent out. So I wonder if there is an equivalent for ZFS to do similar things. For example: given one zfs, diskpool/myzfs, it has N snapshots: diskpool/myzfs@snap1, diskpool/myzfs@snap2, ... diskpool/myzfs@snapN, and you just do some zfs magic subcommand diskpool/myzfs ... and it will dump myzfs and all its N snapshots out at maximum efficiency (only changed data in snapshots)? Any hints/helps are appreciated! zfs send -I -- richard
Re: [zfs-discuss] x4500 vs AVS ?
Just to clarify a few items... consider a setup where we desire to use AVS to replicate the ZFS pool on a 4-drive server to like hardware. The 4 drives are set up as RAIDZ. If we lose a drive (say #2) in the primary server, RAIDZ will take over, and our data will still be available, but the array is in a degraded state. But what happens to the secondary server? Specifically to its bit-for-bit copy of drive #2... presumably it is still good, but ZFS will offline that disk on the primary server, replicate the metadata, and when/if I promote the secondary server, it will also be running in a degraded state (ie: 3 out of 4 drives). Correct? In this scenario, my replication hasn't really bought me any increased availability... or am I missing something? Also, if I do choose to fail over to the secondary, can I just do a scrub of the broken drive (which isn't really broken, but the zpool would be inconsistent at some level with the other online drives) and get back to full speed quickly? Or will I always have to wait until one of the servers resilvers itself (from scratch?), and re-replicates itself?? Thanks in advance. -Matt
Re: [zfs-discuss] SAS or SATA HBA with write cache
I'm guessing one of the reasons you wanted a non-RAID controller with a write cache was so that if the controller failed, and the exact same model wasn't available to replace it, most of your pool would still be readable with any random controller, modulo risk of corruption from the lost write cache. So... with the slog, you don't have that, because there are magic irreplaceable bits stored on the slog without which your whole pool is useless. Actually I just wanted to get the benefit of increased write cache without paying for the RAID controller... All the best practice guides say that using your RAID controller is generally redundant, and you should use RAIDZ (or a variant) in most implementations (leaving room for scenarios where hardware mirroring of some of the drives may be better, etc). Telling the RAID controller to export each drive as a single LUN works with most of the RAID controllers out there... but in addition to being painful to configure (on most of the RAID cards), you're paying for RAID hardware logic that goes unused. Also, all the RAID cards (that I've seen) write some sort of magic secret on the drive (even in 1:1 config) that messes with you when you need to replace/move the drives down the road. So how 'bout it, hardware vendors? When can we get a PCIe(x8) SAS/SATA controller with an x4 internal port and an x4 external port and 512MB battery-backed cache for about $250?? :) Heck, I'd take SATA only if I could get it at a decent price point... While we're at it, I'd also be happy with a PCIe(x4) card with 2 or 4 DIMM slots and a battery back-up that exposes itself as a system drive (a la iRAM, but PCIe, not SATA 150) for slog and read cache... say a $150 price point? heehee... there is an SSD-based option out there, but it has 80GB available, and starts at $2500 (overkill for my requirement) -Matt
Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror
Can you explain more about zfs send -I? I know zfs send -i, but didn't know there is a -I option. In which release is this option available? Thanks!
Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror
The closest thing I can find is: http://bugs.opensolaris.org/view_bug.do?bug_id=6421958 But just like it says: Incremental + recursive will be a bit trickier, because how do you specify the multiple source and dest snaps? Let me clarify this more. Without send -r I need to do something like this: given a zfs file system myzfs in zpool mypool, it has N snapshots: mypool/myzfs, mypool/myzfs@snap1, mypool/myzfs@snap2, ... mypool/myzfs@snapN. Do the following: zfs snapshot mypool/myzfs@current zfs send mypool/myzfs@current | gzip > /somewhere/myzfs-current.gz zfs send -i mypool/myzfs@snap1 mypool/myzfs@current | gzip > /somewhere/myzfs-1.gz zfs send -i mypool/myzfs@snap2 mypool/myzfs@current | gzip > /somewhere/myzfs-2.gz .. zfs send -i mypool/myzfs@snapN mypool/myzfs@current | gzip > /somewhere/myzfs-N.gz As you can see, the above commands are kind of a stupid solution, and they don't reach maximum efficiency because those myzfs-1 ~ N.gz files contain a lot of common stuff in them! I wonder how send -r would do in the above situation? How does it choose multiple source and dest snaps? And is -r efficient enough to just dump the incremental changes? What is the corresponding receive command for send -r? (receive -r? I guess?) Thanks!
Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror
Haiou Fu (Kevin) wrote: The closest thing I can find is: http://bugs.opensolaris.org/view_bug.do?bug_id=6421958 Look at the man page section on zfs(1m) for the -R and -I option explanations: http://docs.sun.com/app/docs/doc/819-2240/zfs-1m?a=view But just like it says: Incremental + recursive will be a bit trickier, because how do you specify the multiple source and dest snaps? Let me clarify this more. Without send -r I need to do something like this: given a zfs file system myzfs in zpool mypool, it has N snapshots: mypool/myzfs, mypool/myzfs@snap1, mypool/myzfs@snap2, ... mypool/myzfs@snapN. Do the following: zfs snapshot mypool/myzfs@current zfs send mypool/myzfs@current | gzip > /somewhere/myzfs-current.gz zfs send -i mypool/myzfs@snap1 mypool/myzfs@current | gzip > /somewhere/myzfs-1.gz zfs send -i mypool/myzfs@snap2 mypool/myzfs@current | gzip > /somewhere/myzfs-2.gz .. zfs send -i mypool/myzfs@snapN mypool/myzfs@current | gzip > /somewhere/myzfs-N.gz As you can see, the above commands are kind of a stupid solution, and they don't reach maximum efficiency because those myzfs-1 ~ N.gz files contain a lot of common stuff in them! No, in this example, each file will contain only the incremental changes. I wonder how send -r would do in the above situation? How does it choose multiple source and dest snaps? And is -r efficient enough to just dump the incremental changes? What is the corresponding receive command for send -r? (receive -r? I guess?) No, receive handles what was sent. -- richard
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Not necessarily. Even when the primary is ready to go back into service, I may not want to revert to it for one reason or another. That means I am without a live mirror, because AVS' realtime mirroring is only one direction, primary to secondary. -- Maurice Volaski, [EMAIL PROTECTED] Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University
Re: [zfs-discuss] SAS or SATA HBA with write cache
On Wed, Sep 10, 2008 at 16:56, Matt Beebe [EMAIL PROTECTED] wrote: So how 'bout it hardware vendors? when can we get a PCIe(x8) SAS/SATA controller with an x4 internal port and an x4 external port and 512MB battery backed cache for about $250?? :) Heck, I'd take SATA only if I could get it at a decent price point... The Supermicro AOC-USAS-S4iR fits the bill, nearly. It's got 4 internal and 4 external ports, on PCI express x8, with 256 MB of cache, for about $320[1]. Adding battery backup is about another $150[2]. While we're at it, I'd also be happy with a PCIe(x4) card with 2 or 4 DIMM slots and a battery back-up that exposes itself as a system drive (ala iRAM, but PCIe not SATA 150) for slog and read cache... say $150 price point? heehee... there is an SSD based option out there, but it has 80GB available, and starts at $2500 (overkill for my requirement) Not terribly likely to see this soon, I'm afraid. Memory interface technology keeps changing once every couple years, and that makes such a device less attractive to market. Consider that DDR(-1) RAM is almost three times as expensive as DDR-2 RAM ($184 versus $67 for a stick of 2GB ECC... and SDRAM? Survey says $500 easy) and having the latest generation seems to make sense. But that means (as a manufacturer) your device goes obsolete quicker, you sell fewer units, and make less return on your investment. So unless a common memory bus is developed, such a device would be a bad investment. Actually, what I'd rather have than battery backup is a large enough flash device to store the contents of RAM, and a battery big enough to get everything dumped to persistent storage. That takes out the question of running out of batteries prematurely, and leaves only the question of batteries losing capacity over time and needing to replace them. Will [1]: http://www.wiredzone.com/itemdesc.asp?ic=32005545 [2]: http://www.wiredzone.com/itemdesc.asp?ic=10017972
Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror
correction below... Richard Elling wrote: Haiou Fu (Kevin) wrote: The closest thing I can find is: http://bugs.opensolaris.org/view_bug.do?bug_id=6421958 Look at the man page section on zfs(1m) for the -R and -I option explanations: http://docs.sun.com/app/docs/doc/819-2240/zfs-1m?a=view But just like it says: Incremental + recursive will be a bit trickier, because how do you specify the multiple source and dest snaps? Let me clarify this more. Without send -r I need to do something like this: given a zfs file system myzfs in zpool mypool, it has N snapshots: mypool/myzfs, mypool/myzfs@snap1, mypool/myzfs@snap2, ... mypool/myzfs@snapN. Do the following: zfs snapshot mypool/myzfs@current zfs send mypool/myzfs@current | gzip > /somewhere/myzfs-current.gz zfs send -i mypool/myzfs@snap1 mypool/myzfs@current | gzip > /somewhere/myzfs-1.gz zfs send -i mypool/myzfs@snap2 mypool/myzfs@current | gzip > /somewhere/myzfs-2.gz .. zfs send -i mypool/myzfs@snapN mypool/myzfs@current | gzip > /somewhere/myzfs-N.gz As you can see, the above commands are kind of a stupid solution, and they don't reach maximum efficiency because those myzfs-1 ~ N.gz files contain a lot of common stuff in them! No, in this example, each file will contain only the incremental changes. I read this wrong, because I was looking at the end snapshot, not the start. What you wrote won't work. If you do something like: zfs send -i myzfs@snap1 myzfs@snap2 ... zfs send -i myzfs@snap2 myzfs@snap3 ... then there would be no overlap. But I think you will find that zfs send -I is altogether more convenient. -- richard I wonder how send -r would do in the above situation? How does it choose multiple source and dest snaps? And is -r efficient enough to just dump the incremental changes? What is the corresponding receive command for send -r? (receive -r? I guess?) No, receive handles what was sent. -- richard
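For the archive, the -I form collapses all of that into two streams (pool, filesystem and snapshot names as in the example above; otherpool is a placeholder for wherever the backup lands):

    # one full stream of the oldest snapshot
    zfs send mypool/myzfs@snap1 | gzip > /somewhere/myzfs-full.gz

    # one stream carrying snap2 through snapN, every intermediate
    # snapshot included, with no blocks duplicated between streams
    zfs send -I mypool/myzfs@snap1 mypool/myzfs@snapN | gzip > /somewhere/myzfs-incr.gz

    # restore on the other side, in the same order
    gzcat /somewhere/myzfs-full.gz | zfs receive otherpool/myzfs
    gzcat /somewhere/myzfs-incr.gz | zfs receive otherpool/myzfs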
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Not necessarily. Even when the primary is ready to go back into service, I may not want to revert to it for one reason or another. That means I am without a live mirror, because AVS' realtime mirroring is only one direction, primary to secondary. This is why I tried to state that this is not a realistic environment for non-shared-storage HA deployments. DRBD tries to emulate shared-storage behavior at the wrong level, where in fact usage of FC/iSCSI-connected storage needs to be considered.
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Not necessarily. Even when the primary is ready to go back into service, I may not want to revert to it for one reason or another. That means I am without a live mirror, because AVS' realtime mirroring is only one direction, primary to secondary. This is why I tried to state that this is not a realistic environment for non-shared-storage HA deployments. What's not realistic? DRBD's highly flexible ability to switch roles on the fly is a huge advantage over AVS. But this is not to say AVS is not realistic. It's just a limitation. DRBD tries to emulate shared-storage behavior at the wrong level, where in fact usage of FC/iSCSI-connected storage needs to be considered. This makes no sense to me. We're talking about mirroring the storage of two physical and independent systems. How did the concept of shared storage get in here? -- Maurice Volaski, [EMAIL PROTECTED] Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 19:10 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Not necessarily. Even when the primary is ready to go back into service, I may not want to revert to it for one reason or another. That means I am without a live mirror, because AVS' realtime mirroring is only one direction, primary to secondary. This is why I tried to state that this is not a realistic environment for non-shared-storage HA deployments. What's not realistic? DRBD's highly flexible ability to switch roles on the fly is a huge advantage over AVS. But this is not to say AVS is not realistic. It's just a limitation. DRBD tries to emulate shared-storage behavior at the wrong level, where in fact usage of FC/iSCSI-connected storage needs to be considered. This makes no sense to me. We're talking about mirroring the storage of two physical and independent systems. How did the concept of shared storage get in here? This is really outside of the ZFS discussion now... But your point is taken. If you want mirror-like behavior from your 2-node cluster, you'll get some benefits from DRBD, but my point is that such a solution tries to solve two problems at the same time: replication and availability, which is in my opinion plain wrong.
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 19:10 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Not necessarily. Even when the primary is ready to go back into service, I may not want to revert to it for one reason or another. That means I am without a live mirror, because AVS' realtime mirroring is only one direction, primary to secondary. This is why I tried to state that this is not a realistic environment for non-shared-storage HA deployments. What's not realistic? DRBD's highly flexible ability to switch roles on the fly is a huge advantage over AVS. But this is not to say AVS is not realistic. It's just a limitation. DRBD tries to emulate shared-storage behavior at the wrong level, where in fact usage of FC/iSCSI-connected storage needs to be considered. This makes no sense to me. We're talking about mirroring the storage of two physical and independent systems. How did the concept of shared storage get in here? This is really outside of the ZFS discussion now... But your point is taken. If you want mirror-like behavior from your 2-node cluster, you'll get some benefits from DRBD, but my point is that such a solution tries to solve two problems at the same time: replication and availability, which is in my opinion plain wrong. Uh, no, DRBD addresses only replication. Linux-HA (aka Heartbeat) addresses availability. They can be an integrated solution, and are to some degree intended that way, so I have no idea where your opinion is coming from. For replication, OpenSolaris is largely limited to using AVS, whose functionality is limited, at least relative to DRBD. But there seem to be a few options to implement availability, which should include Linux-HA itself, as it should run on OpenSolaris! But relevant to the poster's initial question, ZFS is so far and away more advanced than any Linux filesystem can even dream about that it handily nullifies any disadvantage in having to run AVS. -- Maurice Volaski, [EMAIL PROTECTED] Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University
[zfs-discuss] ZPOOL Import Problem
I ran into an odd problem importing a zpool while testing AVS. I was trying to simulate a drive failure, break SNDR replication, and then import the pool on the secondary. To simulate the drive failure I just offlined one of the disks in the RAIDZ set.

pr1# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3t0d0s0  ONLINE       0     0     0
errors: No known data errors

  pool: tank
 state: ONLINE
 scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c5t1d0s0  ONLINE       0     0     0
            c5t2d0s0  ONLINE       0     0     0
            c5t3d0s0  ONLINE       0     0     0
errors: No known data errors

pr1# zpool offline
missing pool name
usage:
        offline [-t] <pool> <device> ...

pr1# zpool offline tank c5t0d0s0
pr1# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3t0d0s0  ONLINE       0     0     0
errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Online the device using 'zpool online' or replace the device with 'zpool replace'.
 scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz1      DEGRADED     0     0     0
            c5t0d0s0  OFFLINE      0     0     0
            c5t1d0s0  ONLINE       0     0     0
            c5t2d0s0  ONLINE       0     0     0
            c5t3d0s0  ONLINE       0     0     0
errors: No known data errors

pr1# zpool export tank

I then disabled SNDR replication:

pr1# sndradm -g zfs-tank -d
Disable Remote Mirror? (Y/N) [N]: Y

Then I try to import the zpool on the secondary:

pr2# zpool import
  pool: tank
    id: 9795707198744908806
 state: DEGRADED
status: One or more devices are offlined.
action: The pool can be imported despite missing or damaged devices. The fault tolerance of the pool may be compromised if imported.
config:
        tank          DEGRADED
          raidz1      DEGRADED
            c5t0d0s0  OFFLINE
            c5t1d0s0  ONLINE
            c5t2d0s0  ONLINE
            c5t3d0s0  ONLINE

pr2# zpool import tank
cannot import 'tank': one or more devices is currently unavailable
pr2# zpool import -f tank
cannot import 'tank': one or more devices is currently unavailable

Importing on the primary gives the same error. Anyone have any ideas? Thanks Corey
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
On Wed, 2008-09-10 at 19:42 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 19:10 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote: On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote: A disadvantage, however, is that Sun StorageTek Availability Suite (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than DRBD. For example, AVS is intended to replicate in one direction, from a primary to a secondary, whereas DRBD can switch on the fly. See http://www.opensolaris.org/jive/thread.jspa?threadID=68881&tstart=30 for details on this. I would be curious to see production environments switching direction on the fly at that low level... Usually some top-level brain does that in the context of HA fail-over and so on. By switching on the fly, I mean if the primary services are taken down and then brought up on the secondary, the direction of synchronization gets reversed. That's not possible with AVS because... Well, AVS actually does reverse synchronization, and does it very well. It's a one-time operation that re-reverses once it completes. When the primary is repaired, you want to have it on-line and retain the changes made on the secondary. Not necessarily. Even when the primary is ready to go back into service, I may not want to revert to it for one reason or another. That means I am without a live mirror, because AVS' realtime mirroring is only one direction, primary to secondary. This is why I tried to state that this is not a realistic environment for non-shared-storage HA deployments. What's not realistic? DRBD's highly flexible ability to switch roles on the fly is a huge advantage over AVS. But this is not to say AVS is not realistic. It's just a limitation. DRBD tries to emulate shared-storage behavior at the wrong level, where in fact usage of FC/iSCSI-connected storage needs to be considered. This makes no sense to me. We're talking about mirroring the storage of two physical and independent systems. How did the concept of shared storage get in here? This is really outside of the ZFS discussion now... But your point is taken. If you want mirror-like behavior from your 2-node cluster, you'll get some benefits from DRBD, but my point is that such a solution tries to solve two problems at the same time: replication and availability, which is in my opinion plain wrong. Uh, no, DRBD addresses only replication. Linux-HA (aka Heartbeat) addresses availability. They can be an integrated solution, and are to some degree intended that way, so I have no idea where your opinion is coming from. Because in my opinion DRBD takes on some responsibility of the management layer, if you will. The classic, predominant replication schema in HA clusters is primary-backup (or master-slave), and a backup is by definition not necessarily a primary-identical system. Having said that, it is noble for DRBD to implement role switching, and not a bad idea for many small deployments. For replication, OpenSolaris is largely limited to using AVS, whose functionality is limited, at least relative to DRBD. But there seem to be a few options to implement availability, which should include Linux-HA itself, as it should run on OpenSolaris! Everything is implementable, and I believe the AVS designers thought about dynamic switching of roles, but they ended up with what we have today; they likely discarded this idea. AVS does not switch roles, and forces IT admins to use it as a primary-backup data protection service only.
But relevant to the poster's initial question, ZFS is so far and away more advanced than any Linux filesystem can even dream about that it handily nullifies any disadvantage in having to run AVS. Right.
[zfs-discuss] Apache module for ZFS ACL based authorization
We are currently working on a Solaris/ZFS-based central file system to replace the DCE/DFS-based implementation we have had in place for over 10 years. One of the features of our previous implementation was that access to files, regardless of method (CIFS, AFP, HTTP, FTP, etc), was completely controlled by the DFS ACL. Our ZFS implementation will be available via NFSv4 and CIFS, both of which respect the ACL. To provide ZFS ACL-based authorization to files via HTTP, I put together a small Apache module. The module allows files either to be delivered without authentication required (if they are world-readable) or requires authentication and restricts file delivery to users with access based on the ACL. If anyone is interested in taking a look at it, it is available from: http://www.csupomona.edu/~henson/www/projects/mod_authz_fsacl/dist/mod_authz_fsacl-0.10.tar.gz I'd appreciate any feedback, particularly about things that don't work right :). -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | [EMAIL PROTECTED] California State Polytechnic University | Pomona CA 91768
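Not having tried the module, a rough sketch of what loading it would presumably look like; the module identifier below is only an assumption based on the usual Apache mod_* naming convention, and any module-specific directives should be taken from the tarball's own docs:

    # hypothetical identifier and path -- check the module's README
    LoadModule authz_fsacl_module libexec/mod_authz_fsacl.so

    <Directory "/export/www">
        # per the description above: world-readable files are served
        # without authentication; anything else triggers authentication
        # and a check of the requesting user against the ZFS ACL
    </Directory>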
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
Erast Benson wrote: Uh, no, DRBD addresses only replication. Linux-HA (aka Heartbeat) addresses availability. They can be an integrated solution, and are to some degree intended that way, so I have no idea where your opinion is coming from. Because in my opinion DRBD takes on some responsibility of the management layer, if you will. The classic, predominant replication schema in HA clusters is primary-backup (or master-slave), and a backup is by definition not necessarily a primary-identical system. Having said that, it is noble for DRBD to implement role switching, and not a bad idea for many small deployments. The problem with fully automated systems for remote replication is that they are fully automated. This opens you up to a set of failure modes that you may want to avoid, such as replication of data that you don't want to replicate. This is why most replication is used to support disaster recovery cases, and the procedures wrapped around disaster recovery also consider the case where the primary data has been damaged -- and you really don't want that damage to spread. It so happens that snapshots are another method which can be used to limit the spread of damage, so there might be an opportunity here. By analogy, in the Oracle world, RAC does not replace DataGuard. For replication, OpenSolaris is largely limited to using AVS, whose functionality is limited, at least relative to DRBD. But there seem to be a few options to implement availability, which should include Linux-HA itself, as it should run on OpenSolaris! I disagree, there are many ways to remotely replicate Solaris systems. TrueCopy and SRDF are perhaps the most popular, but almost all storage arrays have some sort of system. In truth, there is little market demand for fully automated solutions at the OS level, because of the reasons mentioned above. Everything is implementable, and I believe the AVS designers thought about dynamic switching of roles, but they ended up with what we have today; they likely discarded this idea. It is open source... -- richard
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
Let me drag this thread kicking and screaming back to ZFS... Use case: - We need an NFS server that can be replicated to another building to handle both scheduled powerdowns and unplanned outages. For scheduled powerdowns we'd want to fail over a week in advance, and fail back some time later. - We will use a virtual IP for seamless failover, and require that we not get stale NFS filehandles during a failover event, as world reboots are messy and expensive. - We do _not_ require synchronous replication. Async is fine. Today, there is _no_ reasonably priced solution that allows us to use ZFS for this. Yes, we have SRDF, and use it where required, but it's hideously expensive, and has its own risks (see below). Today our solution for the above is NetApp filers. We're not thrilled with everything about them, but they mostly work. One of the projects I've done is set up automated ZFS replication for pure DR. But sadly this is of limited use, as Solaris/ZFS's lack of ability to maintain NFS file handles across systems is a deal killer for most of our users. AVS does not appear to support our weeks-long failover model, and exposes us to too much risk of simultaneous data loss. SRDF and Veritas replication have the same data loss risk. SRDF can easily support a weeks-long personality swap; I don't know enough about Veritas' product. -- Carson
Re: [zfs-discuss] x4500 vs AVS ?
Matt Beebe wrote: But what happens to the secondary server? Specifically to its bit-for-bit copy of drive #2... presumably it is still good, but ZFS will offline that disk on the primary server, replicate the metadata, and when/if I promote the secondary server, it will also be running in a degraded state (ie: 3 out of 4 drives). Correct? Correct. In this scenario, my replication hasn't really bought me any increased availability... or am I missing something? No. You have an increase in availability when the entire primary node goes down, but you're not particularly safer when it comes to degraded zpools. Also, if I do choose to fail over to the secondary, can I just do a scrub of the broken drive (which isn't really broken, but the zpool would be inconsistent at some level with the other online drives) and get back to full speed quickly? Or will I always have to wait until one of the servers resilvers itself (from scratch?), and re-replicates itself?? I have not tested this scenario, so I can't say anything about this. -- Ralf Ramge Senior Solaris Administrator, SCNA, SCSA Tel. +49-721-91374-3963 [EMAIL PROTECTED] - http://web.de/ 1&1 Internet AG Brauerstraße 48 76135 Karlsruhe Amtsgericht Montabaur HRB 6484 Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, Achim Weiss Aufsichtsratsvorsitzender: Michael Scheeren
Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD
The problem with fully automated systems for remote replication is that they are fully automated. This opens you up to a set of failure modes that you may want to avoid, such as replication of data that you don't want to replicate. This is why most replication is used to support disaster recovery cases, and the procedures wrapped around disaster recovery also consider the case where the primary data has been damaged -- and you really don't want that damage to spread. I seem to have given the wrong impression of how automatic DRBD is, because it offers a rich set of policies and strategies to deal with faults. For one thing, it (along with Heartbeat) will bend over backwards to avoid disasters such as split-brain. It's more likely that one would get split-brain by manually executing the wrong command than by DRBD's (or Heartbeat's) doing. I disagree, there are many ways to remotely replicate Solaris systems. TrueCopy and SRDF are perhaps the most popular, but almost all I was referring to cost-free options. -- Maurice Volaski, [EMAIL PROTECTED] Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University