Re: [zfs-discuss] cluster vs nfs
Instead we've switched to Linux and DRBD. And if that doesn't get me sympathy I don't know what will.

SvSAN does something similar and it does it rather well, I think. http://www.stormagic.com/SvSAN.php

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cluster vs nfs
On Thu, Apr 26, 2012 at 12:10 AM, Richard Elling richard.ell...@gmail.com wrote:
> On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:
> Reboot requirement is a lame client implementation.

And lame protocol design. You could possibly migrate read-write NFSv3 on the fly by preserving FHs and somehow updating the clients to go to the new server (with a hiccup in between, no doubt), but only entire shares at a time -- you could not migrate only part of a volume with NFSv3. Of course, having migration support in the protocol does not equate to getting it in the implementation, but it's certainly a good step in that direction.

> You are correct, a ZFS send/receive will result in different file
> handles on the receiver, just like rsync, tar, ufsdump+ufsrestore, etc.

That's understandable for NFSv2 and v3, but for v4 there's no reason that an NFSv4 server stack and ZFS could not arrange to preserve FHs (if, perhaps, at the price of making the v4 FHs rather large). Although even for v3 it should be possible for servers in a cluster to arrange to preserve devids...

Bottom line: live migration needs to be built right into the protocol.

For me one of the exciting things about Lustre was/is the idea that you could just have a single volume where all new data (and metadata) is distributed evenly as you go. Need more storage? Plug it in, either to an existing head or via a new head, then flip a switch and there it is. No need to manage allocation. Migration may still be needed, both within a cluster and between clusters, but that's much more manageable when you have a protocol where data locations can be all over the place in a completely transparent manner.

Nico
--
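To see why zfs send/receive breaks NFSv3 clients, it helps to recall what a v3 file handle is made of. The sketch below is illustrative only -- real servers use their own opaque handle layouts -- but NFSv3 handles are typically derived from server-side identifiers such as the filesystem id, the file id (inode number), and the inode generation count, and zfs receive allocates fresh inodes with fresh generation counts:

```python
import struct

def nfs3_file_handle(fsid: int, fileid: int, generation: int) -> bytes:
    """Illustrative only: pack the server-side identifiers an NFSv3
    file handle is typically derived from into an opaque byte string."""
    return struct.pack(">QQI", fsid, fileid, generation)

# On the sending server, some file might map to (values are made up):
fh_sender = nfs3_file_handle(fsid=0x1234, fileid=812, generation=7)

# After zfs send | zfs receive, the same path is backed by a freshly
# allocated inode, so fileid and generation differ -> different handle,
# and a client holding the old handle gets ESTALE:
fh_receiver = nfs3_file_handle(fsid=0x1234, fileid=4, generation=1)

print(fh_sender != fh_receiver)  # → True
```

Preserving handles across replication would mean preserving (or remapping) all three identifiers on the receiver, which is exactly what send/receive does not promise.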
Re: [zfs-discuss] cluster vs nfs
I'll jump into this loop with a different alternative -- an IP-based block device. I have seen a few successful cases with HAST + UCARP + ZFS + FreeBSD. If zfsonlinux is robust enough, trying DRBD + Pacemaker + ZFS + Linux is definitely encouraged.

Thanks.
Fred

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
Sent: Thursday, April 26, 2012 14:00
To: Richard Elling
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] cluster vs nfs

[...]
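For the DRBD + Pacemaker + ZFS + Linux stack Fred suggests, the resource wiring would look roughly like the crm-shell sketch below. This is a sketch only: the resource names, DRBD resource `r0`, pool name `tank`, and VIP address are assumptions, and it presumes a ZFS resource agent (such as the `ocf:heartbeat:ZFS` agent shipped in some resource-agents builds) is available:

```
# Hypothetical Pacemaker (crm shell) configuration sketch.
# DRBD master/slave pair replicating the block device under the pool:
primitive p_drbd ocf:linbit:drbd params drbd_resource=r0 \
    op monitor interval=30s role=Slave \
    op monitor interval=15s role=Master
ms ms_drbd p_drbd meta master-max=1 clone-max=2 notify=true
# Import the pool and bring up the service VIP on the DRBD master:
primitive p_zpool ocf:heartbeat:ZFS params pool=tank
primitive p_vip ocf:heartbeat:IPaddr2 params ip=192.168.1.100
group g_nfs p_zpool p_vip
colocation col_nfs inf: g_nfs ms_drbd:Master
order ord_nfs inf: ms_drbd:promote g_nfs:start
```

The key ordering constraint is that the pool import must wait for DRBD promotion; importing a pool on the secondary while the primary still has it imported is the classic way to corrupt it.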
Re: [zfs-discuss] cluster vs nfs
On 2012-04-26 2:20, Ian Collins wrote:
> On 04/26/12 09:54 AM, Bob Friesenhahn wrote:
>> On Wed, 25 Apr 2012, Rich Teer wrote:
>>> Perhaps I'm being overly simplistic, but in this scenario, what would
>>> prevent one from having, on a single file server,
>>> /exports/nodes/node[0-15], and then having each node NFS-mount
>>> /exports/nodes from the server? Much simpler than your example, and
>>> all data is available on all machines/nodes.
>> This solution would limit bandwidth to that available from that single
>> server. With the cluster approach, the objective is for each machine
>> in the cluster to primarily access files which are stored locally.
>> Whole files could be moved as necessary.
> Distributed software building faces similar issues, but I've found once
> the common files have been read (and cached) by each node, network
> traffic becomes one way (to the file server). I guess that topology
> works well when most access to shared data is read.

Which reminds me: older Solarises used to have a nifty-looking (via descriptions) cachefs, apparently to speed up NFS clients and reduce traffic, which we did not get to really use in real life. AFAIK Oracle EOLed it for Solaris 11, and I am not sure it is in illumos either.

Does caching in the current Solaris/illumos NFS client replace those benefits, or did the project have some merits of its own (like caching into local storage of the client, so that the cache was not empty after reboot)?
Re: [zfs-discuss] cluster vs nfs
On 26 April, 2012 - Jim Klimov sent me these 1,6K bytes:
> Which reminds me: older Solarises used to have a nifty-looking (via
> descriptions) cachefs, apparently to speed up NFS clients and reduce
> traffic, which we did not get to really use in real life. AFAIK Oracle
> EOLed it for Solaris 11, and I am not sure it is in illumos either.
> Does caching in current Solaris/illumos NFS client replace those
> benefits, or did the project have some merits of its own (like caching
> into local storage of client, so that the cache was not empty after
> reboot)?

It had its share of merits and bugs.

/Tomas
--
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 10:12 PM, Jim Klimov wrote:
> Which reminds me: older Solarises used to have a nifty-looking (via
> descriptions) cachefs, apparently to speed up NFS clients and reduce
> traffic, which we did not get to really use in real life. AFAIK Oracle
> EOLed it for Solaris 11, and I am not sure it is in illumos either.

I don't think it even made it into Solaris 10. I used to use it with Solaris 8, back in the days when 100Mb switches were exotic!

> Does caching in current Solaris/illumos NFS client replace those
> benefits, or did the project have some merits of its own (like caching
> into local storage of client, so that the cache was not empty after
> reboot)?

It did have local backing store, but my current desktop has more RAM than that Solaris 8 box had disk and my network is 100 times faster, so it doesn't really matter any more.

--
Ian.
Re: [zfs-discuss] cluster vs nfs
On 2012-04-26 14:47, Ian Collins wrote:
> I don't think it even made it into Solaris 10.

Actually, I see the kernel modules available in both Solaris 10, several builds of OpenSolaris SXCE and an illumos-current:

$ find /kernel/ /platform/ /usr/platform/ /usr/kernel/ | grep -i cachefs
/kernel/fs/amd64/cachefs
/kernel/fs/cachefs
/platform/i86pc/amd64/archive_cache/kernel/fs/amd64/cachefs
/platform/i86pc/archive_cache/kernel/fs/cachefs
$ uname -a
SunOS summit-blade5 5.11 oi_151a2 i86pc i386 i86pc

> It did have local backing store, but my current desktop has more RAM
> than that Solaris 8 box had disk and my network is 100 times faster, so
> it doesn't really matter any more.

Well, it depends on your working set size. A matter of scale. If those researchers dig into their terabyte of data each (each seems important here for conflict/sync resolution), on a gigabit-connected workstation, it would still take them a couple of minutes just to download the dataset from the server, let alone random-seek around it afterwards. And you can easily have a local backing store for such cachefs (or equivalent) today, even on an SSD or a few.

Just my 2c for a possible build of that cluster they wanted, and perhaps some evolution/revival of cachefs with today's realities and demands - if it's deemed appropriate for their task.

MY THEORY based on marketing info: I believe they could make a central fileserver with enough data space for everyone, and each worker would use cachefs+nfs to access it. Their actual worksets would be stored locally in the cachefs backing stores on each workstation, and not abuse networking traffic and the fileserver until there are some writes to be replicated into central storage. They would have approximately one common share to mount ;)

//Jim
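Per workstation, Jim's theory amounts to roughly the following cachefs setup, going from the Solaris-era administration model. A sketch only: the cache directory, server name, and share path are assumptions, and flags should be checked against the cfsadmin(1M) and mount_cachefs(1M) man pages:

```
# Create a cache backing store on local disk (or an SSD):
cfsadmin -c /var/cachefs

# Mount the central share through cachefs, backed by NFS:
mount -F cachefs -o backfstype=nfs,cachedir=/var/cachefs \
    server:/export/worksets /worksets
```

Reads then populate the local backing store, so the workset survives a reboot and the network is only hit for misses and write-back.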
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 04:17 PM, Ian Collins wrote:
> On 04/26/12 10:12 PM, Jim Klimov wrote:
>> Which reminds me: older Solarises used to have a nifty-looking (via
>> descriptions) cachefs, apparently to speed up NFS clients and reduce
>> traffic, which we did not get to really use in real life. AFAIK Oracle
>> EOLed it for Solaris 11, and I am not sure it is in illumos either.
> I don't think it even made it into Solaris 10. I used to use it with
> Solaris 8 back in the days when 100Mb switches were exotic!

cachefs is present in Solaris 10. It is EOL'd in S11.

>> Does caching in current Solaris/illumos NFS client replace those
>> benefits, or did the project have some merits of its own (like caching
>> into local storage of client, so that the cache was not empty after
>> reboot)?
> It did have local backing store, but my current desktop has more RAM
> than that Solaris 8 box had disk and my network is 100 times faster, so
> it doesn't really matter any more.
Re: [zfs-discuss] cluster vs nfs
On Thu, Apr 26, 2012 at 4:34 AM, Deepak Honnalli deepak.honna...@oracle.com wrote:
> cachefs is present in Solaris 10. It is EOL'd in S11.

And for those who need/want to use Linux, the equivalent is FSCache.

--
Freddie Cash
fjwc...@gmail.com
Re: [zfs-discuss] cluster vs nfs
On 4/25/12 10:10 PM, Richard Elling wrote:
> On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:
>> And applications that don't pin the mount points, and can be idled
>> during the migration. If your migration is due to a dead server, and
>> you have pending writes, you have no choice but to reboot the
>> client(s) (and accept the data loss, of course).
> Reboot requirement is a lame client implementation.

Then it's a lame client misfeature of every single NFS client I've ever seen, assuming the mount is hard (and if a RW mount isn't, you're crazy).

>> To bring this back to ZFS, sadly ZFS doesn't support NFS HA without
>> shared / replicated storage, as ZFS send / recv can't preserve the
>> data necessary to have the same NFS filehandle, so failing over to a
>> replica causes stale NFS filehandles on the clients. Which frustrates
>> me, because the technology to do NFS shadow copy (which is possible in
>> Solaris - not sure about the open source forks) is a superset of that
>> needed to do HA, but can't be used for HA.
> You are correct, a ZFS send/receive will result in different file
> handles on the receiver, just like rsync, tar, ufsdump+ufsrestore, etc.

But unlike SnapMirror.

> It is possible to preserve NFSv[23] file handles in a ZFS environment
> using lower-level replication like TrueCopy, SRDF, AVS, etc. But those
> have other architectural issues (aka suckage). I am open to looking at
> what it would take to make a ZFS-friendly replicator that would do
> this, but need to know the business case [1]

See below.

> The beauty of AFS and others, is that the file handle equivalent is not
> a number. NFSv4 also has this feature. So I have a little bit of
> heartburn when people say, NFS sux because it has a feature I won't use
> because I won't upgrade to NFSv4 even though it was released 10 years
> ago.

NFSv4 implementations are still iffy. We've tried it - it hasn't been stable (on Linux, at least). However we haven't tested RHEL6 yet.

Are you saying that if we have a Solaris NFSv4 server serving Solaris and Linux NFSv4 clients with ZFS send/recv replication, we can flip a VIP to point to the replica target and the clients won't get stale filehandles? Or that this is not the case today, but it would be easier to make the case than for v[23] filehandles?

> [1] FWIW, you can build a metropolitan area ZFS-based, shared storage
> cluster today for about 1/4 the cost of the NetApp Stretch Metro
> software license. There is more than one way to skin a cat :-)

So if the idea is to get even lower than 1/4 the NetApp cost, it feels like a race to the bottom. Shared storage is evil (in this context). Corrupt the storage, and you have no DR. That goes for all block-based replication products as well. This is not acceptable risk. I keep looking for a non-block-based replication system that allows seamless client failover, and can't find anything but NetApp SnapMirror. Please tell me I haven't been looking hard enough.

Lustre et al. don't support Solaris clients (which I find hilarious as Oracle owns it). I could build something on top of / under AFS for RW replication if I tried hard, but it would be fairly fragile.

--
Carson
Re: [zfs-discuss] cluster vs nfs
> Shared storage is evil (in this context). Corrupt the storage, and you
> have no DR.

Now I am confused. We're talking about storage which can be used for failover, aren't we? In which case we are talking about HA, not DR.

> That goes for all block-based replication products as well. This is not
> acceptable risk. I keep looking for a non-block-based replication
> system that allows seamless client failover, and can't find anything
> but NetApp SnapMirror.

I don't know SnapMirror, so I may be mistaken, but I don't see how you can have non-synchronous replication which can allow for seamless client failover (in the general case). Technically this doesn't have to be block based, but I've not seen anything which wasn't. Synchronous replication pretty much precludes DR (again, I can think of theoretical ways around this, but have never come across anything in practice).

Carson

Julian
Re: [zfs-discuss] cluster vs nfs
On 4/26/12 2:17 PM, J.P. King wrote:
>> Shared storage is evil (in this context). Corrupt the storage, and you
>> have no DR.
> Now I am confused. We're talking about storage which can be used for
> failover, aren't we? In which case we are talking about HA not DR.

Depends on how you define DR - we have shared storage HA in each datacenter (NetApp cluster), and replication between them in case we lose a datacenter (all clients on the MAN hit the same cluster unless we do a DR failover). The latter is what I'm calling DR.

>> That goes for all block-based replication products as well. This is
>> not acceptable risk. I keep looking for a non-block-based replication
>> system that allows seamless client failover, and can't find anything
>> but NetApp SnapMirror.
> I don't know SnapMirror, so I may be mistaken, but I don't see how you
> can have non-synchronous replication which can allow for seamless
> client failover (in the general case). Technically this doesn't have to
> be block based, but I've not seen anything which wasn't. Synchronous
> replication pretty much precludes DR (again, I can think of theoretical
> ways around this, but have never come across anything in practice).

seamless is an over-statement, I agree. NetApp has synchronous SnapMirror (which is only mostly synchronous...). Worst case, clients may see a filesystem go backwards in time, but to a point-in-time consistent state.

--
Carson
Re: [zfs-discuss] cluster vs nfs
> Depends on how you define DR - we have shared storage HA in each
> datacenter (NetApp cluster), and replication between them in case we
> lose a datacenter (all clients on the MAN hit the same cluster unless
> we do a DR failover). The latter is what I'm calling DR.

It's what I call HA. DR is what snapshots or backups can help you towards. HA can be used to reduce the likelihood of needing to use DR measures, of course.

> seamless is an over-statement, I agree. NetApp has synchronous
> SnapMirror (which is only mostly synchronous...). Worst case, clients
> may see a filesystem go backwards in time, but to a point-in-time
> consistent state.

Tell that to my swapfile! Here we use synchronous mirroring for our VM systems storage. Having that go back in time will cause unpredictable problems. Worst case is pretty bad! It may be that for your purposes you can treat your filesystems the way you do safely - although you'd better not have any in-memory caching of files, obviously - however lots and lots of people cannot.

I believe that we can do seamless replication and failover of NFS/ZFS, except that it is very painful to manage, iSCSI (the only way I know to do mirroring in this context) caused us a lot of pain last time we used it, and the way Oracle treats Solaris and its support has made it largely untenable for us. Instead we've switched to Linux and DRBD. And if that doesn't get me sympathy I don't know what will.

Carson

Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support
Re: [zfs-discuss] cluster vs nfs
On Thu, Apr 26, 2012 at 5:45 PM, Carson Gaspar car...@taltos.org wrote:
> On 4/26/12 2:17 PM, J.P. King wrote:
>> I don't know SnapMirror, so I may be mistaken, but I don't see how you
>> can have non-synchronous replication which can allow for seamless
>> client failover (in the general case). Technically this doesn't have
>> to be block based, but I've not seen anything which wasn't.
>> Synchronous replication pretty much precludes DR (again, I can think
>> of theoretical ways around this, but have never come across anything
>> in practice).
> seamless is an over-statement, I agree. NetApp has synchronous
> SnapMirror (which is only mostly synchronous...). Worst case, clients
> may see a filesystem go backwards in time, but to a point-in-time
> consistent state.

Sure, if we assume apps make proper use of O_EXCL, O_APPEND, link(2)/unlink(2)/rename(2), sync(2), fsync(2), and fdatasync(3C) and can roll their own state back on their own. Databases typically know how to do that (e.g., SQLite3). Most apps? Doubtful.

Nico
--
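The careful-application pattern Nico is describing -- O_EXCL to avoid stomping concurrent writers, fsync before an atomic rename into place -- looks roughly like this sketch in Python (the file names are illustrative):

```python
import os

def atomic_update(path: str, data: bytes) -> None:
    """Write data so that after a crash (or a filesystem rolled back to
    a point-in-time consistent state) readers see either the old or the
    new contents, never a partial write."""
    tmp = path + ".tmp"
    # O_EXCL: fail rather than silently reuse another writer's temp file.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)          # force the data to stable storage first...
    finally:
        os.close(fd)
    os.rename(tmp, path)      # ...then atomically swap it into place

atomic_update("state.db", b"new contents")
```

An app written this way can tolerate the storage going "backwards in time" to a consistent snapshot; an app that scribbles in place (like a swapfile) cannot, which is exactly the objection above.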
Re: [zfs-discuss] cluster vs nfs
On Apr 25, 2012, at 11:00 PM, Nico Williams wrote:
> On Thu, Apr 26, 2012 at 12:10 AM, Richard Elling
> richard.ell...@gmail.com wrote:
>> On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:
>> Reboot requirement is a lame client implementation.
> And lame protocol design. You could possibly migrate read-write NFSv3
> on the fly by preserving FHs and somehow updating the clients to go to
> the new server (with a hiccup in between, no doubt), but only entire
> shares at a time -- you could not migrate only part of a volume with
> NFSv3.

Requirements, requirements, requirements... boil the ocean while we're at it? :-)

> Of course, having migration support in the protocol does not equate to
> getting it in the implementation, but it's certainly a good step in
> that direction.

NFSv4 has support for migrating volumes and managing the movement of file handles. The technique includes filehandle expiry, similar to methods used in other distributed FSs.

>> You are correct, a ZFS send/receive will result in different file
>> handles on the receiver, just like rsync, tar, ufsdump+ufsrestore, etc.
> That's understandable for NFSv2 and v3, but for v4 there's no reason
> that an NFSv4 server stack and ZFS could not arrange to preserve FHs
> (if, perhaps, at the price of making the v4 FHs rather large).

This is already in the v4 spec.

> Although even for v3 it should be possible for servers in a cluster to
> arrange to preserve devids...

We've been doing that for many years.

> Bottom line: live migration needs to be built right into the protocol.

Agree, and volume migration support is already in the NFSv4 spec.

> For me one of the exciting things about Lustre was/is the idea that you
> could just have a single volume where all new data (and metadata) is
> distributed evenly as you go. Need more storage? Plug it in, either to
> an existing head or via a new head, then flip a switch and there it is.
> No need to manage allocation. Migration may still be needed, both
> within a cluster and between clusters, but that's much more manageable
> when you have a protocol where data locations can be all over the place
> in a completely transparent manner.

Many distributed file systems do this, at the cost of being not quite POSIX-ish. In the brave new world of storage vmotion, nosql, and distributed object stores, it is not clear to me that coding to a POSIX file system is a strong requirement.

Perhaps people are so tainted by experiences with v2 and v3 that we can explain the non-migration to v4 as being due to poor marketing? As a leader of NFS, Sun had unimpressive marketing.

-- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
Re: [zfs-discuss] cluster vs nfs
On Thu, Apr 26, 2012 at 12:37 PM, Richard Elling richard.ell...@gmail.com wrote:
[...]

NFSv4 had migration in the protocol (excluding protocols between servers) from the get-go, but it was missing a lot (FedFS) and was not implemented until recently. I've no idea what clients and servers support it adequately besides Solaris 11, though that's just my fault (not being informed). It's taken over a decade to get to where we have any implementations of NFSv4 migration.

>> For me one of the exciting things about Lustre was/is the idea that
>> you could just have a single volume where all new data (and metadata)
>> is distributed evenly as you go. Need more storage? Plug it in, either
>> to an existing head or via a new head, then flip a switch and there it
>> is. No need to manage allocation. Migration may still be needed, both
>> within a cluster and between clusters, but that's much more manageable
>> when you have a protocol where data locations can be all over the
>> place in a completely transparent manner.
> Many distributed file systems do this, at the cost of being not quite
> POSIX-ish.

Well, Lustre does POSIX semantics just fine, including cache coherency (as opposed to NFS' close-to-open coherency, which is decidedly non-POSIX).

> In the brave new world of storage vmotion, nosql, and distributed
> object stores, it is not clear to me that coding to a POSIX file system
> is a strong requirement.

Well, I don't quite agree. I'm very suspicious of eventually-consistent. I'm not saying that the enormous DBs that eBay and such run should sport SQL and ACID semantics -- I'm saying that I think we can do much better than eventually-consistent (and no-language) while not paying the steep price that ACID requires. I'm not alone in this either. The trick is to find the right compromise. Close-to-open semantics works out fine for NFS, but O_APPEND is too wonderful not to have (ditto O_EXCL, which NFSv2 did not have; v4 has O_EXCL, but not O_APPEND). Whoever first delivers the right compromise in distributed DB semantics stands to make a fortune.

> Perhaps people are so tainted by experiences with v2 and v3 that we can
> explain the non-migration to v4 as being due to poor marketing? As a
> leader of NFS, Sun had unimpressive marketing.

Sun did not do too much to improve NFS in the 90s, not compared to the v4 work that only really started paying off recently. And since Sun had lost the client space by then, it doesn't mean all that much to have the best server if the clients aren't able to take advantage of the server's best features for lack of client implementation.

Basically, Sun's ZFS, DTrace, SMF, NFSv4, Zones, and other amazing innovations came a few years too late to make up for the awful management that Sun was saddled with. But for all the decidedly awful things Sun management did (or didn't do), the worst was terminating Sun PS (yes, worse than all the non-marketing, poor marketing, poor acquisitions, poor strategy, and all the rest, including truly epic mistakes like icing Solaris on x86 a decade ago).

One of the worst outcomes of the Sun debacle is that now there's a bevy of senior execs who think the worst thing Sun did was to open source Solaris and Java -- which isn't to say that Sun should have open sourced as much as it did, or that open source is an end in itself, but that open sourcing these things was a legitimate business tool with very specific goals in mind in each case, and had nothing to do with the sinking of the company.

Or maybe that's one of the best outcomes, because the good news is that those who learn the right lessons (in that case: that open source is a legitimate business tool that is sometimes, often even, a great mind-share building tool) will be in the minority, and thus will have a huge advantage over their competition. That's another thing Sun did not learn until it was too late: mind-share matters enormously to a software company.

Nico
--
[zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
11:26am, Richard Elling wrote:
> On Apr 25, 2012, at 10:59 AM, Paul Archer wrote:
>> The point of a clustered filesystem was to be able to spread our data
>> out among all nodes and still have access from any node without having
>> to run NFS. Size of the data set (once you get past the point where
>> you can replicate it on each node) is irrelevant.
> Interesting, something more complex than NFS to avoid the complexities
> of NFS? ;-)

We have data coming in on multiple nodes (with local storage) that is needed on other multiple nodes. The only way to do that with NFS would be with a matrix of cross mounts that would be truly scary.
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
I agree, you need something like AFS, Lustre, or pNFS. And/or an NFS proxy to those.

Nico
--
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
And he will still need an underlying filesystem like ZFS for them :)

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
Sent: 25 April 2012 20:32
To: Paul Archer
Cc: ZFS-Discuss mailing list
Subject: Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

I agree, you need something like AFS, Lustre, or pNFS. And/or an NFS proxy to those.

Nico
--
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:
> 11:26am, Richard Elling wrote:
>>> The point of a clustered filesystem was to be able to spread our data
>>> out among all nodes and still have access from any node without
>>> having to run NFS. Size of the data set (once you get past the point
>>> where you can replicate it on each node) is irrelevant.
>> Interesting, something more complex than NFS to avoid the complexities
>> of NFS? ;-)
> We have data coming in on multiple nodes (with local storage) that is
> needed on other multiple nodes. The only way to do that with NFS would
> be with a matrix of cross mounts that would be truly scary.

Ignoring lame NFS clients, how is that architecture different than what you would have with any other distributed file system? If all nodes share data to all other nodes, then...?

-- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
2:20pm, Richard Elling wrote:
> On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:
>>> Interesting, something more complex than NFS to avoid the
>>> complexities of NFS? ;-)
>> We have data coming in on multiple nodes (with local storage) that is
>> needed on other multiple nodes. The only way to do that with NFS would
>> be with a matrix of cross mounts that would be truly scary.
> Ignoring lame NFS clients, how is that architecture different than what
> you would have with any other distributed file system? If all nodes
> share data to all other nodes, then...?
> -- richard

Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace.
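Paul's arithmetic checks out: with every node NFS-mounting every other node's export, the mount count grows quadratically with cluster size, which is a quick calculation:

```python
def cross_mounts(nodes: int) -> int:
    """Total NFS mounts when each node mounts each other node's export."""
    return nodes * (nodes - 1)

print(cross_mounts(16))  # → 240 mounts across the cluster
print(cross_mounts(32))  # → 992 -- it only gets scarier
```

A distributed FS (or Rich's single-server layout below, or NFSv4 referrals) collapses this to one mount per node.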
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, 25 Apr 2012, Paul Archer wrote: Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace. Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simpler than your example, and all data is available on all machines/nodes. -- Rich Teer, Publisher Vinylphile Magazine www.vinylphilemag.com
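The single-server layout suggested above might look like this on a Solaris-style server (hostname and paths are illustrative, not from the thread):

```
# On the file server: one shared tree with a directory per node
mkdir -p /exports/nodes/node{0..15}
share -F nfs -o rw /exports/nodes

# On each of the 16 nodes: a single mount instead of a 16 x 15 mesh
mount -F nfs fileserver:/exports/nodes /nodes
```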
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, Apr 25, 2012 at 4:26 PM, Paul Archer p...@paularcher.org wrote: 2:20pm, Richard Elling wrote: Ignoring lame NFS clients, how is that architecture different than what you would have with any other distributed file system? If all nodes share data to all other nodes, then...? Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace. To be fair, NFSv4 now has a distributed namespace scheme, so you could still have a single mount on the client. That said, some DFSes have better properties, such as striping of data across sets of servers, aggressive caching, and various choices of semantics (e.g., Lustre tries hard to give you POSIX cache coherency semantics). Nico --
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, 25 Apr 2012, Rich Teer wrote: Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simpler than your example, and all data is available on all machines/nodes. This solution would limit bandwidth to that available from that single server. With the cluster approach, the objective is for each machine in the cluster to primarily access files which are stored locally. Whole files could be moved as necessary. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 09:54 AM, Bob Friesenhahn wrote: On Wed, 25 Apr 2012, Rich Teer wrote: Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simpler than your example, and all data is available on all machines/nodes. This solution would limit bandwidth to that available from that single server. With the cluster approach, the objective is for each machine in the cluster to primarily access files which are stored locally. Whole files could be moved as necessary. Distributed software building faces similar issues, but I've found once the common files have been read (and cached) by each node, network traffic becomes one way (to the file server). I guess that topology works well when most access to shared data is read. -- Ian.
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Apr 25, 2012, at 2:26 PM, Paul Archer wrote: 2:20pm, Richard Elling wrote: On Apr 25, 2012, at 12:04 PM, Paul Archer wrote: Interesting, something more complex than NFS to avoid the complexities of NFS? ;-) We have data coming in on multiple nodes (with local storage) that is needed on other multiple nodes. The only way to do that with NFS would be with a matrix of cross mounts that would be truly scary. Ignoring lame NFS clients, how is that architecture different than what you would have with any other distributed file system? If all nodes share data to all other nodes, then...? -- richard Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace. Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents). FWIW, automounters were invented 20+ years ago to handle this in a nearly seamless manner. Today, we have DFS from Microsoft and NFS referrals that almost eliminate the need for automounter-like solutions. Also, it is not unusual for an NFS environment to have 10,000+ mounts with thousands of mounts on each server. No big deal, happens every day. On Apr 25, 2012, at 2:53 PM, Nico Williams wrote: To be fair NFSv4 now has a distributed namespace scheme so you could still have a single mount on the client. That said, some DFSes have better properties, such as striping of data across sets of servers, aggressive caching, and various choices of semantics (e.g., Lustre tries hard to give you POSIX cache coherency semantics). I think this is where the real value is. NFS and CIFS are intentionally generic and have caching policies that are favorably described as generic. For special-purpose workloads there can be advantages to having policies more explicitly applicable to the workload.
-- richard -- ZFS Performance and Training richard.ell...@richardelling.com +1-760-896-4422
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
2:34pm, Rich Teer wrote: On Wed, 25 Apr 2012, Paul Archer wrote: Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace. Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simpler than your example, and all data is available on all machines/nodes. That assumes the data set will fit on one machine, and that machine won't be a performance bottleneck.
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling richard.ell...@gmail.com wrote: Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents). FWIW, automounters were invented 20+ years ago to handle this in a nearly seamless manner. Today, we have DFS from Microsoft and NFS referrals that almost eliminate the need for automounter-like solutions. I disagree vehemently. automount is a disaster because you need to synchronize changes with all those clients. That's not realistic. I've built a large automount-based namespace, replete with a distributed configuration system for setting the environment variables available to the automounter. I can tell you this: the automounter does not scale, and it certainly does not avoid the need for outages when storage migrates. With server-side, referral-based namespace construction that problem goes away, and the whole thing can be transparent w.r.t. migrations. For my money the key features a DFS must have are:
- server-driven namespace construction
- data migration without having to restart clients, reconfigure them, or do anything at all to them
- aggressive caching
- striping of file data for HPC and media environments
- semantics that ultimately allow multiple processes on disparate clients to cooperate (i.e., byte range locking), but I don't think full POSIX semantics are needed (that said, I think O_EXCL is necessary, and it'd be very nice to have O_APPEND, though the latter is particularly difficult to implement and painful when there's contention if you stripe file data across multiple servers)
Nico --
Re: [zfs-discuss] cluster vs nfs
On 04/26/12 10:34 AM, Paul Archer wrote: 2:34pm, Rich Teer wrote: On Wed, 25 Apr 2012, Paul Archer wrote: Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace. Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simpler than your example, and all data is available on all machines/nodes. That assumes the data set will fit on one machine, and that machine won't be a performance bottleneck. Aren't those general considerations when specifying a file server? -- Ian.
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Apr 25, 2012, at 3:36 PM, Nico Williams wrote: On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling richard.ell...@gmail.com wrote: Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents). FWIW, automounters were invented 20+ years ago to handle this in a nearly seamless manner. Today, we have DFS from Microsoft and NFS referrals that almost eliminate the need for automounter-like solutions. I disagree vehemently. automount is a disaster because you need to synchronize changes with all those clients. That's not realistic. Really? I did it with NIS automount maps and 600+ clients back in 1991. Other than the obvious problems with open files, has it gotten worse since then? I've built a large automount-based namespace, replete with a distributed configuration system for setting the environment variables available to the automounter. I can tell you this: the automounter does not scale, and it certainly does not avoid the need for outages when storage migrates. Storage migration is much more difficult with NFSv2, NFSv3, NetWare, etc. With server-side, referral-based namespace construction that problem goes away, and the whole thing can be transparent w.r.t. migrations. Agree, but we didn't have NFSv4 back in 1991 :-) Of course, this is how one would design it if you had to design a new DFS today. For my money the key features a DFS must have are: - server-driven namespace construction - data migration without having to restart clients, reconfigure them, or do anything at all to them - aggressive caching - striping of file data for HPC and media environments - semantics that ultimately allow multiple processes on disparate clients to cooperate (i.e., byte range locking), but I don't think full POSIX semantics are needed Almost any of the popular NoSQL databases offer this and more. The movement away from POSIX-ish DFS and storing data in traditional files is inevitable. Even ZFS is an object store at its core.
(that said, I think O_EXCL is necessary, and it'd be very nice to have O_APPEND, though the latter is particularly difficult to implement and painful when there's contention if you stripe file data across multiple servers) +1 -- richard -- ZFS Performance and Training richard.ell...@richardelling.com +1-760-896-4422
Re: [zfs-discuss] cluster vs nfs
On Wed, Apr 25, 2012 at 5:42 PM, Ian Collins i...@ianshome.com wrote: Aren't those general considerations when specifying a file server? There are Lustre clusters with thousands of nodes, hundreds of them being servers, and high utilization rates. Whatever specs you might have for one server head will not meet the demand that hundreds of the same can. Nico --
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.com wrote: On Apr 25, 2012, at 3:36 PM, Nico Williams wrote: I disagree vehemently. automount is a disaster because you need to synchronize changes with all those clients. That's not realistic. Really? I did it with NIS automount maps and 600+ clients back in 1991. Other than the obvious problems with open files, has it gotten worse since then? Nothing's changed. Automounter + data migration -> rebooting clients (or close enough to rebooting). I.e., outage. Storage migration is much more difficult with NFSv2, NFSv3, NetWare, etc. But not with AFS. And spec-wise not with NFSv4 (though I don't know if/when all NFSv4 clients will properly support migration, just that the protocol and some servers do). With server-side, referral-based namespace construction that problem goes away, and the whole thing can be transparent w.r.t. migrations. Yes. Agree, but we didn't have NFSv4 back in 1991 :-) Of course, this is how one would design it if you had to design a new DFS today. Indeed, that's why I built an automounter solution in 1996 (that's still in use, I'm told). Although to be fair AFS existed back then and had global namespace and data migration, and was mature. It's taken NFS that long to catch up... [...] Almost any of the popular NoSQL databases offer this and more. The movement away from POSIX-ish DFS and storing data in traditional files is inevitable. Even ZFS is an object store at its core. I agree. Except that there are applications where large octet streams are needed. HPC, media come to mind. Nico --
Re: [zfs-discuss] cluster vs nfs
Ian Collins wrote: On 04/26/12 10:34 AM, Paul Archer wrote: That assumes the data set will fit on one machine, and that machine won't be a performance bottleneck. Aren't those general considerations when specifying a file server? I suppose. But I meant specifically that our data will not fit on one single machine, and we are relying on spreading it across more nodes to get it on more spindles as well.
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams n...@cryptonector.com wrote: On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.com wrote: On Apr 25, 2012, at 3:36 PM, Nico Williams wrote: I disagree vehemently. automount is a disaster because you need to synchronize changes with all those clients. That's not realistic. Really? I did it with NIS automount maps and 600+ clients back in 1991. Other than the obvious problems with open files, has it gotten worse since then? Nothing's changed. Automounter + data migration -> rebooting clients (or close enough to rebooting). I.e., outage. Uhhh, not if you design your automounter architecture correctly and (as Richard said) have NFS clients that are not lame, to which I'll add, automounters that actually work as advertised. I was designing automount architectures that permitted dynamic changes with minimal to no outages in the late 1990s. I only had a little over 100 clients (most of which were also servers) and NIS+ (NIS ver. 3) to distribute the indirect automount maps. I also had to _redesign_ a number of automount strategies that were built by people who thought that using direct maps for everything was a good idea. That _was_ a pain in the a** due to the changes needed at the applications to point at a different hierarchy. It all depends on _what_ the application is doing. Something that opens and locks a file and never releases the lock or closes the file until the application exits will require a restart of the application with an automounter / NFS approach. -- {1-2-3-4-5-6-7-} Paul Kraus - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) - Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) - Technical Advisor, RPI Players
Re: [zfs-discuss] cluster vs nfs
On 4/25/12 6:57 PM, Paul Archer wrote: On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams n...@cryptonector.com wrote: On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.com wrote: Nothing's changed. Automounter + data migration -> rebooting clients (or close enough to rebooting). I.e., outage. Uhhh, not if you design your automounter architecture correctly and (as Richard said) have NFS clients that are not lame, to which I'll add, automounters that actually work as advertised. I was designing And applications that don't pin the mount points, and can be idled during the migration. If your migration is due to a dead server, and you have pending writes, you have no choice but to reboot the client(s) (and accept the data loss, of course). Which is why we use AFS for RO replicated data, and NetApp clusters with SnapMirror and VIPs for RW data. To bring this back to ZFS, sadly ZFS doesn't support NFS HA without shared / replicated storage, as ZFS send / recv can't preserve the data necessary to have the same NFS filehandle, so failing over to a replica causes stale NFS filehandles on the clients. Which frustrates me, because the technology to do NFS shadow copy (which is possible in Solaris - not sure about the open source forks) is a superset of that needed to do HA, but can't be used for HA. -- Carson
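The stale-filehandle point can be made concrete with a minimal send/receive replication loop (pool, dataset, and standby hostname are illustrative): the standby ends up with a consistent copy of the data, but not the inode/generation numbers that NFSv3 file handles encode, so clients must remount after a failover.

```
# Initial full copy to the standby (names are hypothetical)
zfs snapshot tank/export@base
zfs send tank/export@base | ssh standby zfs receive tank/export

# Steady state: ship incremental deltas periodically
zfs snapshot tank/export@incr
zfs send -i @base tank/export@incr | ssh standby zfs receive -F tank/export

# After failover, NFSv3 clients get ESTALE and need a remount --
# the received files carry new file handles.
```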
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
On Wed, Apr 25, 2012 at 8:57 PM, Paul Kraus pk1...@gmail.com wrote: On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams n...@cryptonector.com wrote: Nothing's changed. Automounter + data migration -> rebooting clients (or close enough to rebooting). I.e., outage. Uhhh, not if you design your automounter architecture correctly and (as Richard said) have NFS clients that are not lame, to which I'll add, automounters that actually work as advertised. I was designing automount architectures that permitted dynamic changes with minimal to no outages in the late 1990s. I only had a little over 100 clients (most of which were also servers) and NIS+ (NIS ver. 3) to distribute the indirect automount maps. Further below you admit that you're talking about read-only data, effectively. But the world is not static. Sure, *code* is by and large static, and indeed, we segregated data by whether it was read-only (code, historical data) or not (application data, home directories). We were able to migrate *read-only* data with no outages. But for the rest? Yeah, there were always outages. Of course, we had a periodic maintenance window, with all systems rebooting within a short period, and this meant that some data migration outages were not noticeable, but they were real. I also had to _redesign_ a number of automount strategies that were built by people who thought that using direct maps for everything was a good idea. That _was_ a pain in the a** due to the changes needed at the applications to point at a different hierarchy. We used indirect maps almost exclusively. Moreover, we used hierarchical automount entries, and even -autofs mounts. We also used environment variables to control various things, such as which servers to mount what from (this was particularly useful for spreading the load on read-only static data). We used practically every feature of the automounter except for executable maps (and direct maps, when we eventually stopped using those). 
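As a concrete sketch of the indirect-map-plus-variables approach described above (map names, server names, and the DATAHOST variable are hypothetical, not from the thread):

```
# /etc/auto_master: hang an indirect map off /data
/data   auto_data

# /etc/auto_data: indirect map entries. ${DATAHOST} comes from the
# automounter's environment (e.g. automountd -D DATAHOST=server3),
# which is how per-client server selection spreads read-only load.
# "&" expands to the matched key; "*" matches any key.
tools   -ro     ${DATAHOST}:/export/tools
home    -rw     homeserver:/export/home/&
*       -rw     pool-server:/export/data/&
```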
It all depends on _what_ the application is doing. Something that opens and locks a file and never releases the lock or closes the file until the application exits will require a restart of the application with an automounter / NFS approach. No kidding! In the real world such applications exist and get used. Nico --
Re: [zfs-discuss] cluster vs nfs
On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote: On 4/25/12 6:57 PM, Paul Kraus wrote: On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams n...@cryptonector.com wrote: On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.com wrote: Nothing's changed. Automounter + data migration -> rebooting clients (or close enough to rebooting). I.e., outage. Uhhh, not if you design your automounter architecture correctly and (as Richard said) have NFS clients that are not lame, to which I'll add, automounters that actually work as advertised. I was designing And applications that don't pin the mount points, and can be idled during the migration. If your migration is due to a dead server, and you have pending writes, you have no choice but to reboot the client(s) (and accept the data loss, of course). Reboot requirement is a lame client implementation. Which is why we use AFS for RO replicated data, and NetApp clusters with SnapMirror and VIPs for RW data. To bring this back to ZFS, sadly ZFS doesn't support NFS HA without shared / replicated storage, as ZFS send / recv can't preserve the data necessary to have the same NFS filehandle, so failing over to a replica causes stale NFS filehandles on the clients. Which frustrates me, because the technology to do NFS shadow copy (which is possible in Solaris - not sure about the open source forks) is a superset of that needed to do HA, but can't be used for HA. You are correct, a ZFS send/receive will result in different file handles on the receiver, just like rsync, tar, ufsdump+ufsrestore, etc. Do you mean the Sun ZFS Storage 7000 Shadow Migration feature? This is not an HA feature, it is an interposition architecture. It is possible to preserve NFSv[23] file handles in a ZFS environment using lower-level replication like TrueCopy, SRDF, AVS, etc. But those have other architectural issues (aka suckage). 
I am open to looking at what it would take to make a ZFS-friendly replicator that would do this, but need to know the business case [1]. The beauty of AFS and others is that the file handle equivalent is not a number. NFSv4 also has this feature. So I have a little bit of heartburn when people say, NFS sux because it has a feature I won't use because I won't upgrade to NFSv4 even though it was released 10 years ago. As Nico points out, there are cases where you really need a Lustre, Ceph, Gluster, or other parallel file system. That is not the design point for ZFS's ZPL or volume interfaces. [1] FWIW, you can build a metropolitan area ZFS-based, shared storage cluster today for about 1/4 the cost of the NetApp Stretch Metro software license. There is more than one way to skin a cat :-) So if the idea is to get even lower than 1/4 the NetApp cost, it feels like a race to the bottom. -- richard -- ZFS Performance and Training richard.ell...@richardelling.com +1-760-896-4422