> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Karl Wagner
>
> Consider the situation where someone has a large amount of off-site data
> storage (of the order of 100s of TB or more). They have a slow network link
> to this storage.
>
> My idea is that this could be used to build the main vdevs for a ZFS pool.
> On top of this, an array of disks (of the order of TBs to 10s of TB) is
> available locally, which can be used as L2ARC. There are also smaller,
> faster arrays (of the order of 100s of GB) which, in my mind, could be used
> as a ZIL.
>
> Now, in this theoretical situation, in-play read data is kept on the L2ARC,
> and can be accessed about as fast as if this array was just used as the main
> pool vdevs. Written data goes to the ZIL, and is then sent down the slow link
> to the offsite storage. Rarely used data is still available as if on site
> (shows up in the same file structure), but is effectively "archived" to the
> offsite storage.
>
> Now, here comes the problem. According to what I have read, the maximum size
> for the ZIL is approx 50% of the physical memory in the system, which would
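For reference, the layout described above (remote main vdevs, local L2ARC, a small fast ZIL) would mechanically be assembled something like the following sketch. The device paths and pool name are hypothetical, and it assumes the remote storage is exposed to the host as block devices (e.g. iSCSI LUNs), which the original post doesn't specify:

```shell
# Main vdevs on the (slow, remote) block devices
zpool create tank c2t0d0 c2t1d0

# Local TB-scale disk array added as L2ARC (read cache)
zpool add tank cache c3t0d0 c3t1d0

# Small, fast array added as a separate log device (ZIL)
zpool add tank log c4t0d0
```

Whether this performs acceptably is exactly what the rest of the thread disputes.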
Here's the bigger problem: you seem to be thinking of the ZIL as a write buffer. This is not the case. The ZIL only allows sync writes to be handled as async writes, which are buffered in RAM. Depending on your system, it will refuse to buffer more than 5 or 30 seconds' worth of async writes, and your async writes are still going to be slow.

Also, the L2ARC is not persistent, and there is a maximum fill rate (which I don't know much about). So populating the L2ARC might not happen as fast as you want, and every time you reboot it will have to be repopulated.

If at all possible, instead of using the remote storage as the primary storage, you can use the remote storage to receive periodic incremental snapshots, and that would perform optimally, because the remote storage is then isolated from rapid volatile changes. The zfs send | zfs receive datastreams will be full of large sequential blocks, not small random IO. Most likely you will gain performance by enabling both compression and dedup. But of course, that depends on the nature of your data.

> And finally, if the network link was to die, I am assuming the entire
> ZPool would become unavailable.

The behavior in this situation is configurable via "failmode." The default is "wait," which essentially pauses the filesystem until the disks become available again. Unfortunately, until the disks become available again, the system can become ... pretty undesirable to use, and may require a power cycle. You can also use "panic" or "continue," which you can read about in the zpool manpage if you want.

> vdevs as an "archive" store (i.e. it goes
> [ARC]->[L2ARC/ZIL]->[main]->[archive]). Infrequently used files/blocks could

You're pretty much describing precisely what I'm suggesting... using zfs send | zfs receive. I suppose the difference between what you're suggesting and what I'm suggesting is the separation of two pools versus "misrepresenting" the remote storage as part of the local pool, etc. That's a pretty major architectural change.
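A minimal sketch of the send/receive approach suggested above. Pool names, the remote hostname, and snapshot labels are placeholders, not details from the thread:

```shell
# Take an initial snapshot and seed the remote pool once (full stream).
zfs snapshot tank/data@base
zfs send tank/data@base | ssh remotehost zfs receive backup/data

# Thereafter, ship only the delta between the previous and current snapshots.
zfs snapshot tank/data@today
zfs send -i tank/data@base tank/data@today | ssh remotehost zfs receive backup/data

# Compression and dedup can be enabled on the receiving side, since the
# remote pool sees large sequential writes rather than small random IO.
ssh remotehost zfs set compression=on backup/data
ssh remotehost zfs set dedup=on backup/data
```

The key design point is the one made above: the remote pool is a separate pool fed asynchronously, so a dead network link delays replication instead of hanging the local filesystem.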
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss