> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Karl Wagner
> 
> Consider the situation where someone has a large amount of off-site data
> storage (of the order of 100s of TB or more). They have a slow network
> link to this storage.
> 
> My idea is that this could be used to build the main vdevs for a ZFS pool.
> On top of this, an array of disks (of the order of TBs to 10s of TB) is
> available locally, which can be used as L2ARC. There are also smaller,
> faster arrays (of the order of 100s of GB) which, in my mind, could be
> used as a ZIL.
> 
> Now, in this theoretical situation, in-play read data is kept on the
> L2ARC, and can be accessed about as fast as if this array was just used
> as the main pool vdevs. Written data goes to the ZIL, and is then sent
> down the slow link to the offsite storage. Rarely used data is still
> available as if on site (shows up in the same file structure), but is
> effectively "archived" to the offsite storage.
> 
> Now, here comes the problem. According to what I have read, the maximum
> size for the ZIL is approx 50% of the physical memory in the system,
> which would

Here's the bigger problem:
You seem to be thinking of the ZIL as a write buffer.  This is not the case.
The ZIL only allows sync writes to be handled as async writes, which are
buffered in RAM.  Depending on your system, ZFS will refuse to buffer more
than about 5sec or 30sec worth of async writes before it flushes them out,
and those async writes are still going to be slow over your link.
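(For what it's worth, that 5sec/30sec figure is the transaction group
timeout.  On illumos-derived systems it is exposed as the zfs_txg_timeout
kernel tunable; a rough sketch, assuming mdb and root access, and noting
that tunable names have shifted between releases:)

    # read the current txg timeout (in seconds) from the live kernel
    echo zfs_txg_timeout/D | mdb -k

    # or pin it across reboots in /etc/system
    set zfs:zfs_txg_timeout = 5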

Also, L2ARC is not persistent, and there is a maximum fill rate (which I
don't know much about).  So populating the L2ARC might not happen as fast
as you want, and every time you reboot it will have to be repopulated.
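(The fill rate is governed by the l2arc_write_max and l2arc_write_boost
tunables, as far as I recall.  A sketch for /etc/system on an
illumos-derived box; the values below are only examples, not
recommendations:)

    * bytes the L2ARC feed thread may write per interval (default 8MB)
    set zfs:l2arc_write_max = 67108864
    * extra headroom while the cache device is still cold
    set zfs:l2arc_write_boost = 134217728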

If at all possible, instead of using the remote storage as the primary
storage, use the remote storage to receive periodic incremental snapshots.
That will perform much better, because the remote storage is then isolated
from rapid, volatile changes.  The zfs send | zfs receive datastreams will
be full of large sequential blocks rather than small random IO.
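Something along these lines, where the pool, dataset, and host names are
made up purely for illustration:

    # take a snapshot, then ship only the delta since the previous one
    zfs snapshot tank/data@2012-06-18
    zfs send -i tank/data@2012-06-11 tank/data@2012-06-18 | \
        ssh offsite zfs receive -F backup/data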

Most likely you will gain performance by enabling both compression and
dedup.  But of course, that depends on the nature of your data.
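For example, on the receiving side (dataset and pool names made up again):

    zfs set compression=gzip backup/data
    zfs set dedup=on backup/data
    # keep an eye on the dedup table; it needs to fit in RAM/L2ARC
    zpool status -D backup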


> And
> finally, if the network link was to die, I am assuming the entire ZPool
> would become unavailable.

The behavior in this situation is configurable via the pool's "failmode"
property.  The default is "wait," which essentially pauses the filesystem
until the disks become available again.  Unfortunately, while you are
waiting, the system can become ... pretty undesirable to use, and may
require a power cycle.

You can also use "panic" or "continue," which you can read about in the
zpool manpage if you want.
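For example (pool name made up):

    zpool get failmode tank
    zpool set failmode=continue tank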

> vdevs as an "archive" store (i.e. it goes
> [ARC]->[L2ARC/ZIL]->[main]->[archive]). Infrequently used files/blocks
> could

You're pretty much describing precisely what I'm suggesting... using zfs
send | zfs receive.

I suppose the difference between what you're suggesting and what I'm
suggesting is the separation into two pools, versus "misrepresenting" the
remote storage as part of the local pool, etc.  That's a pretty major
architectural change.
