>
> I've been observing two threads on zfs-discuss with the following
> Subject lines:
>
> Yager on ZFS
> ZFS + DB + "fragments"
>
> and have reached the rather obvious conclusion that the author "can
> you guess?" is a professional spinmeister,
Ah - I see we have another incompetent psychic chiming in - and judging by his
drivel below, a technical incompetent as well. While I really can't help him
with the former area, I can at least try to educate him in the latter.
...
> Excerpt 1: Is this premium technical BullShit (BS) or what?
Since you asked: no, it's just clearly beyond your grade level, so I'll try to
dumb it down enough for you to follow.
>
> - BS 301 'grad level technical BS'
> ---
>
> Still, it does drive up snapshot overhead, and if you start trying to
> use snapshots to simulate 'continuous data protection' rather than
> more sparingly the problem becomes more significant (because each
> snapshot will catch any background defragmentation activity at a
> different point, such that common parent blocks may appear in more
> than one snapshot even if no child data has actually been updated).
> Once you introduce CDP into the process (and it's tempting to, since
> the file system is in a better position to handle it efficiently than
> some add-on product), rethinking how one approaches snapshots (and
> COW in general) starts to make more sense.
Do you by any chance not even know what 'continuous data protection' is? It's
considered a fairly desirable item these days and was the basis for several hot
start-ups (some since gobbled up by bigger fish that apparently agreed they
were onto something significant), because it allows you to roll back the state
of individual files or the system as a whole to *any* historical point you
might want. Snapshots, by contrast, require that you anticipate the points you
might want to roll back to and capture them explicitly - or take such frequent
snapshots that you'll probably be able to get at least somewhere near any point
you might want. That second-class simulation of CDP is what some vendors offer
because it's the best they can do, and it is precisely the activity I outlined
above, expecting that anyone sufficiently familiar with file systems to follow
the discussion would recognize it.
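To make the distinction concrete, here's a minimal sketch in Python (all names
invented for illustration - this models no real vendor's implementation): a
snapshot store can only return states that were explicitly captured, while a
CDP-style journal of before-images can reconstruct the state at *any* point in
time.

    # Toy contrast between discrete snapshots and CDP.  Purely
    # illustrative; no real file-system structures are modeled.

    class SnapshotStore:
        def __init__(self):
            self.snapshots = {}              # time -> captured state

        def capture(self, t, state):
            self.snapshots[t] = dict(state)

        def roll_back(self, t):
            return self.snapshots[t]         # KeyError for any other instant

    class CdpJournal:
        def __init__(self):
            self.log = []                    # (time, key, before-image)

        def record(self, t, state, key, value):
            self.log.append((t, key, state.get(key)))
            state[key] = value

        def roll_back(self, current, t):
            # Undo before-images newest-first until we pass instant t.
            state = dict(current)
            for ts, key, old in reversed(self.log):
                if ts <= t:
                    break                    # older entries already applied
                if old is None:
                    state.pop(key, None)     # key didn't exist back then
                else:
                    state[key] = old
            return state

The snapshot store raises KeyError for any instant you failed to anticipate;
the journal can hit them all, which is the whole point of CDP.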
But given your obvious limitations, I guess I should spell it out in words of
even fewer syllables:
1. Simulating CDP without actually implementing it means taking very frequent
snapshots.
2. Taking very frequent snapshots means that you're likely to interrupt
background defragmentation such that one child of a parent is moved *before* a
given snapshot is taken while another is moved *after* it. Every snapshot that
lands immediately prior to moving *any* of the children must then capture a
before-image of the parent (because at least one of its pointers is about to
change) *and of all the parent's ancestors* (because the pointer change
propagates through all the ancestral checksums - and, with COW, pointers).
With sparser snapshots you would instead capture a single before-image of the
parent and its ancestors, after which all the child pointers would likely be
changed before the next snapshot was taken. The sketch below puts rough
numbers on the difference.
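A back-of-the-envelope model of point 2, in Python (the tree depth and move
counts here are made up; this just counts how many ancestor-chain copies get
captured):

    # Why frequent snapshots inflate COW overhead during background
    # defragmentation: moving a child forces a fresh copy of its parent
    # block and every ancestor, since the pointer change ripples
    # checksums/pointers up the tree.

    DEPTH = 3          # parent plus two levels of ancestors (invented)

    def blocks_copied(moves, snapshot_between_moves):
        """Count ancestor-chain copies captured across snapshots."""
        copied = 0
        dirty = False               # chain already re-copied since last snap?
        for _ in range(moves):
            if not dirty:
                copied += DEPTH     # before-image of parent + ancestors
                dirty = True
            if snapshot_between_moves:
                dirty = False       # snapshot seals the copy; next move
                                    # has to copy the whole chain again
        return copied

    # Two children moved by the defragmenter:
    print(blocks_copied(2, snapshot_between_moves=False))  # 3: chain copied once
    print(blocks_copied(2, snapshot_between_moves=True))   # 6: captured twice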
So that's what any competent reader should have been able to glean from the
comments that stymied you. The paragraph's concluding comments were
considerably more general in nature and thus legitimately harder to follow.
Had you asked for clarification rather than simply assumed they were BS
because you couldn't understand them, you would not have looked like such an
idiot; but since you did call them into question, I'll now put a bit more
flesh on them for those who may be able to follow a discussion at that level
of detail:
3. The file system is in a better position to handle CDP than some external
mechanism because
a) the file system knows (right down to the byte level if it wants to) exactly
what any individual update is changing,
b) the file system knows which updates are significant (e.g., there's probably
no intrinsic need to capture rollback information for lazy writes because the
application didn't care whether they were made persistent at that time, but for
any explicitly-forced writes or syncs a rollback point should be established),
and
c) the file system is already performing log forces (where a log is involved)
or batch disk updates (a la ZFS) to honor such application-requested
persistence, and can piggyback the required CDP before-image persistence on
them rather than requiring separate synchronous log or disk accesses to do so
(see the sketch after this point).
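For point 3c, a hedged sketch in Python of what that piggybacking might look
like (all names invented; a real file system obviously batches far more state
than this):

    from collections import namedtuple

    # Invented stand-in: a dirty block carries both old and new contents.
    Block = namedtuple("Block", "addr old_contents new_contents")

    def persist_batch(entries):
        # Stand-in for the one synchronous flush the file system was
        # going to issue anyway to honor the application's sync request.
        print("flushing %d records in a single synchronous batch" % len(entries))

    def sync_with_cdp(dirty_blocks):
        batch = []
        for b in dirty_blocks:
            # CDP before-image rides along in the same batch as the data,
            # so rollback durability costs no extra synchronous access.
            batch.append(("cdp-before-image", b.addr, b.old_contents))
            batch.append(("new-data", b.addr, b.new_contents))
        persist_batch(batch)

    sync_with_cdp([Block(7, b"old", b"new"), Block(9, b"older", b"newer")])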
4. If you've got full-fledged CDP, it's questionable whether you need
snapshots as well (unless you have really, really inflexible requirements for
virtually instantaneous rollback and/or for high-performance writable-clone
access) - and if CDP turns out