Jim,

You are spot on.  I was hoping that the writes would be close enough to identical that
there would be a high ratio of duplicate data, since I use the same record size, page size,
compression algorithm, etc.  However, that was not the case.  The main thing I wanted to
prove, though, was that when the data is the same, the L1 ARC only caches the data that
was actually written to storage.  That is a really cool thing!  I am sure there will
be future study on this topic as it applies to other scenarios.
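
For anyone who wants to check this on their own systems, something along these lines
should show the effect.  The instance paths below are just placeholders for wherever
your backend database files live:

    # ARC size (bytes) before reading anything
    kstat -p zfs:0:arcstats:size

    # Read the backend of the first instance, then of a second instance
    # whose blocks dedup against the first
    cat /ds/inst1/db/userRoot/*.db* > /dev/null
    kstat -p zfs:0:arcstats:size
    cat /ds/inst2/db/userRoot/*.db* > /dev/null
    kstat -p zfs:0:arcstats:size   # should grow far less the second time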

With regard to directory engineering investing any energy into optimizing ODSEE DS
to more effectively leverage this caching potential, that won't happen.  OUD far
outperforms ODSEE.  That said, OUD may get some focus in this area; time will tell
on that one.

For now, I hope everyone benefits from the little that I did validate.

Have a great day!

Brad
Brad Diggs | Principal Sales Consultant

On Dec 29, 2011, at 4:45 AM, Jim Klimov wrote:

Thanks for running and publishing the tests :)

A comment on your testing technique follows, though.

2011-12-29 1:14, Brad Diggs wrote:
As promised, here are the findings from my testing. I created 6
directory server instances ...

However, once I started modifying the data of the replicated directory
server topology, the caching efficiency quickly diminished. The following
table shows that the delta for each instance increased by roughly 2GB
after only 300k changes.

I suspect the divergence in data as seen by ZFS deduplication most
likely occurs because deduplication happens at the block level rather
than at the byte level. When a write is sent to one directory server
instance, the exact same write is propagated to the other 5 instances
and therefore should be considered a duplicate. However, this was not
the case. There could be other reasons for the divergence as well.

Hello, Brad,

If you tested with Sun DSEE (and I have no reason to
believe other descendants of the iPlanet Directory Server
would work differently under the hood), then there are
two factors hindering your block-dedup gains:

1) The data is stored in the backend BerkeleyDB binary
file. In Sun DSEE7 and/or in ZFS this data could also be
compressed. Since ZFS dedups whole blocks (the same data
has to land in identically laid-out blocks), it is quite
unlikely you'd get matching blocks often enough. For
example, each database might position the same userdata
at different offsets due to garbage collection or
whatever other optimisation the DB might think of,
making the on-disk blocks different and undedupable.
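
If you want to see how much would actually dedup without
rewriting anything, zdb can simulate the dedup table over
an existing pool; "tank" below is just a placeholder
pool name:

    # Simulate dedup over the existing pool: prints a DDT
    # histogram and an estimated dedup ratio (read-only)
    zdb -S tank

    # On a pool already running with dedup=on, the achieved ratio
    zpool get dedupratio tank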

You might look into whether it is possible to tune the
database to write in sector-sized to min.block-sized
(512b/4096b) records and consistently use the same DSEE
compression (or lack thereof) - in that case you might
get more identical blocks and win with dedup. But you'll
likely lose on compression, especially of the empty,
sparse structure which a database initially is.
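
On the ZFS side that would look something like this
(dataset names and the 4k figure are just an example;
the recordsize would have to match whatever page size
the DB actually writes):

    # One dataset per instance backend, recordsize matched to
    # the DB page size, identical compression setting everywhere
    zfs create -o recordsize=4k -o compression=off -o dedup=on tank/ds-inst1
    zfs create -o recordsize=4k -o compression=off -o dedup=on tank/ds-inst2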

2) During replication each database actually becomes
unique. There are hidden records with an "ns" prefix
which mark when the record was created and replicated,
who initiated it, etc. The timestamps in that data alone
already guarantee uniqueness ;)

This might be an RFE for the DSEE team though - to keep
such volatile metadata separate from the userdata. Then
your DS instances would more likely dedup well after
replication, and the unique metadata would be stored
separately and stay unique. You might even keep it in
a different dataset with no dedup, then... :)
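
The ZFS half of that is trivial; the DS half is the RFE.
The dataset names below are placeholders, and whether the
server could actually be pointed at the second dataset is
exactly the open question:

    # Userdata backend: dedup can pay off here after replication
    zfs create -o dedup=on tank/ds-inst1-db

    # Volatile per-instance metadata / changelog: unique anyway,
    # so skip dedup (and its DDT overhead) for it
    zfs create -o dedup=off tank/ds-inst1-changelog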

---


So, at the moment, this expectation does not hold true:
 "When a write is sent to one directory server instance,
 the exact same write is propagated to the other five
 instances and therefore should be considered a duplicate."
These writes are not exact.

HTH,
//Jim Klimov


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
