Thanks for running and publishing the tests :) A comment on your testing technique follows, though.
2011-12-29 1:14, Brad Diggs wrote:
As promised, here are the findings from my testing. I created 6 directory server instances ... However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k changes. I suspect the divergence in data as seen by ZFS deduplication most likely occurs because deduplication happens at the block level rather than at the byte level. When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate. However, this was not the case. There could be other reasons for the divergence as well.
Hello, Brad,

If you tested with Sun DSEE (and I have no reason to believe other descendants of the iPlanet Directory Server would work differently under the hood), then there are two factors hindering your block-dedup gains:

1) The data is stored in the backend BerkeleyDB binary file, and in Sun DSEE7 and/or in ZFS this could also be compressed data. Since ZFS dedups whole blocks, you only win when identical data lands at identical block-aligned offsets, and that is quite unlikely to happen often enough. For example, each database may position the same userdata at different offsets due to garbage collection or whatever other optimisation the DB might think of, which makes the on-disk blocks differ and therefore not dedupable (see the sketch in the P.S. below). You might look into whether it is possible to tune the database to write in sector-sized, i.e. minimum-block-sized (512b/4096b), records and to consistently use the same DSEE compression (or lack thereof) - that way you might get more identical blocks and win with dedup. But you'll likely lose out on compression, especially for the largely empty, sparse structure that a database initially is.

2) During replication each database actually becomes unique. There are hidden records with an "ns" prefix which mark when each record was created and replicated, who initiated the change, etc. Timestamps in the data alone already guarantee uniqueness ;) This might be an RFE for the DSEE team, though - to keep such volatile metadata separate from the userdata. Then your DS instances would be more likely to dedup well after replication, and the unique metadata would be stored separately and stay unique. You might even keep it in a different dataset with no dedup, then... :)

---

So, at the moment, this expectation does not hold true: "When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate." These writes are not exact.

HTH,
//Jim Klimov
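P.S. To illustrate point 1), here is a rough Python sketch. It is not tied to the actual DSEE/BerkeleyDB on-disk format - the 4 KB block size, the entry layout and the timestamp-like tag are all made up for the example. It checksums two byte streams the way block-level dedup would, and shows that the same logical records, once shifted by a small header and stamped with per-replica metadata, share no blocks at all:

import hashlib

BLOCK = 4096  # pretend recordsize of the dataset

def block_hashes(data):
    """Checksum every BLOCK-sized chunk - this is all that dedup compares."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

# The "same" 100 user entries, padded to a fixed record size.
entries = [("uid=user%03d,ou=people" % i).encode().ljust(64, b'\0')
           for i in range(100)]

replica_a = b"".join(entries)

# Replica B holds identical userdata, but the DB prepends a small header
# and each entry carries a replica-local timestamp-like tag.
replica_b = b"HDR" + b"".join(e[:-8] + b"ts=00042" for e in entries)

shared = set(block_hashes(replica_a)) & set(block_hashes(replica_b))
print("blocks dedup could share:", len(shared))   # prints 0

Even without the timestamp tag, the three-byte shift alone would misalign every subsequent block, so byte-identical replication traffic does not translate into dedupable blocks on disk.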