On Thu, Nov 26, 2009 at 8:53 PM, Toby Thain <t...@telegraphics.com.au> wrote:
>
> On 26-Nov-09, at 8:57 PM, Richard Elling wrote:
>
>> On Nov 26, 2009, at 1:20 PM, Toby Thain wrote:
>>>
>>> On 25-Nov-09, at 4:31 PM, Peter Jeremy wrote:
>>>
>>>> On 2009-Nov-24 14:07:06 -0600, Mike Gerdts <mger...@gmail.com> wrote:
>>>>>
>>>>> ... fill a 128 KB buffer with random data, then do bitwise rotations
>>>>> for each successive use of the buffer. Unless my math is wrong, it
>>>>> should allow 128 KB of random data to write 128 GB of data with very
>>>>> little deduplication or compression. A much larger data set could be
>>>>> generated with the use of a 128 KB linear feedback shift register...
>>>>
>>>> This strikes me as much harder to use than just filling the buffer
>>>> with 8/32/64-bit random numbers.
>>>
>>> I think Mike's reasoning is that a single bit shift (and propagation) is
>>> cheaper than generating a new random word. After the whole buffer is
>>> shifted, you have a new, very-likely-unique block. (This seems like
>>> overkill if you know the dedup unit size in advance.)
>>
>> You should be able to get a unique block by shifting one word, as long
>> as the shift doesn't duplicate the word.
>
> That is true, but you will run out of permutations sooner.
Rather than shifting a word, you could just increment it. In a
multi-threaded test, each thread increments the word at the offset
corresponding to its own thread number. Assuming 32-bit words (64-bit
is overkill), this allows up to 128 threads with 512-byte blocks. It
also allows up to 2 TB per thread per 512 bytes in a block. That is,
if 50 threads are used and the block size is 8 KB, there should be no
duplicates in 2 * 50 * 8192 / 512 = 1600 TB. But... this leads us back
to the point that the workload generators are too good at generating
unique data.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss