Two caveats inline …

On 1 Feb 2011, at 01:05, Garrett D'Amore wrote:
> On 01/31/11 04:48 PM, Roy Sigurd Karlsbakk wrote:
>>> As I've said here on the list a few times earlier, most recently in
>>> the thread 'ZFS not usable (was ZFS Dedup question)', I've been doing
>>> some rather thorough testing of zfs dedup, and as you can see from
>>> those posts, the results weren't very satisfactory. The docs claim
>>> 1-2GB of memory usage per terabyte stored, ARC or L2ARC, but as you
>>> can read from the posts, I don't find this very likely.
>>>
>> Sorry about the initial post - it was wrong. The hardware configuration
>> was right, but for the initial tests I used NFS, meaning sync writes.
>> This obviously stresses the ARC/L2ARC more than async writes would, but
>> the result remains the same.
>>
>> With 140GB worth of L2ARC on two X25-Ms, plus 4GB partitions on the
>> same devices in a mirror, the write speed was reduced to something like
>> 20% of the original speed. This was with about 2TB used on the zpool
>> and a single data stream, no parallelism whatsoever. Even with 8GB of
>> ARC and 140GB of L2ARC on two SSDs, this speed is fairly low. I could
>> not see substantially high CPU or I/O load during this test.
>>
> I would not expect good write performance with dedup... dedup isn't
> going to make writes fast - it's something you want on a system with a
> lot of duplicated data that sustains a lot of reads. (That said, highly
> duplicated data with a DDT that fits entirely in RAM might see a benefit
> from not having to write metadata frequently. But I suspect an SLOG here
> is going to be critical to getting good performance, since you'll still
> have a lot of synchronous metadata writes.)
>
> - Garrett

There is one circumstance where dedup could improve the write path: on a system whose data is highly dedupable *and* which is under heavy write load, it may be useful to forego the large data write and instead convert it into a smaller (and more frequent) metadata write. SLOGs would then show more benefit, and we'd relieve pressure on the back-end for throughput.

On a system with a high read ratio, deduped data currently would be quite efficient, but there is one pathology in current ZFS which impacts this somewhat: last time I looked, each ARC reference to a deduped block leads to an inflated ARC copy of the data, so a highly referenced block (20x, for instance) could exist 20 times in an inflated state in the ARC after reads reference each occurrence. Dedup of inflated data in the ARC was a pending ZFS optimisation …

Craig

>> Best regards
>>
>> roy
>> --
>> Roy Sigurd Karlsbakk
>> (+47) 97542685
>> r...@karlsbakk.net
>> http://blogg.karlsbakk.net/
>> --
>> In all pedagogy it is essential that the curriculum be presented
>> intelligibly. It is an elementary imperative for all pedagogues to
>> avoid excessive use of idioms of foreign origin. In most cases adequate
>> and relevant synonyms exist in Norwegian.
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>

--
Craig Morgan
Cinnabar Solutions Ltd
t: +44 (0)791 338 3190
f: +44 (0)870 705 1726
e: cr...@cinnabar-solutions.com
w: www.cinnabar-solutions.com