Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-06 Thread Darren J Moffat

On 03/04/2010 00:57, Richard Elling wrote:


> This is annoying. By default, zdb is compiled as a 32-bit executable and
> it can be a hog. Compiling it yourself is too painful for most folks :-(


/usr/sbin/zdb is actually a link to /usr/lib/isaexec

$ ls -il /usr/sbin/zdb /usr/lib/isaexec
300679 -r-xr-xr-x  92 root  bin  8248 Nov 16 10:26 /usr/lib/isaexec*
300679 -r-xr-xr-x  92 root  bin  8248 Nov 16 10:26 /usr/sbin/zdb*



$ ls -il /usr/sbin/i86/zdb /usr/sbin/amd64/zdb
200932 -r-xr-xr-x   1 root  bin  173224 Mar 15 10:20 /usr/sbin/amd64/zdb*
200933 -r-xr-xr-x   1 root  bin  159960 Mar 15 10:20 /usr/sbin/i86/zdb*


This means both 32-bit and 64-bit versions are already installed, and if the
kernel is 64-bit, the 64-bit version of zdb will be run when you invoke
/usr/sbin/zdb.
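
A quick way to see which flavour will actually run (a sketch: isainfo is the
standard ISA query tool, and the pool name below is just a placeholder):

# report the kernel ISA, which is what isaexec keys on
$ isainfo -kv

# or bypass isaexec and pick a flavour explicitly
$ pfexec /usr/sbin/amd64/zdb -S poolname
$ pfexec /usr/sbin/i86/zdb -S poolname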


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-03 Thread Richard Elling
On Apr 1, 2010, at 9:34 PM, Roy Sigurd Karlsbakk wrote:

>> You can estimate the amount of disk space needed for the deduplication
>> table and the expected deduplication ratio by using "zdb -S poolname"
>> on your existing pool.
> 
> This is all good, but it doesn't work too well for planning. Is there a rule 
> of thumb I can use for a general overview?

If you know the average record size for your workload, then you can calculate
the average number of records when given the total space.  This should get 
you in the ballpark.
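
As a very rough sketch for the 125TB case (assuming a 128 KB average record
size and the ~250 bytes per in-core DDT entry quoted elsewhere in this thread):

# 125 TB of unique data at an assumed 128 KB average record size:
#   blocks  = 125 TB / 128 KB ~= 1.05e9
#   DDT RAM = blocks * 250 B  ~= 244 GiB (worst case, dedup ratio of 1.0)
$ echo 'scale=1; b = 125 * 2^40 / 2^17; b * 250 / 2^30' | bc
244.1

If the real average block size is smaller, the number of entries (and the DDT)
grows proportionally; a dedup ratio above 1.0 shrinks it.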

> Say I want 125TB of space and I want to dedup that for backup use. It'll
> probably dedup quite efficiently, as long as the alignment matches. By the way,
> is there a way to auto-align data for dedup in the backup case? Or does zfs do
> this by itself?

ZFS does not change alignment.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-03 Thread Roy Sigurd Karlsbakk
> You can estimate the amount of disk space needed for the deduplication
> table and the expected deduplication ratio by using "zdb -S poolname"
> on your existing pool.

This is all good, but it doesn't work too well for planning. Is there a rule of 
thumb I can use for a general overview? Say I want 125TB of space and I want to 
dedup that for backup use. It'll probably dedup quite efficiently, as long as the 
alignment matches. By the way, is there a way to auto-align data for dedup in the 
backup case? Or does zfs do this by itself?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-03 Thread Roy Sigurd Karlsbakk
> > I might add some swap I guess.  I will have to try it on another
> > machine with more RAM and less pool, and see how the size of the zdb
> > image compares to the calculated size of DDT needed.  So long as zdb
> > is the same or a little smaller than the DDT it predicts, the tool's
> > still useful, just sometimes it will report ``DDT too big but not sure
> > by how much'', by coredumping/thrashing instead of finishing.
> 
> In my experience, more swap doesn't help break through the 2GB memory
> barrier.  As zdb is an intentionally unsupported tool, methinks recompile
> may be required (or write your own).

I guess this tool might not work too well, then, with 20TiB in 47M files?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-02 Thread Richard Elling
On Apr 2, 2010, at 2:03 PM, Miles Nordin wrote:

>> "re" == Richard Elling  writes:
> 
>re> # ptime zdb -S zwimming Simulated DDT histogram:
>re>  refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
>re>   Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G
> 
>re>in-core size = 2.63M * 250 = 657.5 MB
> 
> Thanks, that is really useful!  It'll probably make the difference
> between trying dedup and not, for me.
> 
> It is not working for me yet.  It got to this point in prstat:
> 
>  6754 root 2554M 1439M sleep   600   0:03:31 1.9% zdb/106
> 
> and then ran out of memory:
> 
> $ pfexec ptime zdb -S tub
> out of memory -- generating core dump

This is annoying. By default, zdb is compiled as a 32-bit executable and
it can be a hog. Compiling it yourself is too painful for most folks :-(

> I might add some swap I guess.  I will have to try it on another
> machine with more RAM and less pool, and see how the size of the zdb
> image compares to the calculated size of DDT needed.  So long as zdb
> is the same or a little smaller than the DDT it predicts, the tool's
> still useful, just sometimes it will report ``DDT too big but not sure
> by how much'', by coredumping/thrashing instead of finishing.

In my experience, more swap doesn't help break through the 2GB memory
barrier.  As zdb is an intentionally unsupported tool, methinks recompile
may be required (or write your own).
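
One thing worth checking before recompiling (a sketch; pflags is the standard
/proc tool, and the ISA-specific zdb paths are the ones Darren lists later in
the thread):

# confirm the struggling zdb really is a 32-bit process (data model _ILP32)
$ pflags $(pgrep -n zdb) | grep 'data model'

# on builds that ship ISA-specific binaries, try the 64-bit one directly
$ ls /usr/sbin/amd64/zdb
$ pfexec ptime /usr/sbin/amd64/zdb -S tub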
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-02 Thread Miles Nordin
> "re" == Richard Elling  writes:

re> # ptime zdb -S zwimming Simulated DDT histogram:
re>  refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
re>   Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

re>in-core size = 2.63M * 250 = 657.5 MB

Thanks, that is really useful!  It'll probably make the difference
between trying dedup and not, for me.

It is not working for me yet.  It got to this point in prstat:

  6754 root 2554M 1439M sleep   600   0:03:31 1.9% zdb/106

and then ran out of memory:

 $ pfexec ptime zdb -S tub
 out of memory -- generating core dump

I might add some swap I guess.  I will have to try it on another
machine with more RAM and less pool, and see how the size of the zdb
image compares to the calculated size of DDT needed.  So long as zdb
is the same or a little smaller than the DDT it predicts, the tool's
still useful, just sometimes it will report ``DDT too big but not sure
by how much'', by coredumping/thrashing instead of finishing.
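
Something along these lines could track zdb's footprint while the simulation
runs, to compare against the predicted DDT size (a sketch; the 30-second
sampling interval is arbitrary and the loop assumes only one zdb is running):

# run the DDT simulation in the background, then sample the zdb process's
# virtual size and resident set (in KB) until it exits
pfexec ptime zdb -S tub &
sleep 5                                # give pfexec/ptime a moment to start zdb
while ZDBPID=$(pgrep -n zdb); do
    ps -o pid,vsz,rss -p "$ZDBPID"
    sleep 30
done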


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup and memory/l2arc requirements

2010-04-02 Thread Richard Elling
On Apr 1, 2010, at 5:39 PM, Roy Sigurd Karlsbakk wrote:
> Hi all
> 
> I've been told (on #opensolaris, irc.freenode.net) that opensolaris needs a 
> lot of memory and/or l2arc for dedup to function properly. How much memory or 
> l2arc should I get for a 12TB zpool (8x2TB in RAIDz2), and then, how much for 
> 125TB (after RAIDz2 overhead)? Is there a function into which I can plug my 
> recordsize and volume size to get the appropriate numbers?

You can estimate the amount of disk space needed for the deduplication table
and the expected deduplication ratio by using "zdb -S poolname" on your existing
pool.  Be patient; for an existing pool with lots of objects, this can take 
some time to run.

# ptime zdb -S zwimming
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
     2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
     4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
     8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
    16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
    32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
    64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
   128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
   256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
   512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
    2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
    8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
 Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50


real     8:02.391932786
user     1:24.231855093
sys        15.193256108

In this file system, 2.63 million blocks are allocated. The in-core size
of a DDT entry is approximately 250 bytes.  So the math is pretty simple:
in-core size = 2.63M * 250 = 657.5 MB

If your dedup ratio is 1.0, then this number will scale linearly with size.
If the dedup ratio is > 1.0, then this number will scale less than linearly,
so you can use the linear scale as a worst-case approximation.
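
If you want to mechanize that arithmetic, something like this should work
against the histogram format above (a sketch; 250 bytes per entry is the
approximation used here, and the pool name is a placeholder):

# pull the allocated-blocks total out of "zdb -S" and apply ~250 bytes/entry
$ pfexec zdb -S poolname | nawk '
      /^ *Total/ {
          n = $2 + 0                          # numeric part, e.g. 2.63
          if ($2 ~ /K$/) n *= 1024
          if ($2 ~ /M$/) n *= 1024 * 1024
          if ($2 ~ /G$/) n *= 1024 * 1024 * 1024
          printf("in-core DDT ~= %.1f MB\n", n * 250 / (1024 * 1024))
      }'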
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] dedup and memory/l2arc requirements

2010-04-02 Thread Roy Sigurd Karlsbakk
Hi all

I've been told (on #opensolaris, irc.freenode.net) that opensolaris needs a lot 
of memory and/or l2arc for dedup to function properly. How much memory or l2arc 
should I get for a 12TB zpool (8x2TB in RAIDz2), and then, how much for 125TB 
(after RAIDz2 overhead)? Is there a function into which I can plug my 
recordsize and volume size to get the appropriate numbers?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss