Re: [zfs-discuss] Confused by compressratio

2008-04-17 Thread Richard Elling
Stuart Anderson wrote:
> On Wed, Apr 16, 2008 at 02:07:53PM -0700, Richard Elling wrote:
>   
 Personally, I'd estimate using du rather than ls.

 
>>> They report the exact same number as far as I can tell. With the caveat
>>> that Solaris ls -s returns the number of 512-byte blocks, whereas
>>> GNU ls -s returns the number of 1024-byte blocks by default.
>>>
>>>  
>>>   
>> That is file-system dependent.  Some file systems have larger blocks
>> and ls -s shows the size in blocks.  ZFS uses dynamic block sizes, but
>> you knew that already... :-)
>> -- richard
>>
>> 
>
> OK, we are now clearly exposing my ignorance, so hopefully I can learn
> something new about ZFS.
>
> What is the distinction/relationship between recordsize (which as
> I understand is a fixed quantity for each ZFS dataset) and dynamic
> block sizes?  Are blocks what are allocated for metadata, and records
> what are allocated for data, i.e., the contents of files?
>
> What does it mean that blocks are compressed for a ZFS dataset with
> "compression=off"? Is this equivalent to saying that ZFS metadata is
> always compressed?
>
> Is there any ZFS documentation that shows by example exactly how to
> interpret the various numbers from ls, du, df, and zfs used/referenced/
> available/compressratio in the context of compression={on,off}, possibly
> also referring to both sparse and non-sparse files?
>   

ls, du, and df have fairly rigorous definitions in their respective man
pages and specifications, but only df gives an estimate of the remaining
available space.  More detailed descriptions of space accounting, and of
the properties that affect it, are in the ZFS Admin Guide.
http://www.opensolaris.org/os/community/zfs/docs/zfsadmin.pdf
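
For example (a sketch only; "tank/fs" and "somefile" are hypothetical names),
the ZFS-level view and the POSIX-level view of the same data can be compared
side by side:

# ZFS-level accounting: space allocated after compression, plus the ratio
zfs get used,referenced,available,compressratio tank/fs

# POSIX-level views of the same data
df -k /tank/fs            # filesystem-wide used/available estimate
du -sk /tank/fs           # blocks charged to the directory tree, in kB
ls -ls /tank/fs/somefile  # per file: allocated blocks (-s) vs. logical length (-l)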

The elephant in the room is the question of how much physical
space you have free.  That is not easy to predict, especially with
compression. You might sleep better if you can convince yourself
that you'll never really know what is down the road until you get
there :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-16 Thread Stuart Anderson
On Wed, Apr 16, 2008 at 02:07:53PM -0700, Richard Elling wrote:
> 
> >>Personally, I'd estimate using du rather than ls.
> >>
> >
> >They report the exact same number as far as I can tell. With the caveat
> >that Solaris ls -s returns the number of 512-byte blocks, whereas
> >GNU ls -s returns the number of 1024-byte blocks by default.
> >
> >  
> That is file-system dependent.  Some file systems have larger blocks
> and ls -s shows the size in blocks.  ZFS uses dynamic block sizes, but
> you knew that already... :-)
> -- richard
> 

OK, we are now clearly exposing my ignorance, so hopefully I can learn
something new about ZFS.

What is the distinction/relationship between recordsize (which as
I understand is a fixed quantity for each ZFS dataset) and dynamic
block sizes?  Are blocks what are allocated for metadata, and records
what are allocated for data, i.e., the contents of files?

What does it mean that blocks are compressed for a ZFS dataset with
"compression=off"? Is this equivalent to saying that ZFS metadata is
always compressed?

Is there any ZFS documentation that shows by example exactly how to
interpret the various numbers from ls, du, df, and zfs used/referenced/
available/compressratio in the context of compression={on,off}, possibly
also referring to both sparse and non-sparse files?

Thanks.


-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-16 Thread Joerg Schilling
Stuart Anderson <[EMAIL PROTECTED]> wrote:

> They report the exact same number as far as I can tell. With the caveat
> that Solaris ls -s returns the number of 512-byte blocks, whereas
> GNU ls -s returns the number of 1024-byte blocks by default.

IIRC, this may be controlled by environment variables.
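
For GNU ls that would be something along these lines (assuming GNU coreutils;
the exact variable precedence may differ between versions):

ls -s file.dat                      # GNU default: 1024-byte units
POSIXLY_CORRECT=1 ls -s file.dat    # POSIX/Solaris-style 512-byte units
BLOCK_SIZE=512 ls -s file.dat       # unit chosen via an environment variable
ls -s --block-size=512 file.dat     # or via a command-line option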

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-16 Thread Richard Elling
Stuart Anderson wrote:
> On Wed, Apr 16, 2008 at 10:09:00AM -0700, Richard Elling wrote:
>   
>> Stuart Anderson wrote:
>> 
>>> On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:
>>>  
>>>   
 UTSL.  compressratio is the ratio of uncompressed bytes to compressed 
 bytes.
 http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv

 IMHO, you will (almost) never get the same number looking at bytes as you
 get from counting blocks.

 
>>> If I can't use /bin/ls to get an accurate measure of the number of compressed
>>> blocks used (-s) and the original number of uncompressed bytes (-l), what is
>>> a more accurate way to measure these?
>>>  
>>>   
>> ls -s should give the proper number of blocks used.
>> ls -l should give the proper file length.
>> Do not assume that compressed data in a block consumes the whole block.
>> 
>
> Not even on a pristine ZFS filesystem where just one file has been created?
>   

In theory, yes.  Blocks are compressed, not files.

>   
>>> As a gedanken experiment, what command(s) can I run to examine a compressed
>>> ZFS filesystem and determine how much space it will require to replicate
>>> to an uncompressed ZFS filesystem? I can add up the file sizes, e.g.,
>>> /bin/ls -lR | grep ^- | nawk '{SUM+=$5}END{print SUM}'
>>> but I would have thought there was a more efficient way using the already
>>> aggregated filesystem metadata via "/bin/df" or "zfs list" and the
>>> compressratio.
>>>  
>>>   
>> IMHO, this is a by-product of the dynamic nature of ZFS.
>> 
>
> Are you saying it can't be done except by adding up all the individual
> file sizes?
>   

I'm saying that adding up all of the individual file sizes, each rounded up
to the smallest block size for the target file system, plus some estimate
of metadata space requirements, will give you the most pessimistic estimate.
Metadata is also compressed and copied, by default.
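
A back-of-the-envelope sketch of that estimate (the 512-byte rounding and the
1% metadata allowance are illustrative assumptions, not ZFS constants):

/bin/ls -lR | grep '^-' | nawk '
    { s += int(($5 + 511) / 512) * 512 }   # round each file length up to a 512-byte block
    END { printf "%.0f bytes of data, ~%.0f with a 1%% metadata allowance\n", s, s * 1.01 }'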

>> Personally, I'd estimate using du rather than ls.
>> 
>
> They report the exact same number as far as I can tell. With the caveat
> that Solaris ls -s returns the number of 512-byte blocks, whereas
> GNU ls -s returns the number of 1024-byte blocks by default.
>
>   
That is file-system dependent.  Some file systems have larger blocks
and ls -s shows the size in blocks.  ZFS uses dynamic block sizes, but
you knew that already... :-)
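
For what it's worth, the dataset's upper bound on data block size can be
checked directly (128K is the usual default, and as I understand it files
smaller than that are stored in a single, smaller block):

zfs get recordsize export-cit/compress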
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-16 Thread Stuart Anderson
On Wed, Apr 16, 2008 at 10:09:00AM -0700, Richard Elling wrote:
> Stuart Anderson wrote:
> >On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:
> >  
> >>UTSL.  compressratio is the ratio of uncompressed bytes to compressed 
> >>bytes.
> >>http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv
> >>
> >>IMHO, you will (almost) never get the same number looking at bytes as you
> >>get from counting blocks.
> >>
> >
> >If I can't use /bin/ls to get an accurate measure of the number of compressed
> >blocks used (-s) and the original number of uncompressed bytes (-l), what is
> >a more accurate way to measure these?
> >  
> 
> ls -s should give the proper number of blocks used.
> ls -l should give the proper file length.
> Do not assume that compressed data in a block consumes the whole block.

Not even on a pristine ZFS filesystem where just one file has been created?

> 
> >As a gedanken experiment, what command(s) can I run to examine a compressed
> >ZFS filesystem and determine how much space it will require to replicate
> >to an uncompressed ZFS filesystem? I can add up the file sizes, e.g.,
> >/bin/ls -lR | grep ^- | nawk '{SUM+=$5}END{print SUM}'
> >but I would have thought there was a more efficient way using the already
> >aggregated filesystem metadata via "/bin/df" or "zfs list" and the
> >compressratio.
> >  
> 
> IMHO, this is a by-product of the dynamic nature of ZFS.

Are you saying it can't be done except by adding up all the individual
file sizes?

> Personally, I'd estimate using du rather than ls.

They report the exact same number as far as I can tell. With the caveat
that Solaris ls -s returns the number of 512-byte blocks, whereas
GNU ls -s returns the number of 1024-byte blocks by default.

Thanks.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-16 Thread Richard Elling
Stuart Anderson wrote:
> On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:
>   
>> UTSL.  compressratio is the ratio of uncompressed bytes to compressed bytes.
>> http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv
>>
>> IMHO, you will (almost) never get the same number looking at bytes as you
>> get from counting blocks.
>> 
>
> If I can't use /bin/ls to get an accurate measure of the number of compressed
> blocks used (-s) and the original number of uncompressed bytes (-l), what is
> a more accurate way to measure these?
>   

ls -s should give the proper number of blocks used.
ls -l should give the proper file length.
Do not assume that compressed data in a block consumes the whole block.
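
For example, for a single file on a compressed dataset (the file name is
hypothetical):

/bin/ls -l data.db    # logical file length, in bytes
/bin/ls -s data.db    # space actually allocated, in 512-byte blocks (Solaris ls)
du -k data.db         # the same allocation, reported in kB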

> As a gedanken experiment, what command(s) can I run to examine a compressed
> ZFS filesystem and determine how much space it will require to replicate
> to an uncompressed ZFS filesystem? I can add up the file sizes, e.g.,
> /bin/ls -lR | grep ^- | nawk '{SUM+=$5}END{print SUM}'
> but I would have thought there was a more efficient way using the already
> aggregated filesystem metadata via "/bin/df" or "zfs list" and the
> compressratio.
>   

IMHO, this is a by-product of the dynamic nature of ZFS.
Personally, I'd estimate using du rather than ls.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-16 Thread Robert Milkowski
Hello Luke,

Tuesday, April 15, 2008, 4:50:17 PM, you wrote:

LS> You can fill up an ext3 filesystem with the following command:
LS> dd if=/dev/zero of=delme.dat
LS> You can't really fill up a ZFS filesystem that way.  I guess you could,
LS> but I've never had the patience -- when several GB worth of zeroes takes
LS> only 1kb worth of disk space, it would take a very long time.

Unless something changed recently, without compression ZFS will
actually write the zero blocks.
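
A quick way to check on a scratch dataset (a sketch; the pool/dataset names
are made up) -- if the above is right, the two du numbers should differ
dramatically:

zfs create tank/zerotest
zfs set compression=off tank/zerotest
dd if=/dev/zero of=/tank/zerotest/zeros.dat bs=1024k count=1024
sync; du -k /tank/zerotest/zeros.dat     # expect roughly 1 GB allocated
zfs set compression=on tank/zerotest
dd if=/dev/zero of=/tank/zerotest/zeros2.dat bs=1024k count=1024
sync; du -k /tank/zerotest/zeros2.dat    # expect close to zero allocated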



-- 
Best regards,
 Robert Milkowski            mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Stuart Anderson
On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:
> UTSL.  compressratio is the ratio of uncompressed bytes to compressed bytes.
> http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv
> 
> IMHO, you will (almost) never get the same number looking at bytes as you
> get from counting blocks.

If I can't use /bin/ls to get an accurate measure of the number of compressed
blocks used (-s) and the original number of uncompressed bytes (-l), what is
a more accurate way to measure these?

As a gedanken experiment, what command(s) can I run to examine a compressed
ZFS filesystem and determine how much space it will require to replicate
to an uncompressed ZFS filesystem? I can add up the file sizes, e.g.,
/bin/ls -lR | grep ^- | nawk '{SUM+=$5}END{print SUM}'
but I would have thought there was a more efficient way using the already
aggregated filesystem metadata via "/bin/df" or "zfs list" and the
compressratio.
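
For what it's worth, GNU du's apparent-size mode may be closer to what I am
after (assuming GNU coreutils is available, e.g. installed as gdu on Solaris):

du --apparent-size -sk /export/compress   # sum of logical file lengths, in kB
du -sk /export/compress                   # blocks actually allocated, in kB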

Thanks.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Richard Elling
UTSL.  compressratio is the ratio of uncompressed bytes to compressed bytes.
http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv

IMHO, you will (almost) never get the same number looking at bytes as you
get from counting blocks.
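
For instance (illustrative numbers only): a dataset whose blocks hold 10 GB of
uncompressed data stored in 4 GB of compressed blocks would report
compressratio = 10 / 4 = 2.50x.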
 -- richard

Stuart Anderson wrote:
> On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
>   
>> Stuart Anderson wrote:
>> 
>>> On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
>>>  
>>>   
 Stuart Anderson wrote:

 
> As an artificial test, I created a filesystem with compression enabled
> and ran "mkfile 1g" and the reported compressratio for that filesystem
> is 1.00x even though this 1GB file uses only 1kB.
>
>  
>   
 ZFS seems to treat files filled with zeroes as sparse files, regardless 
 of whether or not compression is enabled.  Try "dd if=/dev/urandom 
 of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
 this behavior.  Creating this file is a lot slower than writing zeroes 
 (mostly due to the speed of the urandom device), but ZFS won't treat it 
 like a sparse file, and it won't compress very well either.

 
>>> However, I am still trying to reconcile the compression ratio as
>>> reported by compressratio vs the ratio of file sizes to disk blocks
>>> used (whether or not ZFS is creating sparse files).
>>>  
>>>   
>> Can you describe the data you're storing a bit?  Any big disk images?
>>
>> 
>
> Understanding the "mkfile" case would be a start, but the initial filesystem
> that started my confusion is one that has a number of ~50GByte mysql database
> files as well as a number of application code files.
>
> Here is another simple test to avoid any confusion/bugs related to NULL
> character sequeneces being compressed to nothing versus being treated
> as sparse files. In particular, a 2GByte file full of the output of
> /bin/yes:
>
>   
>> zfs create export-cit/compress
>> cd /export/compress
>> /bin/df -k .
>> 
> Filesystem           kbytes       used      avail capacity  Mounted on
> export-cit/compress  1704858624     55 1261199742     1%    /export/compress
>   
>> zfs get compression export-cit/compress
>> 
> NAME                 PROPERTY     VALUE  SOURCE
> export-cit/compress  compression  on     inherited from export-cit
>   
>> /bin/yes | head -1073741824 > yes.dat
>> /bin/ls -ls yes.dat
>> 
> 185017 -rw-r--r--   1 root     root     2147483648 Apr 14 15:31 yes.dat
>   
>> /bin/df -k .
>> 
> Filesystem           kbytes       used      avail capacity  Mounted on
> export-cit/compress  1704858624  92563 1261107232     1%    /export/compress
>   
>> zfs get compressratio export-cit/compress
>> 
> NAME                 PROPERTY       VALUE   SOURCE
> export-cit/compress  compressratio  28.39x  -
>
> So compressratio reports 28.39, but the ratio of file size to used disk for
> the only regular file on this filesystem, i.e., excluding the initial 55kB
> allocated for the "empty" filesystem is:
>
> 2147483648 / (185017 * 512) = 22.67
>
>
> Calculated another way from "zfs list" for the entire filesystem:
>
>   
>> zfs list /export/compress
>> 
> NAME  USED  AVAIL  REFER  MOUNTPOINT
> export-cit/compress  90.4M  1.17T  90.4M  /export/compress
>
> is 2GB/90.4M = 2048 / 90.4 = 22.65
>
>
> That still leaves me puzzled as to what the precise definition of compressratio is.
>
>
> Thanks.
>
> ---
> Stuart Anderson  [EMAIL PROTECTED]
> http://www.ligo.caltech.edu/~anderson
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Stuart Anderson
On Tue, Apr 15, 2008 at 01:37:43PM -0400, Luke Scharf wrote:
> 
> >>>zfs list /export/compress
> >>>
> >>>  
> >>NAME  USED  AVAIL  REFER  MOUNTPOINT
> >>export-cit/compress  90.4M  1.17T  90.4M  /export/compress
> >>
> >>is 2GB/90.4M = 2048 / 90.4 = 22.65
> >>
> >>
> >>That still leaves me puzzled as to what the precise definition of
> >>compressratio is.
> >>
> 
> My guess is that the compressratio doesn't include any of those runs of 
> null characters that weren't actually written to the disk.

This test was done with a file created via "/bin/yes | head" specifically
to rule out this possibility, i.e., it does not contain any null characters.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Luke Scharf

>>> zfs list /export/compress
>>> 
>>>   
>> NAME  USED  AVAIL  REFER  MOUNTPOINT
>> export-cit/compress  90.4M  1.17T  90.4M  /export/compress
>>
>> is 2GB/90.4M = 2048 / 90.4 = 22.65
>>
>>
>> That still leaves me puzzled as to what the precise definition of compressratio is.
>> 

My guess is that the compressratio doesn't include any of those runs of 
null characters that weren't actually written to the disk.

What I'm thinking is that if you have a disk-image (of a new computer) 
in there, the 4GB worth of actual data is counted against the 
compressratio, but the 36GB worth of empty (zeroed) space isn't counted.

But I don't have hard numbers, or a good way to prove it.  Not without 
reading all of the OP's data, anyway...  :-)

-Luke

P.S.  This "don't bother writing zeroes" behavior is wonderful when 
working with Xen disk images.  I'm a fan!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Bob Friesenhahn
On Tue, 15 Apr 2008, Luke Scharf wrote:
>
> AFAIK, ext3 supports sparse files just like it should -- but it doesn't
> dynamically figure out what to write based on the contents of the file.

Since zfs inspects all data anyway in order to compute the block 
checksum, it can easily know if a block is all zeros.

For ext3, inspecting all blocks for zeros would be viewed as 
unnecessary overhead.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Luke Scharf
You can fill up an ext3 filesystem with the following command:
dd if=/dev/zero of=delme.dat
You can't really fill up a ZFS filesystem that way.  I guess you could,
but I've never had the patience -- when several GB worth of zeroes takes
only 1kb worth of disk space, it would take a very long time.

AFAIK, ext3 supports sparse files just like it should -- but it doesn't 
dynamically figure out what to write based on the contents of the file.

-Luke

Jeremy F. wrote:
> This may be my ignorance, but I thought all modern unix filesystems created 
> sparse files in this way?
>
>
> -Original Message-
> From: Stuart Anderson <[EMAIL PROTECTED]>
>
> Date: Mon, 14 Apr 2008 15:45:03 
> To: Luke Scharf <[EMAIL PROTECTED]>
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Confused by compressratio
>
>
> On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
>   
>> Stuart Anderson wrote:
>> 
>>> On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
>>>  
>>>   
>>>> Stuart Anderson wrote:
>>>>
>>>> 
>>>>> As an artificial test, I created a filesystem with compression enabled
>>>>> and ran "mkfile 1g" and the reported compressratio for that filesystem
>>>>> is 1.00x even though this 1GB file uses only 1kB.
>>>>>
>>>>>  
>>>>>   
>>>> ZFS seems to treat files filled with zeroes as sparse files, regardless 
>>>> of whether or not compression is enabled.  Try "dd if=/dev/urandom 
>>>> of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
>>>> this behavior.  Creating this file is a lot slower than writing zeroes 
>>>> (mostly due to the speed of the urandom device), but ZFS won't treat it 
>>>> like a sparse file, and it won't compress very well either.
>>>>
>>>> 
>>> However, I am still trying to reconcile the compression ratio as
>>> reported by compressratio vs the ratio of file sizes to disk blocks
>>> used (whether or not ZFS is creating sparse files).
>>>  
>>>   
>> Can you describe the data you're storing a bit?  Any big disk images?
>>
>> 
>
> Understanding the "mkfile" case would be a start, but the initial filesystem
> that started my confusion is one that has a number of ~50GByte mysql database
> files as well as a number of application code files.
>
> Here is another simple test to avoid any confusion/bugs related to NULL
> character sequences being compressed to nothing versus being treated
> as sparse files. In particular, a 2GByte file full of the output of
> /bin/yes:
>
>   
>> zfs create export-cit/compress
>> cd /export/compress
>> /bin/df -k .
>> 
> Filesystem           kbytes       used      avail capacity  Mounted on
> export-cit/compress  1704858624     55 1261199742     1%    /export/compress
>   
>> zfs get compression export-cit/compress
>> 
> NAME                 PROPERTY     VALUE  SOURCE
> export-cit/compress  compression  on     inherited from export-cit
>   
>> /bin/yes | head -1073741824 > yes.dat
>> /bin/ls -ls yes.dat
>> 
> 185017 -rw-r--r--   1 root     root     2147483648 Apr 14 15:31 yes.dat
>   
>> /bin/df -k .
>> 
> Filesystem           kbytes       used      avail capacity  Mounted on
> export-cit/compress  1704858624  92563 1261107232     1%    /export/compress
>   
>> zfs get compressratio export-cit/compress
>> 
> NAME                 PROPERTY       VALUE   SOURCE
> export-cit/compress  compressratio  28.39x  -
>
> So compressratio reports 28.39, but the ratio of file size to used disk for
> the only regular file on this filesystem, i.e., excluding the initial 55kB
> allocated for the "empty" filesystem is:
>
> 2147483648 / (185017 * 512) = 22.67
>
>
> Calculated another way from "zfs list" for the entire filesystem:
>
>   
>> zfs list /export/compress
>> 
> NAME  USED  AVAIL  REFER  MOUNTPOINT
> export-cit/compress  90.4M  1.17T  90.4M  /export/compress
>
> is 2GB/90.4M = 2048 / 90.4 = 22.65
>
>
> That still leaves me puzzled as to what the precise definition of compressratio is.
>
>
> Thanks.
>
> ---
> Stuart Anderson  [EMAIL PROTECTED]
> http://www.ligo.caltech.edu/~anderson
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-15 Thread Jeremy F.
This may be my ignorance, but I thought all modern unix filesystems created 
sparse files in this way?


-Original Message-
From: Stuart Anderson <[EMAIL PROTECTED]>

Date: Mon, 14 Apr 2008 15:45:03 
To: Luke Scharf <[EMAIL PROTECTED]>
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Confused by compressratio


On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
> Stuart Anderson wrote:
> >On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
> >  
> >>Stuart Anderson wrote:
> >>
> >>>As an artificial test, I created a filesystem with compression enabled
> >>>and ran "mkfile 1g" and the reported compressratio for that filesystem
> >>>is 1.00x even though this 1GB file uses only 1kB.
> >>> 
> >>>  
> >>ZFS seems to treat files filled with zeroes as sparse files, regardless 
> >>of whether or not compression is enabled.  Try "dd if=/dev/urandom 
> >>of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
> >>this behavior.  Creating this file is a lot slower than writing zeroes 
> >>(mostly due to the speed of the urandom device), but ZFS won't treat it 
> >>like a sparse file, and it won't compress very well either.
> >>
> >
> >However, I am still trying to reconcile the compression ratio as
> >reported by compressratio vs the ratio of file sizes to disk blocks
> >used (whether or not ZFS is creating sparse files).
> >  
> 
> Can you describe the data you're storing a bit?  Any big disk images?
> 

Understanding the "mkfile" case would be a start, but the initial filesystem
that started my confusion is one that has a number of ~50GByte mysql database
files as well as a number of application code files.

Here is another simple test to avoid any confusion/bugs related to NULL
> character sequences being compressed to nothing versus being treated
as sparse files. In particular, a 2GByte file full of the output of
/bin/yes:

>zfs create export-cit/compress
>cd /export/compress
>/bin/df -k .
Filesystem           kbytes       used      avail capacity  Mounted on
export-cit/compress  1704858624     55 1261199742     1%    /export/compress
>zfs get compression export-cit/compress
NAME                 PROPERTY     VALUE  SOURCE
export-cit/compress  compression  on     inherited from export-cit
>/bin/yes | head -1073741824 > yes.dat
>/bin/ls -ls yes.dat
185017 -rw-r--r--   1 root     root     2147483648 Apr 14 15:31 yes.dat
>/bin/df -k .
Filesystem           kbytes       used      avail capacity  Mounted on
export-cit/compress  1704858624  92563 1261107232     1%    /export/compress
>zfs get compressratio export-cit/compress
NAME                 PROPERTY       VALUE   SOURCE
export-cit/compress  compressratio  28.39x  -

So compressratio reports 28.39, but the ratio of file size to used disk for
the only regular file on this filesystem, i.e., excluding the initial 55kB
allocated for the "empty" filesystem is:

2147483648 / (185017 * 512) = 22.67


Calculated another way from "zfs list" for the entire filesystem:

>zfs list /export/compress
NAME  USED  AVAIL  REFER  MOUNTPOINT
export-cit/compress  90.4M  1.17T  90.4M  /export/compress

is 2GB/90.4M = 2048 / 90.4 = 22.65


That still leaves me puzzled as to what the precise definition of compressratio is.


Thanks.

---
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-14 Thread Stuart Anderson
On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:
> Stuart Anderson wrote:
> >On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
> >  
> >>Stuart Anderson wrote:
> >>
> >>>As an artificial test, I created a filesystem with compression enabled
> >>>and ran "mkfile 1g" and the reported compressratio for that filesystem
> >>>is 1.00x even though this 1GB file uses only 1kB.
> >>> 
> >>>  
> >>ZFS seems to treat files filled with zeroes as sparse files, regardless 
> >>of whether or not compression is enabled.  Try "dd if=/dev/urandom 
> >>of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
> >>this behavior.  Creating this file is a lot slower than writing zeroes 
> >>(mostly due to the speed of the urandom device), but ZFS won't treat it 
> >>like a sparse file, and it won't compress very well either.
> >>
> >
> >However, I am still trying to reconcile the compression ratio as
> >reported by compressratio vs the ratio of file sizes to disk blocks
> >used (whether or not ZFS is creating sparse files).
> >  
> 
> Can you describe the data you're storing a bit?  Any big disk images?
> 

Understanding the "mkfile" case would be a start, but the initial filesystem
that started my confusion is one that has a number of ~50GByte mysql database
files as well as a number of application code files.

Here is another simple test to avoid any confusion/bugs related to NULL
character sequences being compressed to nothing versus being treated
as sparse files. In particular, a 2GByte file full of the output of
/bin/yes:

>zfs create export-cit/compress
>cd /export/compress
>/bin/df -k .
Filesystem           kbytes       used      avail capacity  Mounted on
export-cit/compress  1704858624     55 1261199742     1%    /export/compress
>zfs get compression export-cit/compress
NAME                 PROPERTY     VALUE  SOURCE
export-cit/compress  compression  on     inherited from export-cit
>/bin/yes | head -1073741824 > yes.dat
>/bin/ls -ls yes.dat
185017 -rw-r--r--   1 root     root     2147483648 Apr 14 15:31 yes.dat
>/bin/df -k .
Filesystem           kbytes       used      avail capacity  Mounted on
export-cit/compress  1704858624  92563 1261107232     1%    /export/compress
>zfs get compressratio export-cit/compress
NAME                 PROPERTY       VALUE   SOURCE
export-cit/compress  compressratio  28.39x  -

So compressratio reports 28.39, but the ratio of file size to used disk for
the only regular file on this filesystem, i.e., excluding the initial 55kB
allocated for the "empty" filesystem is:

2147483648 / (185017 * 512) = 22.67


Calculated another way from "zfs list" for the entire filesystem:

>zfs list /export/compress
NAME  USED  AVAIL  REFER  MOUNTPOINT
export-cit/compress  90.4M  1.17T  90.4M  /export/compress

is 2GB/90.4M = 2048 / 90.4 = 22.65


That still leaves me puzzled as to what the precise definition of compressratio is.


Thanks.

---
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-14 Thread Luke Scharf
Stuart Anderson wrote:
> On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
>   
>> Stuart Anderson wrote:
>> 
>>> As an artificial test, I created a filesystem with compression enabled
>>> and ran "mkfile 1g" and the reported compressratio for that filesystem
>>> is 1.00x even though this 1GB file uses only 1kB.
>>>  
>>>   
>> ZFS seems to treat files filled with zeroes as sparse files, regardless 
>> of whether or not compression is enabled.  Try "dd if=/dev/urandom 
>> of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
>> this behavior.  Creating this file is a lot slower than writing zeroes 
>> (mostly due to the speed of the urandom device), but ZFS won't treat it 
>> like a sparse file, and it won't compress very well either.
>> 
>
> However, I am still trying to reconcile the compression ratio as
> reported by compressratio vs the ratio of file sizes to disk blocks
> used (whether or not ZFS is creating sparse files).
>   

Can you describe the data you're storing a bit?  Any big disk images?

-Luke

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-14 Thread Stuart Anderson
On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
> Stuart Anderson wrote:
> >As an artificial test, I created a filesystem with compression enabled
> >and ran "mkfile 1g" and the reported compressratio for that filesystem
> >is 1.00x even though this 1GB file uses only 1kB.
> >  
> 
> ZFS seems to treat files filled with zeroes as sparse files, regardless 
> of whether or not compression is enabled.  Try "dd if=/dev/urandom 
> of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
> this behavior.  Creating this file is a lot slower than writing zeroes 
> (mostly due to the speed of the urandom device), but ZFS won't treat it 
> like a sparse file, and it won't compress very well either.

However, I am still trying to reconcile the compression ratio as
reported by compressratio vs the ratio of file sizes to disk blocks
used (whether or not ZFS is creating sparse files).

Regarding sparse files, I recently found that the built-in heuristic
for auto-detecting and creating sparse files in the GNU cp program
"works" on ZFS filesystems. In particular, if you use GNU cp to copy
a file from ZFS and it has a string of null characters in it (whether
or not it is stored as a sparse file), the output file (regardless of
the destination filesystem type) will be a sparse file. I have not seen
this behavior when copying such files from other source filesystems.
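
For reference, the heuristic can also be forced or disabled explicitly (GNU cp
options, as I read the coreutils documentation):

cp --sparse=auto   src.dat dst.dat   # default: a heuristic decides whether src is sparse
cp --sparse=always src.dat dst.dat   # punch holes for any long enough run of zero bytes
cp --sparse=never  src.dat dst.dat   # write every byte, including the zeros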

Thanks.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused by compressratio

2008-04-14 Thread Luke Scharf
Stuart Anderson wrote:
> As an artificial test, I created a filesystem with compression enabled
> and ran "mkfile 1g" and the reported compressratio for that filesystem
> is 1.00x even though this 1GB file uses only 1kB.
>   

ZFS seems to treat files filled with zeroes as sparse files, regardless 
of whether or not compression is enabled.  Try "dd if=/dev/urandom 
of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit 
this behavior.  Creating this file is a lot slower than writing zeroes 
(mostly due to the speed of the urandom device), but ZFS won't treat it 
like a sparse file, and it won't compress very well either.
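
After creating it, the difference should show up right away (a sketch;
"tank/test" is a made-up dataset name):

/bin/ls -ls 1g.dat               # allocated 512-byte blocks vs. the 1 GB logical length
du -k 1g.dat                     # should be close to 1 GB for random data
zfs get compressratio tank/test  # should stay near 1.00x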

-Luke

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Confused by compressratio

2008-04-11 Thread Stuart Anderson
I am confused by the numerical value of compressratio. I copied a
compressed ZFS filesystem that is 38.5G in size (the zfs list USED and
REFER values) and reports a compressratio of "2.52x" to an
uncompressed ZFS filesystem, and it expanded to 198G. So why is the
compressratio 2.52 rather than 198/38.5 = 5.14?

As an artificial test, I created a filesystem with compression enabled
and ran "mkfile 1g" and the reported compressratio for that filesystem
is 1.00x even though this 1GB file uses only 1kB.

Note, this was done with ZFS version 4 on S10U4.

I would appreciate any help in understanding what compressratio means.

Thanks.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss