Re: [zfs-discuss] Gzip compression for ZFS

2007-04-11 Thread Darren Reed

Erblichs wrote:

My two cents,

...
Secondly, if I can add an additional item, would anyone
want to be able to encrypt the data vs compress or to
be able to combine encryption with compression?
Yes, I might want to encrypt all of my laptop's hard drive contents, and
I might also want to have compression used prior to encryption to
maximise the utility I get from the relatively limited space.

Darren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Gzip compression for ZFS

2007-04-05 Thread Erblichs
My two cents,

Assuming that you may pick a specific compression algorithm,
most algorithms support different levels of deflation/inflation,
which affects the time to compress and/or decompress with
respect to the available CPU capacity.
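The level-versus-CPU trade-off above can be seen directly with any deflate implementation; a small sketch (Python used purely as illustration, not anything from ZFS itself):

```python
import zlib

# Compare deflate effort levels on the same repetitive input.
data = b"ZFS stores blocks; compressible text repeats. " * 200

fast = zlib.compress(data, 1)   # low effort: less CPU, usually larger
best = zlib.compress(data, 9)   # high effort: more CPU, equal or smaller

# Higher levels trade CPU time for an equal-or-better ratio.
assert len(best) <= len(fast) < len(data)
```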

Secondly, if I can add an additional item, would anyone
want to be able to encrypt the data vs compress or to
be able to combine encryption with compression?
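Order matters when combining the two: good ciphertext is indistinguishable from random bytes, and random bytes do not compress, so compression must happen first. A toy sketch (the XOR "cipher" below is a stand-in for illustration only, NOT real cryptography):

```python
import os
import zlib

def xor_pad(data: bytes, pad: bytes) -> bytes:
    # One-time-pad style XOR; length-preserving, output looks random.
    return bytes(a ^ b for a, b in zip(data, pad))

data = b"highly repetitive laptop contents " * 256

# compress-then-encrypt: the savings survive (XOR preserves length)
compressed = zlib.compress(data)

# encrypt-then-compress: the "ciphertext" no longer shrinks
ciphertext = xor_pad(data, os.urandom(len(data)))
e_then_c = zlib.compress(ciphertext)

assert len(compressed) < len(data)        # plaintext compresses well
assert len(e_then_c) > len(compressed)    # ciphertext barely compresses
```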

Third, if data were to be compressed within a file
object, should a reader be made aware that the data
being read is compressed, or should he just read
garbage? Or should a field in the znode be consulted
so that already-compressed data is decompressed
transparently?

Fourth, if you take 8K and expect to alloc 8K of disk
block storage for it and compress it to 7K, are you
really saving 1K? Or are you just creating an additional
1K of internal fragmentation? It is possible that moving
7K of data across your SCSI-type interface may
give you faster read/write performance. But that is
after the additional latency of the compress on the
async write, and it adds real latency on the current
block read. So, what are you really gaining?
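Whether the 1K is saved or wasted is plain allocation arithmetic; a sketch (the 512-byte and 4K unit sizes here are illustrative assumptions, not ZFS's actual allocator behaviour):

```python
import math

def allocated(nbytes: int, sector: int = 512) -> int:
    # Disk space is handed out in whole allocation units.
    return math.ceil(nbytes / sector) * sector

# With 512-byte sectors, 7K rounds to fourteen sectors: 1K genuinely saved.
assert allocated(8192) == 8192
assert allocated(7 * 1024) == 7168

# With a 4K allocation unit, the same 7K still costs 8K:
# no saving at all, only internal fragmentation.
assert allocated(7 * 1024, 4096) == 8192
```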

Fifth and hopefully last, should the znode have a
new length field that keeps the non-compressed length
for POSIX compatibility? I am assuming large file
support, where a process that is not large-file aware
should not even be able to open the file. With the
additional field (uncompressed size), the file may
lie on the boundary for the large-file open requirements.

Really last..., why not just compress the data stream
before writing it out to disk? Then you can at least run
file(1) on it and identify the type of compression...
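The point about file(1) holds because stream compressors tag their output with magic bytes; a small sketch of that kind of identification (the table below covers just a few well-known formats):

```python
import bz2
import gzip

# Leading magic bytes of some common compressed-stream formats.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
}

def identify(blob: bytes) -> str:
    # Mimic what file(1) does: match the first few bytes.
    for magic, name in MAGIC.items():
        if blob.startswith(magic):
            return name
    return "unknown"

assert identify(gzip.compress(b"payload")) == "gzip"
assert identify(bz2.compress(b"payload")) == "bzip2"
assert identify(b"plain data") == "unknown"
```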

Mitchell Erblich
-

Darren Reed wrote:
 
 From: Darren J Moffat [EMAIL PROTECTED]
 ...
  The other problem is that you basically need a global unique registry
  anyway so that compress algorithm 1 is always lzjb, 2 is gzip, 3 is 
  etc etc.  Similarly for crypto and any other transform.
 
 I've two thoughts on that:
 1) if there is to be a registry, it should be hosted by OpenSolaris
and be open to all and
 
 2) there should be provision for a private number space so that
people can implement their own whatever so long as they understand
that the filesystem will not work if plugged into something else.
 
 Case in point for (2), if I wanted to make a bzip2 version of ZFS at
 home then I should be able to, and in doing so choose a number for it
 that I know will be safe for my playing at home.  I shouldn't have
 to come to zfs-discuss@opensolaris.org to pick a number.
 
 Darren
 


Re: [zfs-discuss] Gzip compression for ZFS

2007-04-04 Thread Darren Reed

From: Darren J Moffat [EMAIL PROTECTED]
...
The other problem is that you basically need a global unique registry 
anyway so that compress algorithm 1 is always lzjb, 2 is gzip, 3 is  
etc etc.  Similarly for crypto and any other transform.


I've two thoughts on that:
1) if there is to be a registry, it should be hosted by OpenSolaris
  and be open to all and

2) there should be provision for a private number space so that
  people can implement their own whatever so long as they understand
  that the filesystem will not work if plugged into something else.

Case in point for (2), if I wanted to make a bzip2 version of ZFS at
home then I should be able to, and in doing so choose a number for it
that I know will be safe for my playing at home.  I shouldn't have
to come to zfs-discuss@opensolaris.org to pick a number.

Darren



Re: [zfs-discuss] Gzip compression for ZFS

2007-04-04 Thread Casper . Dik

From: Darren J Moffat [EMAIL PROTECTED]
...
 The other problem is that you basically need a global unique registry 
 anyway so that compress algorithm 1 is always lzjb, 2 is gzip, 3 is  
 etc etc.  Similarly for crypto and any other transform.

I've two thoughts on that:
1) if there is to be a registry, it should be hosted by OpenSolaris
   and be open to all and

2) there should be provision for a private number space so that
   people can implement their own whatever so long as they understand
   that the filesystem will not work if plugged into something else.

Case in point for (2), if I wanted to make a bzip2 version of ZFS at
home then I should be able to, and in doing so choose a number for it
that I know will be safe for my playing at home.  I shouldn't have
to come to zfs-discuss@opensolaris.org to pick a number.


I'm not sure we really need a registry or a number space.

Algorithms should have names, not numbers.

The zpool should contain a table:

- 1 lzjb
- 2 gzip
- 3 ...

but it could just as well be:

- 1 gzip
- 2 ...
- 3 lzjb

The zpool would simply not load if it cannot find the algorithm(s) used
to store data in the zpool (or it would return I/O errors for the
files/metadata it can't decompress).
Global registries seem like a bad idea; names can be made arbitrarily
long to ensure uniqueness.  There's no reason why an algorithm can't
be renamed after the pool is created should a clash occur; renumbering
would be much harder.
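Casper's per-pool table could be sketched as follows (an illustrative model only; the class and method names are invented, and real ZFS metadata looks nothing like this):

```python
class PoolAlgTable:
    """Per-pool mapping of small block ids to algorithm *names*.

    Ids are meaningful only inside this pool; names, not numbers,
    are what must be globally unique.
    """

    def __init__(self):
        self.by_id = {}

    def register(self, name: str) -> int:
        alg_id = len(self.by_id) + 1
        self.by_id[alg_id] = name
        return alg_id

    def lookup(self, alg_id: int) -> str:
        try:
            return self.by_id[alg_id]
        except KeyError:
            # Unknown algorithm: refuse, rather than read garbage.
            raise IOError(f"pool uses unknown algorithm id {alg_id}")

pool = PoolAlgTable()
assert pool.lookup(pool.register("lzjb")) == "lzjb"
assert pool.lookup(pool.register("gzip")) == "gzip"
```

Two pools can freely assign the same id to different names; the table travels with the pool, so no global numbering authority is needed.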

Casper


Re: [zfs-discuss] Gzip compression for ZFS

2007-03-29 Thread Darren J Moffat



I suppose what would have been nice to see, architecturally,
was a way to transform data at some part in the pipeline and
to be able to specify various types of transforms, be they
compression, encryption or something else.  But maybe I'm
just dreaming without understanding the complexities of
what needs to happen on the inside, such that where these
two operations might take place is actually incompatible and
thus there is little point of such a generalisation.


You really don't want crypto algorithms to be that pluggable.  The
reason is that with crypto you need to make a very careful choice of
algorithm, key length and mode (CBC vs ECB vs CTR vs CCM vs GCM etc.),
and that isn't something you want an end admin doing.  You don't want
them switching from AES-256-CCM to Blowfish-448-CBC because they see
448 is bigger than 256 so therefore more secure.


For compression it wouldn't be such a big deal, except for an
implementation artifact of how the ZIO pipeline works: it is partly
controlled by the compress stage at the moment.


The other problem is that you basically need a global unique registry 
anyway so that compress algorithm 1 is always lzjb, 2 is gzip, 3 is  
etc etc.  Similarly for crypto and any other transform.



BTW I actually floated the idea of a generic ZTL - ZIO Transform Layer -
about this time last year (partly in jest, because ZTL is the last three
letters on my car registration :-)).



So, for example, if the interface was pluggable and Sun only
wanted to ship gzip, but I wanted to create a better ZFS-based
appliance than one based on just OpenSolaris, I might
build a bzip2 module for the kernel and have ZFS use that
by default.


I have to say this because normally I'm all for pluggable interfaces,
and I don't think the answer should be "it's open source, just add it",
but in this case I think that is for now the safer way.


--
Darren J Moffat


[zfs-discuss] Gzip compression for ZFS

2007-03-28 Thread Darren . Reed

Adam,

With the blog entry[1] you've made about gzip for ZFS, it raises
a couple of questions...

1) It would appear that a ZFS filesystem can support files of
  varying compression algorithm.  If a file is compressed using
  method A but method B is now active, if I truncate the file
  and rewrite it, is A or B used?

2) The question of whether or not to use bzip2 was raised in
  the comment section of your blog.  How easy would it be to
  implement a pluggable (or more generic) interface between
  ZFS and the compression algorithms it uses such that I
  can modload a bzip2 compression LKM and tell ZFS to
  use that?  I suspect that doing this will take extra work
  from the Solaris side of things too...

3) Given (1), are there any thoughts about being able to specify
  different compression algorithms for different directories
  (or files) on a ZFS filesystem?

And thanks for the great work!

Cheers,
Darren

[1] - http://blogs.sun.com/ahl/entry/gzip_for_zfs_update


Re: [zfs-discuss] Gzip compression for ZFS

2007-03-28 Thread Robert Milkowski
Hello Darren,

Thursday, March 29, 2007, 12:01:21 AM, you wrote:

DRSC Adam,

DRSC With the blog entry[1] you've made about gzip for ZFS, it raises
DRSC a couple of questions...

DRSC 1) It would appear that a ZFS filesystem can support files of
DRSC    varying compression algorithm.  If a file is compressed using
DRSC    method A but method B is now active, if I truncate the file
DRSC    and rewrite it, is A or B used?

All new blocks will be written using B.
It also means that some blocks belonging to the same file can be
compressed with method A and some with method B (and others, where
compression gained less than ~12%, won't be compressed at all -
unless that has been changed in the code).
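The per-block behaviour Robert describes can be sketched like this (an illustrative model only: gzip stands in for whichever algorithm is active, and the exact ~12% threshold value is an assumption based on the figure quoted above):

```python
import zlib

THRESHOLD = 0.125  # assumed value, matching the ~12% figure in the mail

def write_block(data: bytes, alg: str):
    """Each block records the algorithm that wrote it; a block that
    doesn't shrink by the threshold is stored uncompressed."""
    if alg == "gzip":
        packed = zlib.compress(data)
        if len(packed) <= len(data) * (1 - THRESHOLD):
            return ("gzip", packed)
    return ("off", data)   # didn't earn its keep: store raw

text_block = b"abcd" * 2048          # repetitive: compresses well
dense_block = bytes(range(256))      # no repeats: compression loses

assert write_block(text_block, "gzip")[0] == "gzip"
assert write_block(dense_block, "gzip")[0] == "off"
```

Changing the active property to another algorithm only changes what `alg` is for *new* writes; existing blocks keep their recorded method, which is how one file ends up with a mix.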


DRSC 2) The question of whether or not to use bzip2 was raised in
DRSC    the comment section of your blog.  How easy would it be to
DRSC    implement a pluggable (or more generic) interface between
DRSC    ZFS and the compression algorithms it uses such that I
DRSC    can modload a bzip2 compression LKM and tell ZFS to
DRSC    use that?  I suspect that doing this will take extra work
DRSC    from the Solaris side of things too...

LKM - Linux Kernel Module? :))

Anyway - the first problem is to find in-kernel compress/decompress
algorithms, or to port user-land ones to the kernel. Gzip was easier as
it was already there. So if you have an in-kernel bzip2 implementation,
better yet one working on Solaris, then adding bzip2 to ZFS would be
quite easy.

Last time I looked at the ZFS compression code it wasn't dynamically
expandable - the available compression algorithms have to be compiled in.

Now while a dynamically pluggable implementation sounds appealing, I
doubt people will actually create any such modules in reality. Not to
mention problems like: you export a pool, import it on another host
without your module, and basically you can't access your data.


DRSC 3) Given (1), are there any thoughts about being able to specify
DRSC    different compression algorithms for different directories
DRSC    (or files) on a ZFS filesystem?

There was a small discussion here some time ago about such
possibilities, but I doubt anything was actually done about it.

Apart from that ~12% barrier, which possibly saves CPU on decompression
of poorly compressing data (as such data won't actually be stored
compressed), I'm afraid there's nothing more.

It was suggested here that perhaps ZFS could turn compression off for
specific file types, determined either by file name extension or by the
file's magic cookie - that would probably save some CPU on some workloads.
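The extension-based variant of that suggestion is trivial to sketch (an illustration only; the extension list is an example, and a real implementation would more likely check the magic cookie, since extensions are advisory):

```python
# File types that are typically already compressed and not worth
# recompressing (example set, not exhaustive).
ALREADY_COMPRESSED = {".gz", ".bz2", ".zip", ".jpg", ".png", ".mp3"}

def should_compress(filename: str) -> bool:
    return not any(filename.lower().endswith(ext)
                   for ext in ALREADY_COMPRESSED)

assert should_compress("report.txt")
assert not should_compress("photo.JPG")
```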



-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                 http://milek.blogspot.com



Re: [zfs-discuss] Gzip compression for ZFS

2007-03-28 Thread Darren . Reed

Robert Milkowski wrote:


Hello Darren,

Thursday, March 29, 2007, 12:01:21 AM, you wrote:

DRSC Adam,
...
DRSC 2) The question of whether or not to use bzip2 was raised in
DRSC    the comment section of your blog.  How easy would it be to
DRSC    implement a pluggable (or more generic) interface between
DRSC    ZFS and the compression algorithms it uses such that I
DRSC    can modload a bzip2 compression LKM and tell ZFS to
DRSC    use that?  I suspect that doing this will take extra work
DRSC    from the Solaris side of things too...

LKM - Linux Kernel Module? :))

Anyway - the first problem is to find in-kernel compress/decompress
algorithms, or to port user-land ones to the kernel. Gzip was easier as
it was already there. So if you have an in-kernel bzip2 implementation,
better yet one working on Solaris, then adding bzip2 to ZFS would be
quite easy.

Last time I looked at the ZFS compression code it wasn't dynamically
expandable - the available compression algorithms have to be compiled in.
 



I suppose what would have been nice to see, architecturally,
was a way to transform data at some part in the pipeline and
to be able to specify various types of transforms, be they
compression, encryption or something else.  But maybe I'm
just dreaming without understanding the complexities of
what needs to happen on the inside, such that where these
two operations might take place is actually incompatible and
thus there is little point of such a generalisation.


Now while a dynamically pluggable implementation sounds appealing, I
doubt people will actually create any such modules in reality. Not to
mention problems like: you export a pool, import it on another host
without your module, and basically you can't access your data.
 



Maybe, but it is also a very good method for enabling people
to develop and test new compression algorithms for use with
filesystems.  It also opens up a new avenue for people that
want to build their own appliances using ZFS to create an
extra thing that differentiates them from others.

So, for example, if the interface was pluggable and Sun only
wanted to ship gzip, but I wanted to create a better ZFS-based
appliance than one based on just OpenSolaris, I might
build a bzip2 module for the kernel and have ZFS use that
by default.
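The pluggable interface Darren imagines could amount to little more than a registry of compress/decompress pairs; a hypothetical sketch (the registry shape and names are invented for illustration, and nothing like this exists in ZFS):

```python
import bz2
import zlib

# Base system ships with gzip-style compression built in.
COMPRESSORS = {"gzip": (zlib.compress, zlib.decompress)}

def register(name, compress, decompress):
    """What a vendor's 'modload' might do: add a private algorithm."""
    COMPRESSORS[name] = (compress, decompress)

# The appliance vendor plugs in bzip2 under its own name.
register("bzip2", bz2.compress, bz2.decompress)

data = b"appliance payload " * 100
for name, (comp, decomp) in COMPRESSORS.items():
    assert decomp(comp(data)) == data   # every algorithm round-trips
```

This also makes Robert's portability objection concrete: a pool written with `"bzip2"` blocks is unreadable on any host whose registry lacks that entry.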

Darren



Re[2]: [zfs-discuss] Gzip compression for ZFS

2007-03-28 Thread Robert Milkowski
Hello Darren,

Thursday, March 29, 2007, 12:55:03 AM, you wrote:

DRSC So, for example, if the interface was pluggable and Sun only
DRSC wanted to ship gzip, but I wanted to create a better ZFS
DRSC based appliance than one based on just OpenSolaris, I might
DRSC build a bzip2 module for the kernel and have ZFS use that
DRSC by default.

Or better yet to implement a CAS-like (de-dup) solution (though this would
probably work better on a file basis rather than a block basis).

ok, I get the idea.


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                 http://milek.blogspot.com
