This is a great idea. I'd like to add a couple of suggestions:

It might be interesting to focus on compression algorithms which are
optimized for particular workloads and data types, an Oracle database for
example.

It might be worthwhile to have some sort of adaptive compression whereby
ZFS could choose a compression algorithm based on its detection of the
type of data being stored.

Adam

On Thu, Jul 05, 2007 at 08:29:38PM -0300, Domingos Soares wrote:
> Bellow, follows a proposal for a new opensolaris project. Of course,
> this is open to change since I just wrote down some ideas I had months
> ago, while researching the topic as a graduate student in Computer
> Science, and since I'm not an opensolaris/ZFS expert at all. I would
> really appreciate any suggestion or comments.
> 
> PROJECT PROPOSAL: ZFS Compression Algorithms.
> 
> The main purpose of this project is the development of new
> compression schemes for the ZFS file system. We plan to start with
> the development of a fast implementation of a Burrows Wheeler
> Transform based algorithm (BWT). BWT is an outstanding tool
> and the currently known lossless compression algorithms
> based on it outperform the compression ratio of algorithms derived from the 
> well
> known Ziv-Lempel algorithm, while being a little more time and space
> expensive. Therefore, there is space for improvement: recent results
> show that the running time and space needs of such algorithms can be
> significantly reduced and the same results suggests that BWT is
> likely to become the new standard in compression
> algorithms[1]. Suffixes Sorting (i.e. the problem of sorting suffixes of a
> given string) is the main bottleneck of BWT and really significant
> progress has been made in this area since the first algorithms of
> Manbers and Myers[2] and Larsson and Sadakane[3], notably the new
> linear time algorithms of Karkkainen and Sanders[4]; Kim, Sim and
> Park[5] and Ko e aluru[6] and also the promising O(nlogn) algorithm of
> Karkkainen and Burkhardt[7].
> 
> As a conjecture, we believe that some intrinsic properties of ZFS and
> file systems in general (e.g. sparseness and data entropy in blocks)
> could be exploited in order to produce brand new and really efficient
> compression algorithms, as well as the adaptation of existing ones to
> the task. The study might be extended to the analysis of data in
> specific applications (e.g. web servers, mail servers and others) in
> order to develop compression schemes for specific environments and/or
> modify the existing Ziv-Lempel based scheme to deal better with such
> environments.
> 
> [1] "The Burrows-Wheeler Transform: Theory and Practice". Manzini,
> Giovanni. Proc. 24th Int. Symposium on Mathematical Foundations of
> Computer Science
> 
> [2] "Suffix Arrays: A New Method for
> On-Line String Searches". Manber, Udi and Myers, Eugene W..  SIAM
> Journal on Computing, Vol. 22 Issue 5. 1990
> 
> [3] "Faster suffix sorting". Larsson, N Jasper and Sadakane,
> Kunihiko. TECHREPORT, Department of Computer Science, Lund University,
> 1999
> 
> [4] "Simple Linear Work Suffix Array Construction". Karkkainen, Juha
> and Sanders,Peter. Proc. 13th International Conference on Automata,
> Languages and Programming, 2003
> 
> [5]"Linear-time construction of suffix arrays" D.K. Kim, J.S. Sim,
> H. Park, K. Park, CPM, LNCS, Vol. 2676, 2003
> 
> [6]"Space ecient linear time construction of sux arrays",P. Ko and
> S. Aluru, CPM 2003.
> 
> [7]"Fast Lightweight Suffix Array Construction and
> Checking". Burkhardt, Stefan and K?rkk?inen, Juha. 14th Annual
> Symposium, CPM 2003,
> 
> 
> Domingos Soares Neto
> University of Sao Paulo
> Institute of Mathematics and Statistics
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to