On Mon, Mar 02, 2015 at 04:21:57PM -0500, Jesse Becker wrote:
One thing that we *have* learned is that you should keep all of the
raw records.  They compress well, and disk space is cheap.  Our UGE
logs compress by about 85% using gzip -9, which is fast.  Other methods
(xz) get almost 90%, but take about 100 times longer to compress.
(The specific method doesn't matter; even LZO would do nicely.)


Some numbers for reference.  YMMV, etc.
The source file covers a recent time period, with 1,485,006 jobs in
about a week.

Normalized compression times (CPU user time, gzip = 1.0).  Options were
"-9" for gzip, "-e" for xz, and levels 1, 3, 6, and 9 for lzo:

   lzo1 =  0.055
   lzo3 =  0.061
   lzo6 =  0.062
   gzip =  1.000
   lzo9 =  2.510
   xz   = 23.538
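
For anyone who wants to repeat the timing on their own logs, here is a
rough Python sketch of the same idea.  It is not how the numbers above
were produced (those came from the command-line tools).  It uses the
standard-library gzip and lzma modules, so LZO is left out, and
time.process_time() counts user plus system CPU time rather than user
time alone.  The helper names and the sample records are made up for
illustration.

```python
import gzip
import lzma
import time


def sample_records(n):
    """Build n made-up colon-separated lines (not real accounting data)."""
    lines = (f"all.q:node{i % 50}:users:user{i % 20}:job{i}:{i}:0\n" for i in range(n))
    return "".join(lines).encode()


def cpu_seconds(func, data):
    """CPU seconds (user + system) spent in one call of func(data)."""
    start = time.process_time()
    func(data)
    return time.process_time() - start


def normalized_compress_times(data):
    """Time each compressor on data and scale the results so gzip = 1.0."""
    methods = {
        # Roughly "gzip -9".
        "gzip": lambda d: gzip.compress(d, compresslevel=9),
        # Roughly "xz -e": the default preset 6 with the extreme flag.
        "xz": lambda d: lzma.compress(d, preset=6 | lzma.PRESET_EXTREME),
    }
    times = {name: cpu_seconds(func, data) for name, func in methods.items()}
    base = times["gzip"]
    if base == 0:
        raise ValueError("input too small to time reliably")
    return {name: t / base for name, t in times.items()}


if __name__ == "__main__":
    for name, ratio in normalized_compress_times(sample_records(200000)).items():
        print(f"{name:5s} = {ratio:7.3f}")
```

Synthetic records compress differently from real accounting data, so
the ratios printed will not match the table above.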

   [jb@host sge]$ ls -l accounting.0*
   -rw-r--r-- 1 jb jb 748505122 Feb 14 13:26 accounting.0
   -rw-r--r-- 1 jb jb 201460466 Mar  2 18:49 accounting.0.lzo1 (73.1%)
   -rw-r--r-- 1 jb jb 200146962 Mar  2 18:49 accounting.0.lzo3 (73.3%)
   -rw-r--r-- 1 jb jb 200146962 Mar  2 18:49 accounting.0.lzo6 (73.3%)
   -rw-r--r-- 1 jb jb 109623543 Mar  2 16:06 accounting.0.gz   (85.4%)
   -rw-r--r-- 1 jb jb 135796662 Mar  2 18:51 accounting.0.lzo9 (81.9%)
   -rw-r--r-- 1 jb jb  75222044 Mar  2 16:17 accounting.0.xz   (90.0%)

   [jb@host sge]$ wc -l accounting.0
   1485006 accounting.0
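
The percentages in the listing are size reductions relative to the
uncompressed file.  A small Python sketch of that arithmetic, using the
byte counts from the ls output above (the function and variable names
are just for illustration):

```python
# Byte counts copied from the "ls -l" listing above.
ORIGINAL_SIZE = 748505122
COMPRESSED_SIZES = {
    "lzo1": 201460466,
    "lzo3": 200146962,
    "lzo6": 200146962,
    "gz": 109623543,
    "lzo9": 135796662,
    "xz": 75222044,
}


def reduction_percent(compressed, original=ORIGINAL_SIZE):
    """Percentage by which the file shrank: 100 * (1 - compressed / original)."""
    return 100.0 * (1.0 - compressed / original)


if __name__ == "__main__":
    for name, size in COMPRESSED_SIZES.items():
        print(f"{name:5s} {size:10d} bytes  ({reduction_percent(size):.1f}%)")
```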

Note that lzo9 took 2.5 times longer than gzip to compress, but still
produced a notably larger file (135796662 vs. 109623543 bytes).


Decompression times (writing to /dev/null), normalized to gzip:

   lzo9: 0.332
   lzo6: 0.351
   lzo3: 0.361
   lzo1: 0.362
   gzip: 1.000
   xz:   2.002

Interesting that xz fares so much better at decompression: only about
twice gzip's time, versus more than 23 times for compression.
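
A comparable decompression test can be sketched in Python as well.
Again, this is only an approximation of the measurement above: it
covers gzip and xz via the standard library (no LZO), decompresses in
memory, writes the result to os.devnull, and reports CPU time (user
plus system) normalized to gzip.  All names here are hypothetical.

```python
import gzip
import lzma
import os
import time


def decompress_cpu_seconds(decompress, blob):
    """CPU seconds to decompress blob and write the output to the null device."""
    start = time.process_time()
    data = decompress(blob)
    with open(os.devnull, "wb") as sink:
        sink.write(data)
    return time.process_time() - start


def normalized_decompress_times(data):
    """Compress data both ways, then time decompression scaled so gzip = 1.0."""
    cases = {
        "gzip": (gzip.decompress, gzip.compress(data, compresslevel=9)),
        "xz": (lzma.decompress, lzma.compress(data, preset=6 | lzma.PRESET_EXTREME)),
    }
    times = {name: decompress_cpu_seconds(func, blob) for name, (func, blob) in cases.items()}
    base = times["gzip"]
    if base == 0:
        raise ValueError("input too small to time reliably")
    return {name: t / base for name, t in times.items()}


if __name__ == "__main__":
    # Made-up records standing in for an accounting file.
    records = b"".join(b"all.q:node%d:user%d:job%d\n" % (i % 50, i % 20, i)
                       for i in range(500000))
    for name, ratio in normalized_decompress_times(records).items():
        print(f"{name}: {ratio:.3f}")
```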

--
Jesse Becker (Contractor)
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users