On Mon, Mar 02, 2015 at 04:21:57PM -0500, Jesse Becker wrote:
One thing that we *have* learned is that you should keep all of the raw records. They compress well, and disk space is cheap. Our UGE logs compress by about 85% using gzip -9, and gzip is fast. Other methods (xz) get almost 90%, but take about 100 times longer to compress. (The specific method doesn't matter; even LZO would do nicely.)
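If you want to run a similar comparison on your own accounting file, here is a minimal Python sketch. It assumes the file fits in memory and uses the standard-library gzip and lzma modules as stand-ins for the gzip and xz command-line tools, so sizes and timings will not match the CLI exactly. Timings come from time.process_time(), which is user plus system CPU time, not user time alone. The helper names (savings_pct, compress_and_time, normalize_to_gzip) are hypothetical, chosen for illustration.

```python
import gzip
import lzma
import time


def savings_pct(original_size: int, compressed_size: int) -> float:
    """Percentage of space saved relative to the original size."""
    return 100.0 * (1.0 - compressed_size / original_size)


def compress_and_time(data: bytes) -> dict:
    """Compress `data` with gzip and xz settings similar to the CLI runs.

    Returns {name: (compressed_size, cpu_seconds)}. Timings use
    time.process_time(), i.e. user + system CPU time of this process.
    """
    results = {}

    # Roughly equivalent to `gzip -9`.
    start = time.process_time()
    gz = gzip.compress(data, compresslevel=9)
    results["gzip"] = (len(gz), time.process_time() - start)

    # `xz -e` means the default preset (6) plus the "extreme" flag.
    start = time.process_time()
    xz = lzma.compress(data, preset=6 | lzma.PRESET_EXTREME)
    results["xz"] = (len(xz), time.process_time() - start)

    return results


def normalize_to_gzip(times: dict) -> dict:
    """Express each CPU time as a multiple of the gzip time (gzip = 1.0)."""
    base = times["gzip"]
    return {name: t / base for name, t in times.items()}


if __name__ == "__main__":
    # File sizes (bytes) taken from the `ls -l` listing in this post.
    original = 748505122
    for name, size in [("gzip", 109623543), ("xz", 75222044)]:
        print(f"{name}: {savings_pct(original, size):.1f}% saved")
```

To time a real log, read it with open(path, "rb").read(), pass the bytes to compress_and_time, and feed the CPU times to normalize_to_gzip to get figures in the same "gzip = 1.0" form as the numbers below.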
Some numbers for reference (YMMV, etc.):
The source file covers a recent time period: 1,485,006 jobs in about a week.

Compression times (CPU user time), normalized so that gzip = 1.0. Options were "-9" for gzip, "-e" for xz, and levels 1, 3, 6, and 9 for lzo:

  lzo1 =  0.055
  lzo3 =  0.061
  lzo6 =  0.062
  gzip =  1.000
  lzo9 =  2.510
  xz   = 23.538

[jb@host sge]$ ls -l accounting.0*
-rw-r--r-- 1 jb jb 748505122 Feb 14 13:26 accounting.0
-rw-r--r-- 1 jb jb 201460466 Mar  2 18:49 accounting.0.lzo1  (73.1%)
-rw-r--r-- 1 jb jb 200146962 Mar  2 18:49 accounting.0.lzo3  (73.3%)
-rw-r--r-- 1 jb jb 200146962 Mar  2 18:49 accounting.0.lzo6  (73.3%)
-rw-r--r-- 1 jb jb 109623543 Mar  2 16:06 accounting.0.gz    (85.4%)
-rw-r--r-- 1 jb jb 135796662 Mar  2 18:51 accounting.0.lzo9  (81.9%)
-rw-r--r-- 1 jb jb  75222044 Mar  2 16:17 accounting.0.xz    (90.0%)
[jb@host sge]$ wc -l accounting.0
1485006 accounting.0

Note that lzo9 took 2.5 times as long as gzip to compress, yet produced a notably larger file than gzip did.

Decompression times (writing to /dev/null), normalized to gzip:

  lzo9: 0.332
  lzo6: 0.351
  lzo3: 0.361
  lzo1: 0.362
  gzip: 1.000
  xz:   2.002

Interesting that xz decompresses so much faster than it compresses (about 2x gzip for decompression, versus about 23.5x for compression).

--
Jesse Becker (Contractor)
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
