Thanks. I will review it :)
Thanks,
Hari

On Wednesday, June 11, 2014 at 5:00 PM, Abraham Fine wrote:
> I went ahead and created a JIRA and patch:
> https://issues.apache.org/jira/browse/FLUME-2401
>
> The option is configurable with:
> agentX.channels.ch1.compressBackupCheckpoint = true
>
> As per your recommendation, I used snappy-java. I also considered the
> Snappy and LZ4 implementations in Hadoop IO, but noticed that the
> Hadoop IO dependency was removed in
> https://issues.apache.org/jira/browse/FLUME-1285
>
> Thanks,
> Abe
> --
> Abraham Fine | Software Engineer
> (516) 567-2535
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>
> On Mon, Jun 9, 2014 at 4:01 PM, Hari Shreedharan
> <[email protected]> wrote:
> > Hi Abraham,
> >
> > Compressing the backup checkpoint is very possible. The backup is
> > rarely read: it is only used if the original checkpoint turns out to
> > be corrupt on restart. So compressing it with something like Snappy
> > would make sense (GZIP might hurt performance). Can you try
> > snappy-java and see whether it gives good performance and reasonable
> > compression?
> >
> > Patches are always welcome. I'd be glad to review and commit it. I
> > would suggest making the compression optional via configuration, so
> > that anyone with a smaller channel doesn't end up spending CPU for
> > little gain.
> >
> > Thanks,
> > Hari
> >
> > On Monday, June 9, 2014 at 3:56 PM, Abraham Fine wrote:
> >
> > Hello,
> >
> > We are using Flume 1.4 with a file channel configured with a very
> > large capacity. We keep the checkpoint and the backup checkpoint on
> > separate disks.
> >
> > Normally the file channel is mostly empty (<<1% of capacity). For the
> > checkpoint, disk I/O is very reasonable thanks to the use of a
> > MappedByteBuffer.
> >
> > The backup checkpoint, on the other hand, appears to be written to
> > disk in its entirety over and over again, resulting in very high disk
> > utilization.
> >
> > I noticed that, because the checkpoint file is mostly empty, it is
> > very compressible: I was able to GZIP our checkpoint from 381M down
> > to 386K. I was wondering whether it would be possible to always
> > compress the backup checkpoint before writing it to disk.
> >
> > I would be happy to work on a patch to implement this functionality
> > if there is interest.
> >
> > Thanks in advance,
> > Abe
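[Editor's note] For context, a sketch of how the option proposed in FLUME-2401 might sit alongside the existing file channel dual-checkpoint settings. The directory paths are illustrative, and `compressBackupCheckpoint` is the option name from the patch under discussion, not a released Flume property:

```properties
# File channel with dual checkpoints on separate disks (paths are examples)
agentX.channels.ch1.type = file
agentX.channels.ch1.checkpointDir = /disk1/flume/checkpoint
agentX.channels.ch1.useDualCheckpoints = true
agentX.channels.ch1.backupCheckpointDir = /disk2/flume/checkpoint-backup
# Proposed in FLUME-2401: compress the backup checkpoint (snappy-java)
agentX.channels.ch1.compressBackupCheckpoint = true
```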
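[Editor's note] The thread's observation — a mostly-empty checkpoint compresses from 381M to 386K — is easy to reproduce. The FLUME-2401 patch uses snappy-java (`Snappy.compress`); the sketch below uses the JDK's built-in `java.util.zip.Deflater` instead so it runs with no extra dependency, and the 4 MB "checkpoint" buffer is a made-up stand-in for a real checkpoint file:

```java
import java.util.zip.Deflater;

public class CheckpointCompression {
    // Deflate the buffer at the fastest level and return the compressed size.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(data);
        deflater.finish();
        // Output buffer with headroom for incompressible input.
        byte[] out = new byte[data.length + data.length / 100 + 64];
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(out, n, out.length - n);
        }
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        // Simulate a mostly-empty checkpoint: 4 MB, only the first 4 KB non-zero.
        byte[] checkpoint = new byte[4 * 1024 * 1024];
        for (int i = 0; i < 4096; i++) {
            checkpoint[i] = (byte) (i % 251);
        }
        System.out.println("original:   " + checkpoint.length + " bytes");
        System.out.println("compressed: " + compressedSize(checkpoint) + " bytes");
    }
}
```

The long runs of zero bytes shrink by several orders of magnitude, which is why even a fast, lightly-compressing codec like Snappy is a good fit here: the win comes from the sparsity of the data, not from an aggressive algorithm like GZIP.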
