Hari, thank you for your quick reply. I have a follow-up question to help me decide how best to proceed on my end: can you estimate when the next Flume release will occur?
On Mon, Sep 8, 2014 at 4:07 PM, Hari Shreedharan <[email protected]> wrote:
> This patch should address the issue, if enabled:
> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=commitdiff;h=69fd6b3ad5e5b9ae6f1293b3d8e57ed57fd6701c;hp=f15f20785262ac3cb3e35c2a12e669b7a836d35f
>
> It will be part of the next Flume release (or CDH5.2.0).
>
> --
>
> Thanks,
> Hari
>
>
> Michael Diamant <[email protected]>
> September 8, 2014 at 12:58 PM
> My team uses Flume 1.4.0 packaged with CDH5.0.2 via an embedded agent to
> write to a file channel. From a previous thread started by my colleague,
> "FileChannel Replays consistently take a long time" and associated issue,
> https://issues.apache.org/jira/browse/FLUME-2450, it was suggested to use
> a backup checkpoint directory to avoid lengthy replays. When I enabled the
> backup checkpoint directory, I observed via iotop near 100% IO by my
> application with the embedded agent. This level of IO persists for about
> 30 seconds rendering the application unusable during this time period.
>
> For comparison, I monitored via iotop when backup checkpoint is disabled.
> IO activity occurs for at most several seconds. That is, there is a
> qualitative difference when enabling the backup checkpoint directory.
> Additionally, I also tried deleting the existing checkpoints/data
> directories to start with a clean slate. Those experiment results are
> in-line with my above observations.
>
> Is this expected behavior when using a backup checkpoint directory? Is
> there anyway in which the amount of IO can be reduced? I appreciate
> feedback and insights because the current behavior is untenable for a
> production environment.
>
> Thank you,
> Michael
