On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <[email protected]> wrote:
> On 11/13/2013 03:04 PM, Brock Noland wrote:
> > The file channel uses a WAL which sits on disk. Each time an event is
> > committed an fsync is called to ensure that data is durable. Without
> > this fsync there is no durability guarantee. More details here:
> > https://blogs.apache.org/flume/entry/apache_flume_filechannel
>
> Yes indeed. I was just not expecting the performance impact to be that big.
>
> > The issue is that when the source is committing one-by-one it's
> > consuming the disk doing an fsync for each event. I would find a way to
> > batch up the requests so they are not written one-by-one or use multiple
> > disks for the file channel.
>
> I am already using multiple disks for the channel (4).

Can you share your configuration?

> Batching the
> requests is indeed what I am doing to prevent the file channel from being
> the bottleneck (using a flume agent with a memory channel in front of the
> agent with the file channel), but it inherently means that I lose
> end-to-end durability because events are buffered in memory before being
> flushed to disk.

I would be curious to know, though, whether doubling the sinks would give
more time to readers. Could you take three or four thread dumps of the JVM
while it's in this state and share them?
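
To illustrate the fsync-per-commit cost discussed above, here is a minimal Python sketch. It is not Flume code and does not reflect the file channel's actual implementation; `write_events`, `demo`, and `batch_size` are hypothetical names used only for illustration. It appends events to a log file and calls fsync once per batch, so you can compare the number of fsync calls when committing one-by-one against committing in batches.

```python
import os
import tempfile


def write_events(path, events, batch_size=1):
    """Append events to a log file, calling fsync once per batch.

    Returns the number of fsync calls made. Events in a batch are only
    durable once the fsync for that batch has returned.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fsync_calls = 0
    pending = 0
    with open(path, "ab") as log:
        for event in events:
            log.write(event + b"\n")
            pending += 1
            if pending == batch_size:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # force the OS to write to disk
                fsync_calls += 1
                pending = 0
        if pending:
            # Commit the final, partially filled batch.
            log.flush()
            os.fsync(log.fileno())
            fsync_calls += 1
    return fsync_calls


def demo():
    """Write the same 1000 events one-by-one and in batches of 100."""
    events = [b"event-%d" % i for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        one_by_one = write_events(os.path.join(tmp, "a.log"), events, 1)
        batched = write_events(os.path.join(tmp, "b.log"), events, 100)
    return one_by_one, batched
```

With 1000 events, the one-by-one case issues 1000 fsync calls and the batch-of-100 case issues 10. The trade-off is the one raised in the thread: anything written but not yet fsynced is not durable, so where the batch is assembled determines whether end-to-end durability is preserved.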
