Re: file channel read performance impacted by write rate

Brock Noland Tue, 17 Dec 2013 08:17:28 -0800

Hi,

1) You are using only a single disk for the file channel, and it looks like
the same disk holds both the checkpoint and data directories, so throughput
is going to be extremely slow.
2) I asked for 8-10 thread dumps of the whole JVM but got two thread dumps
of two random threads. If you don't want to share this info on-list, please
send it to me directly...but I will need a bunch of thread dumps to debug
this.
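
For reference, a minimal sketch of one way to collect such a series of
whole-JVM thread dumps. It assumes the JDK's jstack tool is on the PATH and
that you know the PID of the Flume agent; the function name, file names and
default interval are hypothetical choices for illustration.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(pid, count=10, interval_s=5.0, out_dir="dumps",
                         run=subprocess.run, sleep=time.sleep):
    """Write `count` full-JVM thread dumps for `pid`, one file per dump."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        # `jstack <pid>` prints the stack of every thread in the target JVM.
        result = run(["jstack", str(pid)],
                     capture_output=True, text=True, check=True)
        path = out / f"threaddump-{i:02d}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            # Space the dumps out so they show how the threads move over time.
            sleep(interval_s)
    return paths
```

Calling collect_thread_dumps(12345, count=10) while the sink is slow (12345
being a placeholder PID) would leave ten files that can be attached or sent
directly.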


Brock


On Tue, Dec 17, 2013 at 9:32 AM, Shangan Chen <[email protected]> wrote:

> The attached flume.conf is the channel and sink config; dumps.txt contains
> the thread dumps.
> Channel type "dual" is a channel type I developed to combine the merits
> of the memory channel and the file channel. When the volume is not very
> big, I use the memory channel; when the events reach a percentage of the
> memory channel capacity, it switches to the file channel, and when the
> volume decreases it switches back to memory.
>
> thanks for looking into this.
>
>
> On Tue, Dec 17, 2013 at 8:54 PM, Brock Noland <[email protected]> wrote:
>
>> Can you take and share 8-10 thread dumps while the sink is taking events
>> "slowly"?
>>
>> Can you share your machine and file channel configuration?
>> On Dec 17, 2013 6:28 AM, "Shangan Chen" <[email protected]> wrote:
>>
>>> We face the same problem: the performance of taking events from the
>>> channel is a severe bottleneck. Even when there are fewer events in the
>>> channel, the problem does not ease. Following is a log of the metrics of
>>> writing to HDFS: writing to 5 files with a batch size of 200000, take
>>> accounts for most of the total time.
>>>
>>>
>>> 17 Dec 2013 18:49:28,056 INFO
>>>  [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:489)  -
>>> HdfsSink-TIME-STAT sink[sink_hdfs_b] writers[5] eventcount[200000]
>>> all[44513] take[38197] append[5647] sync[17] getFilenameTime[371]
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Nov 25, 2013 at 4:46 PM, Jan Van Besien <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is anybody still looking into this question?
>>>>
>>>> Should I log it in jira such that somebody can look into it later?
>>>>
>>>> thanks,
>>>> Jan
>>>>
>>>>
>>>>
>>>> On 11/18/2013 11:28 AM, Jan Van Besien wrote:
>>>> > Hi,
>>>> >
>>>> > Sorry it took me a while to answer this. I compiled a small test case
>>>> > using only off-the-shelf flume components that shows what is going on.
>>>> >
>>>> > The setup is a single agent with http source, null sink and file
>>>> > channel. I am using the default configuration as much as possible.
>>>> >
>>>> > The test goes as follows:
>>>> >
>>>> > - start the agent without sink
>>>> > - run a script that sends http requests in multiple threads to the http
>>>> > source (the script simply calls the url http://localhost:8080/?key=value
>>>> > over and over again, whereby value is a random string of 100 chars).
>>>> > - this script does about 100 requests per second on my machine. I leave
>>>> > it running for a while, such that the file channel contains about 20000
>>>> > events.
>>>> > - add the null sink to the configuration (around 11:14:33 in the log).
>>>> > - observe the logging of the null sink. You'll see in the log file that
>>>> > it takes more than 10 seconds per 1000 events (until about event 5000,
>>>> > around 11:15:33)
>>>> > - stop the http request generating script (i.e. no more writing in the
>>>> > file channel)
>>>> > - observe the logging of the null sink: events 5000 until 20000 are all
>>>> > processed within a few seconds.
>>>> >
>>>> > In the attachment:
>>>> > - flume log
>>>> > - thread dumps while the ingest was running and the null sink was enabled
>>>> > - config (agent1.conf)
>>>> >
>>>> > I also tried with more sinks (4), see agent2.conf. The results are the
>>>> > same.
>>>> >
>>>> > Thanks for looking into this,
>>>> > Jan
>>>> >
>>>> >
>>>> > On 11/14/2013 05:08 PM, Brock Noland wrote:
>>>> >> On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <[email protected]> wrote:
>>>> >>
>>>> >>      On 11/13/2013 03:04 PM, Brock Noland wrote:
>>>> >>       > The file channel uses a WAL which sits on disk.  Each time an
>>>> >>       > event is committed an fsync is called to ensure that data is
>>>> >>       > durable. Without this fsync there is no durability guarantee.
>>>> >>       > More details here:
>>>> >>       > https://blogs.apache.org/flume/entry/apache_flume_filechannel
>>>> >>
>>>> >>      Yes indeed. I was just not expecting the performance impact to be
>>>> >>      that big.
>>>> >>
>>>> >>
>>>> >>       > The issue is that when the source is committing one-by-one it's
>>>> >>       > consuming the disk doing an fsync for each event.  I would find a
>>>> >>       > way to batch up the requests so they are not written one-by-one or
>>>> >>       > use multiple disks for the file channel.
>>>> >>
>>>> >>      I am already using multiple disks for the channel (4).
>>>> >>
>>>> >>
>>>> >> Can you share your configuration?
>>>> >>
>>>> >>      Batching the requests is indeed what I am doing to prevent the
>>>> >>      file channel from being the bottleneck (using a flume agent with a
>>>> >>      memory channel in front of the agent with the file channel), but it
>>>> >>      inherently means that I lose end-to-end durability because events
>>>> >>      are buffered in memory before being flushed to disk.
>>>> >>
>>>> >>
>>>> >> I would be curious to know though if you doubled the sinks if that would
>>>> >> give more time to readers. Could you take three or four thread dumps of
>>>> >> the JVM while it's in this state and share them?
>>>> >>
>>>> >
>>>>
>>>>
>>>
>>>
>>> --
>>> have a good day!
>>> chenshang'an
>>>
>>>
>
>
> --
> have a good day!
> chenshang'an
>
>
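
To illustrate the fsync-per-commit cost discussed in the quoted thread, here
is a toy sketch. It is not Flume's file channel code, only a plain
append-only log written in Python under the assumption that one fsync is
issued per committed batch; write_events and batch_size are hypothetical
names. It shows that the number of fsync calls drops in proportion to the
batch size.

```python
import os
import tempfile


def write_events(path, events, batch_size, fsync=os.fsync):
    """Append events to `path`, calling fsync once per batch.

    Returns the number of fsync calls that were made.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    syncs = 0
    with open(path, "ab") as log:
        for start in range(0, len(events), batch_size):
            for event in events[start:start + batch_size]:
                log.write(event + b"\n")
            log.flush()          # push Python's buffer down to the OS
            fsync(log.fileno())  # ask the OS to persist the batch to disk
            syncs += 1
    return syncs
```

With 200 events, batch_size=1 results in 200 fsync calls while batch_size=50
results in 4; timing the two calls on a real disk gives a rough feel for the
difference between one-by-one commits and batched commits.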


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
