Hi, 1) You are only using a single disk for file channel and it looks like a single disk for both checkpoint and data directories therefore throughput is going to be extremely slow. 2) I asked for 8-10 thread dumps of the whole JVM but got two thread dumps of two random threads. If you don't want to share this info on-list, please send it to me directly...but I will need a bunch of thread dumps to debug thing.
Brock On Tue, Dec 17, 2013 at 9:32 AM, Shangan Chen <[email protected]>wrote: > the attachment flume.conf is channel and sink config, dumps.txt is thread > dumps. > channel type "dual" is a channel type I developped to utilize the merits > of memory channel and filechannel. when the volume is not quite big, I use > memory channel, when the size of events reach to a percentage of the memory > channel capacity, it switch to the filechannel, when volume decrease switch > to memory again. > > thanks for looking into this. > > > On Tue, Dec 17, 2013 at 8:54 PM, Brock Noland <[email protected]> wrote: > >> Can you take and share 8-10 thread dumps while the sink is taking events >> "slowly"? >> >> Can you share your machine and file channel configuration? >> On Dec 17, 2013 6:28 AM, "Shangan Chen" <[email protected]> wrote: >> >>> we face the same problem, performance of taking events from channel is a >>> severe bottleneck. When there're less events in channel, problem does not >>> alleviate. following is a log of the metrics of writing to hdfs, writing to >>> 5 files with a batchsize of 200000, take cost the most of the total time. >>> >>> >>> 17 εδΊζ 2013 18:49:28,056 INFO >>> [SinkRunner-PollingRunner-DefaultSinkProcessor] >>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:489) - >>> HdfsSink-TIME-STAT sink[sink_hdfs_b] writers[5] eventcount[200000] >>> all[44513] take[38197] append[5647] sync[17] getFilenameTime[371] >>> >>> >>> >>> >>> >>> On Mon, Nov 25, 2013 at 4:46 PM, Jan Van Besien <[email protected]>wrote: >>> >>>> Hi, >>>> >>>> Is anybody still looking into this question? >>>> >>>> Should I log it in jira such that somebody can look into it later? >>>> >>>> thanks, >>>> Jan >>>> >>>> >>>> >>>> On 11/18/2013 11:28 AM, Jan Van Besien wrote: >>>> > Hi, >>>> > >>>> > Sorry it took me a while to answer this. I compiled a small test case >>>> > using only off the shelve flume components that shows what is going >>>> on. >>>> > >>>> > The setup is a single agent with http source, null sink and file >>>> > channel. I am using the default configuration as much as possible. >>>> > >>>> > The test goes as follows: >>>> > >>>> > - start the agent without sink >>>> > - run a script that sends http requests in multiple threads to the >>>> http >>>> > source (the script simply calls the url >>>> http://localhost:8080/?key=value >>>> > over and over a gain, whereby value is a random string of 100 chars). >>>> > - this script does about 100 requests per second on my machine. I >>>> leave >>>> > it running for a while, such that the file channel contains about >>>> 20000 >>>> > events. >>>> > - add the null sink to the configuration (around 11:14:33 in the log). >>>> > - observe the logging of the null sink. You'll see in the log file >>>> that >>>> > it takes more than 10 seconds per 1000 events (until about even 5000, >>>> > around 11:15:33) >>>> > - stop the http request generating script (i.e. no more writing in >>>> file >>>> > channel) >>>> > - observer the logging of the null sink: events 5000 until 20000 are >>>> all >>>> > processed within a few seconds. >>>> > >>>> > In the attachment: >>>> > - flume log >>>> > - thread dumps while the ingest was running and the null sink was >>>> enabled >>>> > - config (agent1.conf) >>>> > >>>> > I also tried with more sinks (4), see agent2.conf. The results are >>>> the same. >>>> > >>>> > Thanks for looking into this, >>>> > Jan >>>> > >>>> > >>>> > On 11/14/2013 05:08 PM, Brock Noland wrote: >>>> >> On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <[email protected] >>>> >> <mailto:[email protected]>> wrote: >>>> >> >>>> >> On 11/13/2013 03:04 PM, Brock Noland wrote: >>>> >> > The file channel uses a WAL which sits on disk. Each time an >>>> >> event is >>>> >> > committed an fsync is called to ensure that data is durable. >>>> Without >>>> >> > this fsync there is no durability guarantee. More details >>>> here: >>>> >> > >>>> https://blogs.apache.org/flume/entry/apache_flume_filechannel >>>> >> >>>> >> Yes indeed. I was just not expecting the performance impact to >>>> be >>>> >> that big. >>>> >> >>>> >> >>>> >> > The issue is that when the source is committing one-by-one >>>> it's >>>> >> > consuming the disk doing an fsync for each event. I would >>>> find a >>>> >> way to >>>> >> > batch up the requests so they are not written one-by-one or >>>> use >>>> >> multiple >>>> >> > disks for the file channel. >>>> >> >>>> >> I am already using multiple disks for the channel (4). >>>> >> >>>> >> >>>> >> Can you share your configuration? >>>> >> >>>> >> Batching the >>>> >> requests is indeed what I am doing to prevent the filechannel >>>> to be the >>>> >> bottleneck (using a flume agent with a memory channel in front >>>> of the >>>> >> agent with the file channel), but it inheritely means that I >>>> loose >>>> >> end-to-end durability because events are buffered in memory >>>> before being >>>> >> flushed to disk. >>>> >> >>>> >> >>>> >> I would be curious to know though if you doubled the sinks if that >>>> would >>>> >> give more time to readers. Could you take three-four thread dumps of >>>> the >>>> >> JVM while it's in this state and share them? >>>> >> >>>> > >>>> >>>> >>> >>> >>> -- >>> have a good day! >>> chenshang'an >>> >>> > > > -- > have a good day! > chenshang'an > > -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
