Hi,

I would use FileChannel as opposed to RecoverableMemoryChannel.
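
As a rough sketch of what that swap could look like on the syslog agent, assuming a Flume release where the short type name "file" is available (older releases may need the fully qualified class name instead). The channel name FileCh and both directories are hypothetical placeholders, not taken from the configs below:

```properties
# Hypothetical channel name and paths; adjust to your own layout.
agent.channels = FileCh
agent.channels.FileCh.type = file
# Directory for the channel's checkpoint (assumed path).
agent.channels.FileCh.checkpointDir = /flume/file-channel/checkpoint
# Directory (or comma-separated directories) for the event data logs (assumed path).
agent.channels.FileCh.dataDirs = /flume/file-channel/data

# Point the existing source and sink at the new channel.
agent.sources.SysLogSrc.channels = FileCh
agent.sinks.AvroSink.channel = FileCh
```

Putting the checkpoint and data directories on separate disks, where that is an option, should reduce seek contention, but treat this as a starting point rather than a tuned setup.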

Also, it sounds like you're not batching somewhere, since without
batching you will see a disk seek per event. 1000 ms / 100 events =
10 ms per event (about one disk seek).

Brock

On Thu, Jul 12, 2012 at 3:55 PM, Raymond Ng <raymond...@gmail.com> wrote:
> Hi
>
> I'm trying to investigate whether I can use flume for streaming syslog data
> on a production environment, and investigating which channel will give me
> durability and also performance
>
> I've tested using memory channel and the performance is good (i.e. with a
> 1GB JVM, achieving 9000 events / sec, with 1 agent with a syslog source
> hopping to another agent which has a hdfs sink)
>
> however durability and recoverability are also important when it comes to
> a production solution, and it seems both Jdbc and RecoverableMemory channels
> offer significantly slower performance (no more than 100 events / sec).  Also
> RecoverableMemory channel doesn't seem to resume the streaming after the
> agents were restarted
>
> below are my agent configs, could you advise how I can improve the
> performance for both Jdbc and RecoverableMemory channels, is it possible to
> config it to achieve half the performance figure that the memory channel can
> achieve?
>
> Agent with Syslog source
>
> agent.sources = SysLogSrc
> #agent.channels = MemChannel
> #agent.channels = JdbcChannel
> agent.channels = RecovMemChannel
> agent.sinks = AvroSink
>
> # SysLogSrc
> agent.sources.SysLogSrc.type = syslogtcp
> agent.sources.SysLogSrc.host = localhost
> agent.sources.SysLogSrc.port = 10902
> #agent.sources.SysLogSrc.channels = MemChannel
> #agent.sources.SysLogSrc.channels = JdbcChannel
> agent.sources.SysLogSrc.channels = RecovMemChannel
> # MemChannel
> agent.channels.MemChannel.type = memory
> agent.channels.MemChannel.capacity = 1000000
> agent.channels.MemChannel.transactionCapacity = 10000
> agent.channels.MemChannel.keep-alive = 3
> # JdbcChannel
> agent.channels.JdbcChannel.type = jdbc
> agent.channels.JdbcChannel.db.type = DERBY
> agent.channels.JdbcChannel.driver.class = org.apache.derby.jdbc.EmbeddedDriver
> agent.channels.JdbcChannel.create.schema = true
> agent.channels.JdbcChannel.create.index = true
> agent.channels.JdbcChannel.create.foreignkey = true
> agent.channels.JdbcChannel.maximum.connections = 10
> agent.channels.JdbcChannel.maximum.capacity = 0
> agent.channels.JdbcChannel.sysprop.user.home = /flume/data
> # RecovMemChannel
> agent.channels.RecovMemChannel.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
> agent.channels.RecovMemChannel.wal.dataDir = /flume/recoverable-memory-channel
> agent.channels.RecovMemChannel.wal.rollSize = 104857600
> agent.channels.RecovMemChannel.wal.minRetentionPeriod = 3600000
> agent.channels.RecovMemChannel.wal.workerInterval = 5000
> agent.channels.RecovMemChannel.wal.maxLogsSize = 1073741824
> agent.channels.RecovMemChannel.capacity = 1000000
> agent.channels.RecovMemChannel.transactionCapacity = 10000
> agent.channels.RecovMemChannel.keep-alive = 3
>
> # AvroSink
> agent.sinks.AvroSink.type = avro
> agent.sinks.AvroSink.hostname = 192.168.200.170
> agent.sinks.AvroSink.port = 10900
> agent.sinks.AvroSink.batch-size = 10000
> #agent.sinks.AvroSink.channel = JdbcChannel
> #agent.sinks.AvroSink.channel = MemChannel
> agent.sinks.AvroSink.channel = RecovMemChannel
>
>
> Agent with HDFS sink
>
> agent.sources = AvroSrc
> #agent.channels = MemChannel
> #agent.channels = JdbcChannel
> agent.channels = RecovMemChannel
> agent.sinks = HdfsSink
> # AvroSrc
> agent.sources.AvroSrc.type = avro
> agent.sources.AvroSrc.bind = 192.168.200.170
> agent.sources.AvroSrc.port = 10900
> agent.sources.AvroSrc.channels = RecovMemChannel
> #agent.sources.AvroSrc.channels = JdbcChannel
> #agent.sources.AvroSrc.channels = MemChannel
> # MemChannel
> agent.channels.MemChannel.type = memory
> agent.channels.MemChannel.capacity = 1000000
> agent.channels.MemChannel.transactionCapacity = 10000
> agent.channels.MemChannel.keep-alive = 3
> # JdbcChannel
> agent.channels.JdbcChannel.type = jdbc
> agent.channels.JdbcChannel.db.type = DERBY
> agent.channels.JdbcChannel.driver.class = org.apache.derby.jdbc.EmbeddedDriver
> agent.channels.JdbcChannel.create.schema = true
> agent.channels.JdbcChannel.create.index = true
> agent.channels.JdbcChannel.create.foreignkey = true
> agent.channels.JdbcChannel.maximum.connections = 10
> agent.channels.JdbcChannel.maximum.capacity = 0
> agent.channels.JdbcChannel.sysprop.user.home = /flume/data
> # RecovMemChannel
> agent.channels.RecovMemChannel.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
> agent.channels.RecovMemChannel.wal.dataDir = /flume/recoverable-memory-channel
> agent.channels.RecovMemChannel.wal.rollSize = 104857600
> agent.channels.RecovMemChannel.wal.minRetentionPeriod = 3600000
> agent.channels.RecovMemChannel.wal.workerInterval = 5000
> agent.channels.RecovMemChannel.wal.maxLogsSize = 1073741824
> agent.channels.RecovMemChannel.capacity = 1000000
> agent.channels.RecovMemChannel.transactionCapacity = 10000
> agent.channels.RecovMemChannel.keep-alive = 3
> # HdfsSink
> agent.sinks.HdfsSink.type = hdfs
> agent.sinks.HdfsSink.hdfs.path = hdfs://master:50070/data/flume
> agent.sinks.HdfsSink.hdfs.filePrefix = data_%Y%m%d
> #agent.sinks.HdfsSink.channel = MemChannel
> #agent.sinks.HdfsSink.channel = JdbcChannel
> agent.sinks.HdfsSink.channel = RecovMemChannel
> agent.sinks.HdfsSink.hdfs.rollInterval = 300
> agent.sinks.HdfsSink.hdfs.rollSize = 209715200
> agent.sinks.HdfsSink.hdfs.rollCount = 0
> agent.sinks.HdfsSink.hdfs.batchSize = 1000
> agent.sinks.HdfsSink.hdfs.writeFormat = Text
> agent.sinks.HdfsSink.hdfs.fileType = DataStream
>
> --
> Rgds
> Ray



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
