Hi, I would use FileChannel as opposed to RecoverableMemoryChannel.
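A rough sketch of what that switch might look like, assuming the standard FileChannel properties (type, checkpointDir, dataDirs). The channel name FileCh and both directory paths are placeholders I made up, not taken from your setup; the capacities are simply copied from your existing channel definitions:

```properties
# Hypothetical channel name "FileCh"; source and sink names are the ones from the quoted config
agent.channels = FileCh

agent.channels.FileCh.type = file
# Placeholder paths: pick directories on a local disk with enough free space
agent.channels.FileCh.checkpointDir = /flume/file-channel/checkpoint
agent.channels.FileCh.dataDirs = /flume/file-channel/data
agent.channels.FileCh.capacity = 1000000
agent.channels.FileCh.transactionCapacity = 10000

# Point the existing source and sink at the new channel
agent.sources.SysLogSrc.channels = FileCh
agent.sinks.AvroSink.channel = FileCh
```

The same change would apply on the second agent, with AvroSrc and HdfsSink pointing at the file channel instead.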
Also, it sounds like you're not batching somewhere, since without batching you will see a disk seek per event. 1000 ms / 100 events = 10 ms per event (about one disk seek).

Brock

On Thu, Jul 12, 2012 at 3:55 PM, Raymond Ng <raymond...@gmail.com> wrote:
> Hi
>
> I'm trying to investigate whether I can use Flume for streaming syslog data
> in a production environment, and investigating which channel will give me
> durability and also performance
>
> I've tested using memory channel and the performance is good (i.e. with a
> 1GB JVM, achieving 9000 events / sec, with 1 agent with a syslog source
> hopping to another agent which has an hdfs sink)
>
> however durability and recoverability are also important when it comes to
> a production solution, and it seems both Jdbc and RecoverableMemory channels
> offer significantly slower performance (no more than 100 events / sec). Also
> RecoverableMemory channel doesn't seem to resume the streaming after the
> agents were restarted
>
> below are my agent configs, could you advise how I can improve the
> performance for both jdbc and recoverableMemory channels, is it possible to
> config it to achieve half the performance figure that the memory channel can
> achieve?
>
> Agent with Syslog source
>
> agent.sources = SysLogSrc
> #agent.channels = MemChannel
> #agent.channels = JdbcChannel
> agent.channels = RecovMemChannel
> agent.sinks = AvroSink
>
> # SysLogSrc
> agent.sources.SysLogSrc.type = syslogtcp
> agent.sources.SysLogSrc.host = localhost
> agent.sources.SysLogSrc.port = 10902
> #agent.sources.SysLogSrc.channels = MemChannel
> #agent.sources.SysLogSrc.channels = JdbcChannel
> agent.sources.SysLogSrc.channels = RecovMemChannel
> # MemChannel
> agent.channels.MemChannel.type = memory
> agent.channels.MemChannel.capacity = 1000000
> agent.channels.MemChannel.transactionCapacity = 10000
> agent.channels.MemChannel.keep-alive = 3
> # JdbcChannel
> agent.channels.JdbcChannel.type = jdbc
> agent.channels.JdbcChannel.db.type = DERBY
> agent.channels.JdbcChannel.driver.class = org.apache.derby.jdbc.EmbeddedDriver
> agent.channels.JdbcChannel.create.schema = true
> agent.channels.JdbcChannel.create.index = true
> agent.channels.JdbcChannel.create.foreignkey = true
> agent.channels.JdbcChannel.maximum.connections = 10
> agent.channels.JdbcChannel.maximum.capacity = 0
> agent.channels.JdbcChannel.sysprop.user.home = /flume/data
> # RecovMemChannel
> agent.channels.RecovMemChannel.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
> agent.channels.RecovMemChannel.wal.dataDir = /flume/recoverable-memory-channel
> agent.channels.RecovMemChannel.wal.rollSize = 104857600
> agent.channels.RecovMemChannel.wal.minRetentionPeriod = 3600000
> agent.channels.RecovMemChannel.wal.workerInterval = 5000
> agent.channels.RecovMemChannel.wal.maxLogsSize = 1073741824
> agent.channels.RecovMemChannel.capacity = 1000000
> agent.channels.RecovMemChannel.transactionCapacity = 10000
> agent.channels.RecovMemChannel.keep-alive = 3
>
> # AvroSink
> agent.sinks.AvroSink.type = avro
> agent.sinks.AvroSink.hostname = 192.168.200.170
> agent.sinks.AvroSink.port = 10900
> agent.sinks.AvroSink.batch-size = 10000
> #agent.sinks.AvroSink.channel = JdbcChannel
> #agent.sinks.AvroSink.channel = MemChannel
> agent.sinks.AvroSink.channel = RecovMemChannel
>
>
> Agent with HDFS sink
>
> agent.sources = AvroSrc
> #agent.channels = MemChannel
> #agent.channels = JdbcChannel
> agent.channels = RecovMemChannel
> agent.sinks = HdfsSink
> # AvroSrc
> agent.sources.AvroSrc.type = avro
> agent.sources.AvroSrc.bind = 192.168.200.170
> agent.sources.AvroSrc.port = 10900
> agent.sources.AvroSrc.channels = RecovMemChannel
> #agent.sources.AvroSrc.channels = JdbcChannel
> #agent.sources.AvroSrc.channels = MemChannel
> # MemChannel
> agent.channels.MemChannel.type = memory
> agent.channels.MemChannel.capacity = 1000000
> agent.channels.MemChannel.transactionCapacity = 10000
> agent.channels.MemChannel.keep-alive = 3
> # JdbcChannel
> agent.channels.JdbcChannel.type = jdbc
> agent.channels.JdbcChannel.db.type = DERBY
> agent.channels.JdbcChannel.driver.class = org.apache.derby.jdbc.EmbeddedDriver
> agent.channels.JdbcChannel.create.schema = true
> agent.channels.JdbcChannel.create.index = true
> agent.channels.JdbcChannel.create.foreignkey = true
> agent.channels.JdbcChannel.maximum.connections = 10
> agent.channels.JdbcChannel.maximum.capacity = 0
> agent.channels.JdbcChannel.sysprop.user.home = /flume/data
> # RecovMemChannel
> agent.channels.RecovMemChannel.type = org.apache.flume.channel.recoverable.memory.RecoverableMemoryChannel
> agent.channels.RecovMemChannel.wal.dataDir = /flume/recoverable-memory-channel
> agent.channels.RecovMemChannel.wal.rollSize = 104857600
> agent.channels.RecovMemChannel.wal.minRetentionPeriod = 3600000
> agent.channels.RecovMemChannel.wal.workerInterval = 5000
> agent.channels.RecovMemChannel.wal.maxLogsSize = 1073741824
> agent.channels.RecovMemChannel.capacity = 1000000
> agent.channels.RecovMemChannel.transactionCapacity = 10000
> agent.channels.RecovMemChannel.keep-alive = 3
> # HdfsSink
> agent.sinks.HdfsSink.type = hdfs
> agent.sinks.HdfsSink.hdfs.path = hdfs://master:50070/data/flume
> agent.sinks.HdfsSink.hdfs.filePrefix = data_%Y%m%d
> #agent.sinks.HdfsSink.channel = MemChannel
> #agent.sinks.HdfsSink.channel = JdbcChannel
> agent.sinks.HdfsSink.channel = RecovMemChannel
> agent.sinks.HdfsSink.hdfs.rollInterval = 300
> agent.sinks.HdfsSink.hdfs.rollSize = 209715200
> agent.sinks.HdfsSink.hdfs.rollCount = 0
> agent.sinks.HdfsSink.hdfs.batchSize = 1000
> agent.sinks.HdfsSink.hdfs.writeFormat = Text
> agent.sinks.HdfsSink.hdfs.fileType = DataStream
>
> --
> Rgds
> Ray

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/