Ahmed, I’m pretty new to Hadoop and still finding my way around debugging this, so I can’t pull the event counts yet.

We are on 15k RPM disks across the board, but your point about un-compressing and then re-compressing put me on what I think is the right track: I’m going to try sending to the Flume servers uncompressed and see if that helps, roughly as sketched below.
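This is only a sketch of what I have in mind - the upstream agent and sink names are placeholders, since I don’t have that config in front of me - but the idea is to set compression-type to none on both ends instead of deflate:

# On this collector's avro source (currently deflate)
nontx_host07_agent01.sources.avro.compression-type = none

# Matching change on each upstream agent's avro sink
# (agent and sink names are placeholders; only the relevant lines shown)
upstream_agent.sinks.to_host07.type = avro
upstream_agent.sinks.to_host07.hostname = host07
upstream_agent.sinks.to_host07.port = 9991
upstream_agent.sinks.to_host07.compression-type = none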
We are also getting a lot of CPU wait when new files come in. For example:

Cpu0  : 14.8%us, 15.1%sy,  0.0%ni,  2.0%id, 65.1%wa,  0.0%hi,  3.0%si,  0.0%st
Cpu1  :  4.0%us, 39.7%sy,  0.0%ni, 34.0%id, 22.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.3%us, 97.4%sy,  0.0%ni,  2.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  :  2.3%us, 75.5%sy,  0.0%ni, 13.2%id,  8.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.3%us, 51.8%sy,  0.0%ni, 30.9%id, 15.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  4.0%us, 40.7%sy,  0.0%ni, 41.7%id, 13.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.3%us, 99.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu9  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu11 :  2.0%us, 72.0%sy,  0.0%ni,  4.0%id, 22.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  5.0%us, 33.3%sy,  0.0%ni, 26.3%id, 35.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu16 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
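While that’s happening I’ll also watch the disks backing /flume/channels to see how much of the wait is coming from the file channel itself. A simple check (iostat is in the sysstat package; watch await and %util for the device holding the channel’s data and checkpoint dirs):

# extended per-device stats, refreshed every 5 seconds
iostat -dx 5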
Thanks

--
Mike Zupan

On Wednesday, October 15, 2014 at 2:27 PM, Ahmed Vila wrote:
> Hi Mike,
>
> It would be really helpful if you could provide the number of events entering the source.
>
> Also, please provide the CPU utilization from top - the line that breaks down utilization by user/system/iowait/idle.
> If it shows high iowait, it may be that the channel is using more IO than your storage can handle - especially if it's an NFS or iSCSI mount.
> But the biggest factor is the number of events.
>
> I see that you actually un-compress the events on arrival at the source and compress them back at the sink.
> It's well known that compression/decompression is above all a CPU-bound task.
> That might be the problem and could reduce Flume throughput greatly, especially because you have 4 sinks, each doing compression on its own.
>
> Regards,
> Ahmed Vila
>
> On Wed, Oct 15, 2014 at 5:32 PM, Mike Zupan <[email protected]> wrote:
> > I'm seeing issues with the Flume servers using very high amounts of CPU. Just wondering if this is a common issue with a file channel. I'm pretty new to Flume, so sorry if this isn't enough to debug the issue.
> >
> > Current top looks like:
> >
> > PID   USER  PR  NI  VIRT   RES   SHR   S  %CPU    %MEM  TIME+    COMMAND
> > 8509  root  20   0  22.0g  8.6g  675m  S  1109.4  13.7  1682:45  java
> > 8251  root  20   0  21.9g  8.3g  647m  S  1083.5  13.2  1476:27  java
> > 7593  root  20   0  12.4g  8.4g   18m  S  1007.5  13.4  1866:18  java
> >
> > As you can see, we have 3 out of 4 Flume servers using 1000% CPU.
> >
> > Details are:
> >
> > OS: CentOS 6.5
> > Java: Oracle "1.7.0_45"
> > Flume: flume-1.4.0.2.1.1.0-385.el6.noarch
> >
> > Our config for the server looks like this:
> >
> > ###############################################
> > # Agent configuration for transactional data
> > ###############################################
> > nontx_host07_agent01.sources = avro
> > nontx_host07_agent01.channels = fc
> > nontx_host07_agent01.sinks = hdfs_sink_01 hdfs_sink_02 hdfs_sink_03 hdfs_sink_04
> >
> > ##################################################
> > # info is published to port 9991
> > ##################################################
> > nontx_host07_agent01.sources.avro.type = avro
> > nontx_host07_agent01.sources.avro.bind = 0.0.0.0
> > nontx_host07_agent01.sources.avro.port = 9991
> > nontx_host07_agent01.sources.avro.threads = 100
> > nontx_host07_agent01.sources.avro.compression-type = deflate
> > nontx_host07_agent01.sources.avro.interceptors = ts id
> > nontx_host07_agent01.sources.avro.interceptors.ts.type = timestamp
> > nontx_host07_agent01.sources.avro.interceptors.ts.preserveExisting = false
> > nontx_host07_agent01.sources.avro.interceptors.id.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
> > nontx_host07_agent01.sources.avro.interceptors.id.preserveExisting = true
> >
> > ##################################################
> > # The Channels
> > ##################################################
> > nontx_host07_agent01.channels.fc.type = file
> > nontx_host07_agent01.channels.fc.checkpointDir = /flume/channels/checkpoint/nontx_host07_agent01
> > nontx_host07_agent01.channels.fc.dataDirs = /flume/channels/data/nontx_host07_agent01
> > nontx_host07_agent01.channels.fc.capacity = 140000000
> > nontx_host07_agent01.channels.fc.transactionCapacity = 240000
> >
> > ##################################################
> > # Sinks
> > ##################################################
> > nontx_host07_agent01.sinks.hdfs_sink_01.type = hdfs
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.path = hdfs://cluster01:8020/flume/%{log_type}
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.filePrefix = flume_nontx_host07_agent01_sink01_%Y%m%d%H
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.inUsePrefix = _
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.inUseSuffix = .tmp
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.fileType = CompressedStream
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.codeC = snappy
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollSize = 0
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollCount = 0
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollInterval = 300
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.idleTimeout = 30
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.timeZone = America/Los_Angeles
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.callTimeout = 30000
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.batchSize = 50000
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.round = true
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.roundUnit = minute
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.roundValue = 5
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.threadsPoolSize = 2
> > nontx_host07_agent01.sinks.hdfs_sink_01.serializer = com.manage.flume.serialization.HeaderAndBodyJsonEventSerializer$Builder
> >
> > --
> > Mike Zupan
