Ahmed,

I’m pretty new to Hadoop, so I’m doing my best to debug this; I haven’t been 
able to pull the event counts yet.
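
If I’m reading the docs right, I should be able to expose those counts by 
restarting an agent with Flume’s built-in HTTP monitoring and pulling the JSON 
metrics. This is just a sketch: the port and config path are placeholders, and 
the exact counter names may differ.

# restart the agent with the JSON metrics endpoint enabled
flume-ng agent -n nontx_host07_agent01 -f /etc/flume/conf/flume.conf \
  -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

# then pull the per-source/channel/sink counters,
# e.g. EventReceivedCount on the avro source
curl -s http://localhost:34545/metrics | python -mjson.tool

I’ll report back with the numbers once I can bounce an agent.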

We are on 15k RPM disks across the board, but your point about un-compressing 
and then re-compressing put me on what I think is the right track. I’m going to 
try sending to the Flume servers uncompressed and see if that helps (sketch of 
the change below, after the top output). We are seeing a lot of CPU wait when 
new files come in.

For example, here’s the per-CPU breakdown from top when that happens:

Cpu0  : 14.8%us, 15.1%sy,  0.0%ni,  2.0%id, 65.1%wa,  0.0%hi,  3.0%si,  0.0%st
Cpu1  :  4.0%us, 39.7%sy,  0.0%ni, 34.0%id, 22.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.3%us, 97.4%sy,  0.0%ni,  2.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  :  2.3%us, 75.5%sy,  0.0%ni, 13.2%id,  8.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.3%us, 51.8%sy,  0.0%ni, 30.9%id, 15.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  4.0%us, 40.7%sy,  0.0%ni, 41.7%id, 13.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.3%us, 99.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu9  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu11 :  2.0%us, 72.0%sy,  0.0%ni,  4.0%id, 22.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  5.0%us, 33.3%sy,  0.0%ni, 26.3%id, 35.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu16 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
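
And here’s the change I mentioned above. I think dropping deflate on both ends 
of the avro hop is all it takes; the upstream agent and sink names below are 
made up for illustration, and only the commented line refers to my real config 
(quoted further down).

# hypothetical upstream agent: stop compressing the avro hop to this server
# (compression-type defaults to none, so deleting the line should be equivalent)
upstream_agent.sinks.avro_to_host07.compression-type = none

# this server: remove the matching decompress on the avro source, i.e. delete
# nontx_host07_agent01.sources.avro.compression-type = deflate

If that just trades source CPU for network traffic, I’ll revisit.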


Thanks  

--  
Mike Zupan


On Wednesday, October 15, 2014 at 2:27 PM, Ahmed Vila wrote:

> Hi Mike,
>  
> It would be really helpful if you could provide the number of events entering 
> the source.
>  
> Also, please provide the CPU utilization line from top, the one that breaks 
> down utilization by user/system/iowait/idle.
> If it shows high iowait, it may be that the channel is using more IO than your 
> storage can handle, especially if it's an NFS or iSCSI mount.
> But the biggest factor is the number of events.
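>
> For example, something like this (assuming the sysstat package is installed), 
> run while files are coming in, will show whether the disks backing the file 
> channel are saturated:
>
> # extended per-device I/O stats, one-second interval, five samples
> iostat -x 1 5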
>  
> I see that you actually un-compress the events on arrival at the source and 
> compress them again at the sink.
> It's well known that compression/decompression is, above all, a CPU-bound task.
> That could be a problem and reduce Flume throughput greatly, especially 
> because you have four sinks each doing compression on its own.
>  
> Regards,
> Ahmed Vila
>  
> On Wed, Oct 15, 2014 at 5:32 PM, Mike Zupan <[email protected]> wrote:
> > I’m seeing issues with the Flume servers using very high amounts of CPU. Just 
> > wondering if this is a common issue with the file channel. I’m pretty new to 
> > Flume, so sorry if this isn’t enough to debug the issue.
> >  
> > Current top looks like this:
> >  
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  8509 root      20   0 22.0g 8.6g 675m S 1109.4 13.7   1682:45 java
> >  8251 root      20   0 21.9g 8.3g 647m S 1083.5 13.2   1476:27 java
> >  7593 root      20   0 12.4g 8.4g  18m S 1007.5 13.4   1866:18 java
> >  
> > As you can see, three of our four Flume servers are using over 1000% CPU.
> >  
> > Details are
> >  
> > OS: CentOS 6.5
> > Java: Oracle "1.7.0_45"
> >  
> > Flume: flume-1.4.0.2.1.1.0-385.el6.noarch
> >  
> > Our config for the server looks like this:
> >  
> > ###############################################
> > # Agent configuration for transactional data
> > ###############################################
> > nontx_host07_agent01.sources = avro
> > nontx_host07_agent01.channels = fc
> > nontx_host07_agent01.sinks = hdfs_sink_01 hdfs_sink_02 hdfs_sink_03 hdfs_sink_04
> >  
> > ##################################################
> > # info is published to port 9991
> > ##################################################
> > nontx_host07_agent01.sources.avro.type = avro
> > nontx_host07_agent01.sources.avro.bind = 0.0.0.0
> > nontx_host07_agent01.sources.avro.port = 9991
> > nontx_host07_agent01.sources.avro.threads = 100
> > nontx_host07_agent01.sources.avro.compression-type = deflate
> > nontx_host07_agent01.sources.avro.interceptors = ts id
> > nontx_host07_agent01.sources.avro.interceptors.ts.type = timestamp
> > nontx_host07_agent01.sources.avro.interceptors.ts.preserveExisting = false
> > nontx_host07_agent01.sources.avro.interceptors.id.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
> > nontx_host07_agent01.sources.avro.interceptors.id.preserveExisting = true
> >  
> >  
> > ##################################################
> > # The Channels
> > ##################################################
> > nontx_host07_agent01.channels.fc.type = file
> > nontx_host07_agent01.channels.fc.checkpointDir = /flume/channels/checkpoint/nontx_host07_agent01
> > nontx_host07_agent01.channels.fc.dataDirs = /flume/channels/data/nontx_host07_agent01
> > nontx_host07_agent01.channels.fc.capacity = 140000000
> > nontx_host07_agent01.channels.fc.transactionCapacity = 240000
> >  
> > ##################################################
> > # Sinks
> > ##################################################
> > nontx_host07_agent01.sinks.hdfs_sink_01.type = hdfs
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.path = hdfs://cluster01:8020/flume/%{log_type}
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.filePrefix = flume_nontx_host07_agent01_sink01_%Y%m%d%H
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.inUsePrefix=_
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.inUseSuffix=.tmp
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.fileType = CompressedStream
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.codeC = snappy
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollSize = 0
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollCount = 0
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.rollInterval = 300
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.idleTimeout = 30
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.timeZone = America/Los_Angeles
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.callTimeout = 30000
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.batchSize = 50000
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.round = true
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.roundUnit = minute
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.roundValue = 5
> > nontx_host07_agent01.sinks.hdfs_sink_01.hdfs.threadsPoolSize = 2
> > nontx_host07_agent01.sinks.hdfs_sink_01.serializer = com.manage.flume.serialization.HeaderAndBodyJsonEventSerializer$Builder
> >  
> >  
> > --  
> > Mike Zupan
> >  
>  
>  
>  
