Hi,

The error "*java.lang.OutOfMemoryError: unable to create new native thread*"
doesn't have anything to do with virtual address space in a 64-bit process.
It's highly likely that your "nproc" setting is too low; I would increase it.
For example, Cloudera Manager sets this to 32K, which is a much more
reasonable number for machines that run Java processes.

Brock

On Sun, Dec 29, 2013 at 4:49 PM, shibi S <[email protected]> wrote:

> Thanks Matt and Brock.
>
> One characteristic of my application is that it doesn't receive that much
> data, so it might take a long time to reach the 64 MB roll size. I guess
> that is causing Flume to consume more virtual memory; when I changed the
> setting to roll over every 10 minutes, VM usage came down. But then smaller
> files are copied to HDFS, which won't work well with Hadoop.
>
> Matt - I tried a lower thread count for the Avro source, and it brought
> the VM usage down a little, but not much.
>
> Brock - I get "*java.lang.OutOfMemoryError: unable to create new native
> thread*" while Flume's VM usage is above 16 GB, and it doesn't leave room
> for other applications to run.
>
> *The following settings use 16.5 GB of VM:*
>
>   a1.sinks.k1.hdfs.txnEventMax = 40000
>   a1.sinks.k1.hdfs.rollInterval = 0
>   a1.sinks.k1.hdfs.rollSize = 67108864
>   a1.sinks.k1.hdfs.rollCount = 1000
>   a1.sinks.k1.hdfs.batchSize = 1000
>
> *The following settings use 11.5 GB of VM:*
>
>   #a1.sinks.k1.hdfs.txnEventMax = 40000
>   a1.sinks.k1.hdfs.rollInterval = 10
>   a1.sinks.k2.hdfs.roundUnit = minute
>   a1.sinks.k1.hdfs.rollSize = 0
>   a1.sinks.k1.hdfs.rollCount = 500
>   a1.sinks.k1.hdfs.batchSize = 500
>   a1.sinks.k1.hdfs.idleTimeout = 0
>   a1.sinks.k1.hdfs.maxOpenFiles = 1000
>
> Thanks
> Shibi
>
> ------------------------------
> From: [email protected]
> Date: Sat, 14 Dec 2013 10:57:09 -0600
> Subject: Re: Flume uses high Virtual memory
> To: [email protected]
>
> Additionally, I'd note that worrying about virtual memory on 64-bit
> machines is probably not worth your time.
> The newer versions of malloc() do arena allocation and reserve virtual
> memory for each thread. This does not, however, actually consume memory.
>
> On Sat, Dec 14, 2013 at 10:49 AM, Matt Wise <[email protected]> wrote:
>
> We ran into an issue just like this when we did not limit our source
> 'thread' counts. The Avro source seems to spawn potentially thousands of
> threads if you don't limit it:
>
>   a1.sources.r1.threads = 50
>
> (you can validate this with 'htop')
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
>
> On Fri, Dec 13, 2013 at 2:58 PM, shibi S <[email protected]> wrote:
>
> The Flume agent that is writing to HDFS is high on virtual memory usage
> (15.6g). The agent writes to 3 different directories in HDFS based on the
> type of data received. The configuration is given below. Any idea why VM
> usage is high? I see high VM usage only on the agents that write to HDFS;
> the other agents are low in VM usage.
>
> Flume version: apache-flume-1.4.0 (I tested with the 1.5 version as well).
>   PID   USER    PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+      COMMAND
>   38663 deploy  20  0   15.6g  576m  15m  S  2.6   0.2   225:19.29  java
>
> *Configuration:*
>
>   a1.sources.r1.selector.type = multiplexing
>   a1.sources.r1.selector.header = header1
>   a1.sources.r1.selector.mapping.red_cancel = c1
>
> *Source configuration:*
>
>   a1.sources.r1.type = avro
>   a1.sources.r1.bind = 0.0.0.0
>   a1.sources.r1.port = 60000
>
> *Sink configuration:*
>
>   a1.sinks.k1.type = hdfs
>   a1.sinks.k1.hdfs.path = hdfs://<HDFS PATH>/%Y/%m/%d/%H
>   a1.sinks.k1.hdfs.fileType = DataStream
>   a1.sinks.k1.hdfs.filePrefix = filetype1-
>   a1.sinks.k1.hdfs.useLocalTimeStamp = true
>   #a1.sinks.k1.hdfs.txnEventMax = 40000
>   a1.sinks.k1.hdfs.rollInterval = 10
>   a1.sinks.k2.hdfs.roundUnit = minute
>   a1.sinks.k1.hdfs.rollSize = 0
>   a1.sinks.k1.hdfs.rollCount = 500
>   a1.sinks.k1.hdfs.batchSize = 500
>   a1.sinks.k1.hdfs.idleTimeout = 0
>   a1.sinks.k1.hdfs.maxOpenFiles = 1000
>
> *Channel configuration:*
>
>   a1.channels.c2.type = file
>   a1.channels.c2.checkpointDir = /x/home/deploy/flume/checkpoint2
>   a1.channels.c2.dataDirs = /x/home/deploy/flume/data2

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
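[A quick way to check the "nproc" limit and thread count discussed above, as
a sketch: the commands inspect the shell's own PID for illustration (you
would substitute the Flume agent's PID, e.g. 38663 from the top output), and
the limits.conf lines assume a typical PAM-based Linux setup.]

```shell
# Show the per-user process/thread limit ("nproc") that Brock mentions.
ulimit -u

# Count the native threads of a running process via /proc. Here we inspect
# this shell's own PID; substitute the Flume agent's PID in practice.
ls "/proc/$$/task" | wc -l

# To raise the limit persistently on a PAM-based Linux system, add lines
# like these to /etc/security/limits.conf and log in again:
#   deploy  soft  nproc  32768
#   deploy  hard  nproc  32768
```

If the thread count is close to `ulimit -u`, the "unable to create new
native thread" error is expected regardless of how much heap is free.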

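[On the malloc arena point from the Dec 14 message: if the large VIRT figure
itself is a concern, glibc's per-thread arenas can be capped with the
MALLOC_ARENA_MAX environment variable before starting the agent. A minimal
sketch, assuming a glibc-based system; the flume-ng command line shown is
illustrative, not taken from this thread.]

```shell
# On 64-bit, glibc reserves up to 64 MB of virtual address space per malloc
# arena and creates up to 8 arenas per core by default; capping the arena
# count shrinks VIRT without materially affecting resident memory.
export MALLOC_ARENA_MAX=4

# Start the agent as usual; the JVM inherits the setting. (Guarded here so
# the sketch also runs on machines where flume-ng is not installed.)
if command -v flume-ng >/dev/null 2>&1; then
  flume-ng agent --conf conf --conf-file flume.conf --name a1
fi
```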