Hi,

The error "*java.lang.OutOfMemoryError: unable to create new native thread*"
doesn't have anything to do with virtual address space in a 64-bit process.
It's highly likely that your "nproc" setting is too low; I would increase it.
For example, Cloudera Manager sets this to 32K, which is a much more
reasonable number for machines that run Java processes.

Brock

On Sun, Dec 29, 2013 at 4:49 PM, shibi S <[email protected]> wrote:

> Thanks Matt and Brock.
>
> One characteristic of my application is that it doesn't receive that much
> data, so it might take a long time to reach the 64 MB roll size. I guess
> that is causing Flume to consume more virtual memory; when I changed the
> setting to roll over every 10 minutes, VM usage came down. But then smaller
> files are copied to HDFS, which won't work well with Hadoop.
>
> Matt - I tried a lower thread count for the Avro source, and it brought
> the VM usage down a little, but not much.
>
> Brock - I get "*java.lang.OutOfMemoryError: unable to create new native
> thread*" while Flume's VM usage is above 16 GB, and it doesn't leave room
> for other applications to run.
>
> *The following settings use 16.5 GB of VM:*
>
>   a1.sinks.k1.hdfs.txnEventMax = 40000
>   a1.sinks.k1.hdfs.rollInterval = 0
>   a1.sinks.k1.hdfs.rollSize = 67108864
>   a1.sinks.k1.hdfs.rollCount = 1000
>   a1.sinks.k1.hdfs.batchSize = 1000
>
> *The following settings use 11.5 GB of VM:*
>
>   #a1.sinks.k1.hdfs.txnEventMax = 40000
>   a1.sinks.k1.hdfs.rollInterval = 10
>   a1.sinks.k2.hdfs.roundUnit = minute
>   a1.sinks.k1.hdfs.rollSize = 0
>   a1.sinks.k1.hdfs.rollCount = 500
>   a1.sinks.k1.hdfs.batchSize = 500
>   a1.sinks.k1.hdfs.idleTimeout = 0
>   a1.sinks.k1.hdfs.maxOpenFiles = 1000
>
> Thanks
> Shibi
>
> ------------------------------
> From: [email protected]
> Date: Sat, 14 Dec 2013 10:57:09 -0600
> Subject: Re: Flume uses high Virtual memory
> To: [email protected]
>
> Additionally, I'd note that worrying about virtual memory on 64-bit
> machines is probably not worth your time.
> The newer versions of malloc() do arena allocation and reserve virtual
> memory for each thread. This does not, however, actually consume memory.
>
> On Sat, Dec 14, 2013 at 10:49 AM, Matt Wise <[email protected]> wrote:
>
> We ran into an issue just like this when we did not limit our source
> 'thread' counts. The Avro source seems to spawn potentially thousands of
> threads if you don't limit it:
>
>   a1.sources.r1.threads = 50
>
> (you can validate this with 'htop')
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
>
> On Fri, Dec 13, 2013 at 2:58 PM, shibi S <[email protected]> wrote:
>
> The Flume agent that is writing to HDFS is high on virtual memory usage
> (15.6g). The agent writes to 3 different directories in HDFS based on the
> type of data received. The configuration is given below. Any idea why VM
> usage is high? I see high VM usage only on the agents that write to HDFS;
> the other agents are low in VM usage.
>
> Flume version: apache-flume-1.4.0 (I tested with the 1.5 version as well).
>   PID   USER    PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+      COMMAND
>   38663 deploy  20  0   15.6g  576m  15m  S  2.6   0.2   225:19.29  java
>
> *Configuration:*
>
>   a1.sources.r1.selector.type = multiplexing
>   a1.sources.r1.selector.header = header1
>   a1.sources.r1.selector.mapping.red_cancel = c1
>
> *Source configuration:*
>
>   a1.sources.r1.type = avro
>   a1.sources.r1.bind = 0.0.0.0
>   a1.sources.r1.port = 60000
>
> *Sink configuration:*
>
>   a1.sinks.k1.type = hdfs
>   a1.sinks.k1.hdfs.path = hdfs://<HDFS PATH>/%Y/%m/%d/%H
>   a1.sinks.k1.hdfs.fileType = DataStream
>   a1.sinks.k1.hdfs.filePrefix = filetype1-
>   a1.sinks.k1.hdfs.useLocalTimeStamp = true
>   #a1.sinks.k1.hdfs.txnEventMax = 40000
>   a1.sinks.k1.hdfs.rollInterval = 10
>   a1.sinks.k2.hdfs.roundUnit = minute
>   a1.sinks.k1.hdfs.rollSize = 0
>   a1.sinks.k1.hdfs.rollCount = 500
>   a1.sinks.k1.hdfs.batchSize = 500
>   a1.sinks.k1.hdfs.idleTimeout = 0
>   a1.sinks.k1.hdfs.maxOpenFiles = 1000
>
> *Channel configuration:*
>
>   a1.channels.c2.type = file
>   a1.channels.c2.checkpointDir = /x/home/deploy/flume/checkpoint2
>   a1.channels.c2.dataDirs = /x/home/deploy/flume/data2

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
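[A quick way to check the "nproc" limit and thread count discussed above, as
a sketch: the commands inspect the shell's own PID for illustration (you
would substitute the Flume agent's PID, e.g. 38663 from the top output), and
the limits.conf lines assume a typical PAM-based Linux setup.]

```shell
# Show the per-user process/thread limit ("nproc") that Brock mentions.
ulimit -u

# Count the native threads of a running process via /proc. Here we inspect
# this shell's own PID; substitute the Flume agent's PID in practice.
ls "/proc/$$/task" | wc -l

# To raise the limit persistently on a PAM-based Linux system, add lines
# like these to /etc/security/limits.conf and log in again:
#   deploy  soft  nproc  32768
#   deploy  hard  nproc  32768
```

If the thread count is close to `ulimit -u`, the "unable to create new
native thread" error is expected regardless of how much heap is free.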

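[On the malloc arena point from the Dec 14 message: if the large VIRT figure
itself is a concern, glibc's per-thread arenas can be capped with the
MALLOC_ARENA_MAX environment variable before starting the agent. A minimal
sketch, assuming a glibc-based system; the flume-ng command line shown is
illustrative, not taken from this thread.]

```shell
# On 64-bit, glibc reserves up to 64 MB of virtual address space per malloc
# arena and creates up to 8 arenas per core by default; capping the arena
# count shrinks VIRT without materially affecting resident memory.
export MALLOC_ARENA_MAX=4

# Start the agent as usual; the JVM inherits the setting. (Guarded here so
# the sketch also runs on machines where flume-ng is not installed.)
if command -v flume-ng >/dev/null 2>&1; then
  flume-ng agent --conf conf --conf-file flume.conf --name a1
fi
```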