The Flume agent that writes to HDFS shows high virtual memory usage (15.6g).
The agent writes to 3 different HDFS directories depending on the type of data
it receives. The configuration is given below. Any idea why VM usage is this
high? I see high VM usage only on the agents that write to HDFS; the other
agents have low VM usage.
Flume version: apache-flume-1.4.0 (I tested with version 1.5 as well).
  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
38663 deploy   20   0 15.6g 576m  15m S  2.6  0.2 225:19.29 java
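In top, VIRT is the total address space the process has mapped, while RES is the portion actually resident in RAM (here 15.6g versus 576m). A minimal Linux-only sketch in Python, assuming /proc is available (the function names parse_status and read_vm are hypothetical helpers, not part of Flume), reads the same two figures from /proc/<pid>/status so they can be tracked over time:

```python
"""Read virtual (VmSize) and resident (VmRSS) memory for a PID on Linux."""
import sys


def parse_status(text: str) -> dict:
    """Return VmSize/VmRSS in kB from the contents of /proc/<pid>/status."""
    result = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "  16357376 kB"; keep only the number.
            result[key] = int(value.split()[0])
    return result


def read_vm(pid: int) -> dict:
    """Read and parse /proc/<pid>/status for a running process."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status(fh.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    stats = read_vm(int(sys.argv[1]))
    print(f"VmSize: {stats['VmSize'] / 1024 / 1024:.1f} GiB")
    print(f"VmRSS:  {stats['VmRSS'] / 1024:.0f} MiB")
```

Passing 38663 (the PID from the top output) as the argument should print figures roughly matching the VIRT and RES columns.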
Channel selector configuration:
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = header1
a1.sources.r1.selector.mapping.red_cancel = c1
Source configuration:
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 60000
Sink configuration:
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://<HDFS PATH>/%Y/%m/%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = filetype1-
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#a1.sinks.k1.hdfs.txnEventMax = 40000
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 500
a1.sinks.k1.hdfs.batchSize = 500
a1.sinks.k1.hdfs.idleTimeout = 0
a1.sinks.k1.hdfs.maxOpenFiles = 1000
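The roll settings on k1 decide how often the sink closes one file and opens the next: rollInterval is in seconds, rollCount is in events, and a value of 0 disables that trigger (so rollSize = 0 turns off size-based rolling). A rough Python sketch, assuming a steady event rate and ignoring the hourly %H bucketing (roll_period_seconds and files_per_hour are hypothetical names), estimates how many files per hour this sink produces:

```python
def roll_period_seconds(events_per_sec: float, roll_interval: int = 10,
                        roll_count: int = 500) -> float:
    """Estimate how long one HDFS file stays open before the sink rolls it.

    Defaults mirror k1: rollInterval = 10 (seconds) and rollCount = 500
    (events). rollSize = 0 disables size-based rolling, so it is not modeled.
    A value of 0 for either parameter disables that trigger.
    """
    triggers = []
    if roll_interval > 0:
        triggers.append(float(roll_interval))
    if roll_count > 0 and events_per_sec > 0:
        # Time needed to accumulate roll_count events at a steady rate.
        triggers.append(roll_count / events_per_sec)
    # Whichever trigger fires first closes the file.
    return min(triggers) if triggers else float("inf")


def files_per_hour(events_per_sec: float, **kwargs) -> float:
    """Approximate files created per hour, assuming events arrive continuously."""
    if events_per_sec <= 0:
        return 0.0  # no events means no file is ever opened
    period = roll_period_seconds(events_per_sec, **kwargs)
    return 0.0 if period == float("inf") else 3600 / period


for rate in (10, 50, 100, 1000):
    print(f"{rate:>5} events/s -> {files_per_hour(rate):.0f} files/hour")
```

Under these assumptions, below 50 events/s the 10-second interval fires first (about 360 files per hour for this one sink), while at 100 events/s the 500-event count fires every 5 seconds (about 720 files per hour).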
Channel configuration:
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /x/home/deploy/flume/checkpoint2
a1.channels.c2.dataDirs = /x/home/deploy/flume/data2
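The excerpt mentions several component names (the selector maps to c1, the sink keys use k1 and one line uses k2, the channel section defines c2), so a small Python sketch can cross-check which names appear. It is a hypothetical helper (summarize is not part of Flume), assumes plain `key = value` lines, and only sees what is pasted, so names configured elsewhere in the full file will still be reported as unconfigured:

```python
import re
from collections import defaultdict

# Component keys look like "a1.sinks.k1.hdfs.path" or "a1.channels.c2.type".
KEY_RE = re.compile(r"^(\w+)\.(sources|sinks|channels)\.(\w+)\.")
# Multiplexing selector mappings look like "a1.sources.r1.selector.mapping.<value>".
MAPPING_RE = re.compile(r"^\w+\.sources\.\w+\.selector\.mapping\.\S+$")


def summarize(config_text: str) -> dict:
    """Group component names by type and list selector targets with no config."""
    components = defaultdict(set)
    mapped_channels = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and section labels (they contain no "=").
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        match = KEY_RE.match(key)
        if match:
            components[match.group(2)].add(match.group(3))
        if MAPPING_RE.match(key):
            # A mapping value may list several space-separated channels.
            mapped_channels.update(value.split())
    return {
        "components": {kind: sorted(names) for kind, names in components.items()},
        "unconfigured_selector_channels": sorted(
            mapped_channels - components.get("channels", set())
        ),
    }
```

Run over the lines above, it would report source r1, sinks k1 and k2, channel c2, and c1 as a selector target with no channel settings in this excerpt.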