Iain,
I am using a file channel. The source is spoolDir and the sinks are Solr and HDFS.
Please find my config below:
#Flume Configuration Starts
agent.sources = SpoolDirSrc
agent.channels = Channel1 Channel2
agent.sinks = SolrSink HDFSsink
# Configure Source
agent.sources.SpoolDirSrc.channels = Channel1 Channel2
agent.sources.SpoolDirSrc.type = spooldir
#agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/sources_tmp2
#agent.sources.SpoolDirSrc.spoolDir = /app/home/eventsvc/source/processed_emails/
agent.sources.SpoolDirSrc.spoolDir = /app/home/eventsvc/source/processed_emails2/
agent.sources.SpoolDirSrc.basenameHeader = true
agent.sources.SpoolDirSrc.selector.type = replicating
#agent.sources.SpoolDirSrc.batchSize = 100000
agent.sources.SpoolDirSrc.fileHeader = true
#agent.sources.src1.fileSuffix = .COMPLETED
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
# Use a channel that buffers events in file
#
agent.channels.Channel1.type = file
agent.channels.Channel2.type = file
agent.channels.Channel1.capacity = 5000
agent.channels.Channel2.capacity = 5000
agent.channels.Channel1.transactionCapacity = 5000
agent.channels.Channel2.transactionCapacity = 5000
agent.channels.Channel1.checkpointDir = /app/home/flume/.flume/file-channel/checkpoint1
agent.channels.Channel2.checkpointDir = /app/home/flume/.flume/file-channel/checkpoint2
agent.channels.Channel1.dataDirs = /app/home/flume/.flume/file-channel/data1
agent.channels.Channel2.dataDirs = /app/home/flume/.flume/file-channel/data2
#agent.channels.Channel.transactionCapacity = 10000
# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.SolrSink.batchSize = 10
agent.sinks.SolrSink.batchDurationMillis = 10
agent.sinks.SolrSink.channel = Channel1
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
#agent.sinks.SolrSink.fileType = DataStream
#agent.sinks.SolrSink.hdfs.batchsize = 5
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
#agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
#agent.sinks.SolrSink.txnEventMax = 5000
# Configure HDFS Sink
agent.sinks.HDFSsink.channel = Channel2
agent.sinks.HDFSsink.type = hdfs
#agent.sinks.HDFSsink.hdfs.path = hdfs://codehdplak-po-r10p.sys.comcast.net:8020/user/solr/emails
agent.sinks.HDFSsink.hdfs.path = hdfs://codehann/user/solr/emails
#agent.sinks.HDFSsink.hdfs.fileType = DataStream
agent.sinks.HDFSsink.hdfs.fileType = CompressedStream
agent.sinks.HDFSsink.hdfs.batchSize = 1000
agent.sinks.HDFSsink.hdfs.rollCount = 0
agent.sinks.HDFSsink.hdfs.rollInterval = 0
agent.sinks.HDFSsink.hdfs.rollSize = 10485760
agent.sinks.HDFSsink.hdfs.idleTimeout = 0
agent.sinks.HDFSsink.hdfs.maxOpenFiles = 1
agent.sinks.HDFSsink.hdfs.filePrefix = %{basename}
agent.sinks.HDFSsink.hdfs.codeC = gzip
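One note on the source above: BlobDeserializer buffers each spooled file in memory as a single event, so a few large emails translate directly into heap pressure. If that turns out to matter here, the Flume docs describe a deserializer.maxBlobLength property (default 100000000 bytes) that bounds the buffer. A minimal sketch, with an assumed 10 MB cap:

# Assumption: cap each buffered blob at ~10 MB instead of the ~100 MB default
agent.sources.SpoolDirSrc.deserializer.maxBlobLength = 10000000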
Morphline config:
solrLocator : {
  collection : esearch
  #zkHost : "127.0.0.1:9983"
  #zkHost : "codesolr-as-r1p.sys.comcast.net:2181,codesolr-as-r2p.sys.comcast.net:2182"
  #zkHost : "codesolr-as-r2p:2181"
  zkHost : "codesolr-wc-r1p.sys.comcast.net:2181,codesolr-wc-r2p.sys.comcast.net:2181,codesolr-wc-r3p.sys.comcast.net:2181"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      { detectMimeType { includeDefaultMimeTypes : true } }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [_attachment_body, _attachment_mimetype, basename, content, content_encoding, content_type, file, meta, text]
          parsers : [
            #{ parser : org.apache.tika.parser.txt.TXTParser }
            #{ parser : org.apache.tika.parser.AutoDetectParser }
            #{ parser : org.apache.tika.parser.asm.ClassParser }
            #{ parser : org.gagravarr.tika.FlacParser }
            #{ parser : org.apache.tika.parser.executable.ExecutableParser }
            #{ parser : org.apache.tika.parser.font.TrueTypeParser }
            #{ parser : org.apache.tika.parser.xml.XMLParser }
            #{ parser : org.apache.tika.parser.html.HtmlParser }
            #{ parser : org.apache.tika.parser.image.TiffParser }
            #{ parser : org.apache.tika.parser.mail.RFC822Parser }
            #{ parser : org.apache.tika.parser.mbox.MboxParser, additionalSupportedMimeTypes : [message/x-emlx] }
            #{ parser : org.apache.tika.parser.microsoft.OfficeParser }
            #{ parser : org.apache.tika.parser.hdf.HDFParser }
            #{ parser : org.apache.tika.parser.odf.OpenDocumentParser }
            #{ parser : org.apache.tika.parser.pdf.PDFParser }
            #{ parser : org.apache.tika.parser.rtf.RTFParser }
            { parser : org.apache.tika.parser.txt.TXTParser }
            #{ parser : org.apache.tika.parser.chm.ChmParser }
          ]
          fmap : { content : text }
        }
      }
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      { loadSolr : { solrLocator : ${solrLocator} } }
    ]
  }
]
I am not sure how I can get the Flume metrics.
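If you mean the HTTP monitoring endpoint, my understanding is that it has to be enabled when the agent starts, roughly like this (the port below is an arbitrary choice):

/usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf \
  -f /etc/flume/conf/flumeSolr.conf -n agent \
  -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

The counters should then be served as JSON at http://<agent-host>:34545/metrics. If that is what you are after, I can capture it in the period leading up to the next OOM.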
Thank you for looking into it.
Regards,
~Sri
From: iain wright [mailto:[email protected]]
Sent: Wednesday, July 26, 2017 2:37 PM
To: [email protected]
Subject: Re: Flume consumes all memory - { OutOfMemoryError: GC overhead limit exceeded }
Hi Sri,
Are you using a memory channel? What source/sink?
Can you please paste/link your obfuscated config?
What does the metrics endpoint say in terms of channel size, sink drain success,
etc., for the period leading up to the OOM?
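For example, assuming HTTP monitoring is enabled on some port (34545 and the host below are placeholders):

curl http://<agent-host>:34545/metrics
# CHANNEL.Channel1 -> ChannelSize / ChannelFillPercentage
# SINK.SolrSink    -> EventDrainSuccessCount

would show whether the channels are filling faster than the sinks drain them.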
Best,
Iain
Sent from my iPhone
On Jul 26, 2017, at 8:00 AM, Anantharaman, Srinatha (Contractor)
<[email protected]> wrote:
Hi All,
Though I have set the -Xms and -Xmx values, Flume is consuming all memory
and failing in the end.
I have tried adding the above parameters on the command line, as below:
a. /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Dproperty="-Xms1024m -Xmx4048m"
b. /usr/hdp/current/flume-server/bin/flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/flumeSolr.conf -n agent -Xms1024m -Xmx4048m
And also by using the flume-env.sh file, as below:
export JAVA_OPTS="-Xms2048m -Xmx4048m -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
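As far as I can tell, flume-ng only sources flume-env.sh from the directory passed with -c/--conf, so I am assuming the export above takes effect only if the file sits in that directory:

ls /etc/flume/conf/flume-env.sh   # assumed location; sourced by flume-ng via -c

(I also suspect the -Dproperty="..." form in (a) merely defines a system property named "property" rather than setting the heap.)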
I am using HDP 2.5 and Flume 1.5.2.2.5.
Kindly let me know how to resolve this issue
Regards,
~Sri