Hi,

We have to read and parse the log files generated by our application servers to 
look for events that need to be processed. There are multiple application 
servers generating these log files, and for the Flume agent all of them are 
available under one directory via an NFS mount. I am using a TAILDIR source 
with a couple of interceptors, a file-based channel, and a custom sink to 
process the events. We expect approximately 1 million events per day. In a 
24-hour period I see a few events missing (between 10 and 50), and they seem to 
go missing in bunches (e.g. at some point, say 9am, 5-10 events will go missing 
together).

To debug this, we added a file_roll sink with a memory channel to log the 
events and check whether the issue is on the source side. I see the same events 
missing in the file_roll sink as well, so the issue seems to be in the TAILDIR 
source. How do I debug this further? Any help in this regard would be highly 
appreciated.

We are using Flume 1.8, and I noticed there was a missing-events bug in the 
TAILDIR source that was resolved in this release. I have also set idleTimeout 
to 600000 (as noted in the TAILDIR JIRA issue).
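
When a gap shows up, I also plan to check the recorded offsets in the position 
file (/tmp/stats_taildir_position.json). Its entries are a JSON array of 
inode/position/path records, roughly like this (the values and file name here 
are made up for illustration):

[{"inode": 123456, "pos": 1048576, "file": "/tmp/smf-app1-prod-01_uploads.log"}]

If the recorded pos has already advanced past the missing lines, that would 
suggest the source read past them rather than never seeing them.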

Please find the Flume source, channel, and sink configuration below.

I would appreciate your help and support in root-causing this issue. Please 
feel free to ask for more information. Thanks in advance.

Thanks,
Ganesh

# Describe/configure the source
stats-agent.sources.r1.type = TAILDIR
stats-agent.sources.r1.positionFile = /tmp/stats_taildir_position.json
stats-agent.sources.r1.filegroups = f1
stats-agent.sources.r1.filegroups.f1 = /tmp/smf-.*prod-.*_uploads\.log
stats-agent.sources.r1.idleTimeout = 600000

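# i1 keeps only events that contain contentType=stats-contents; i2 then strips
# everything up to and including "savedPath=" so only the saved path remains.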
stats-agent.sources.r1.interceptors = i1 i2
stats-agent.sources.r1.interceptors.i1.type = regex_filter
stats-agent.sources.r1.interceptors.i1.regex = contentType=stats-contents
stats-agent.sources.r1.interceptors.i1.excludeEvents = false

stats-agent.sources.r1.interceptors.i2.type = search_replace
stats-agent.sources.r1.interceptors.i2.searchPattern = ^.*savedPath=
stats-agent.sources.r1.interceptors.i2.replaceString =

# File-based channel for the custom sink.
stats-agent.channels.c1.type = file
stats-agent.channels.c1.checkpointDir = /tmp/checkpoint
stats-agent.channels.c1.dataDirs = /tmp/data

# Memory-based channel to log events using the file_roll sink.
stats-agent.channels.c2.type = memory
stats-agent.channels.c2.capacity = 1000
stats-agent.channels.c2.transactionCapacity = 100

# Event Logging sink
stats-agent.sinks.k3.type = file_roll
stats-agent.sinks.k3.sink.directory = /tmp/flume/stats-contents
stats-agent.sinks.k3.sink.rollInterval = 0
stats-agent.sinks.k3.sink.batchSize = 10
stats-agent.sinks.k3.sink.pathManager.extension = log
stats-agent.sinks.k3.sink.pathManager.prefix = stats-contents-
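
For completeness, the agent declarations and channel bindings are not shown 
above; they follow the standard pattern below (k1 stands in for our custom 
sink's actual name, which I have left out here):

stats-agent.sources = r1
stats-agent.channels = c1 c2
stats-agent.sinks = k1 k3

stats-agent.sources.r1.channels = c1 c2
stats-agent.sinks.k1.channel = c1
stats-agent.sinks.k3.channel = c2

With the default replicating channel selector, every event is copied into both 
c1 and c2, which is what lets us compare the custom sink's output against the 
file_roll log.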
