Hi,

We need to read and parse the log files generated by our application servers to find events that need to be processed. Multiple application servers generate these log files, and all of them are available to the Flume agent under one directory via an NFS mount. I am using a TAILDIR source with a couple of interceptors, a file-based channel, and a custom sink to process the events. We expect approximately 1M events per day. I see a few events missing (between 10 and 50) in a 24-hour period, and they seem to go missing in bunches (e.g. at some point, say 9am, 5-10 files will be missing).
To debug this, we added a file_roll sink fed by a memory channel to log the events and check whether the issue is related to the source. The same events are missing from the file_roll sink output as well, so the issue appears to be in the TAILDIR source. How can I debug this further? Any help would be highly appreciated.

We are using Flume 1.8, and I noticed that a known issue with missing events in the TAILDIR source was resolved in this release. I have also set idleTimeout to 600000 (as noted in the TAILDIR JIRA issue).

Please find the Flume source, channel, and sink configuration below. Feel free to ask for more information.

Thanks in advance.

Thanks,
Ganesh

# Describe/configure the source
stats-agent.sources.r1.type = TAILDIR
stats-agent.sources.r1.positionFile = /tmp/stats_taildir_position.json
stats-agent.sources.r1.filegroups = f1
stats-agent.sources.r1.filegroups.f1 = /tmp/smf-.*prod-.*_uploads\.log
stats-agent.sources.r1.idleTimeout = 600000
stats-agent.sources.r1.interceptors = i1 i2
stats-agent.sources.r1.interceptors.i1.type = regex_filter
stats-agent.sources.r1.interceptors.i1.regex = contentType=stats-contents
stats-agent.sources.r1.interceptors.i1.excludeEvents = false
stats-agent.sources.r1.interceptors.i2.type = search_replace
stats-agent.sources.r1.interceptors.i2.searchPattern = ^.*savedPath=
stats-agent.sources.r1.interceptors.i2.replaceString =

# File based channel for custom sink.
stats-agent.channels.c1.type = file
stats-agent.channels.c1.checkpointDir=/tmp/checkpoint
stats-agent.channels.c1.dataDirs=/tmp/data

# Memory based channel to log events using file_roll sink.
stats-agent.channels.c2.type = memory
stats-agent.channels.c2.capacity = 1000
stats-agent.channels.c2.transactionCapacity = 100

# Event Logging sink
stats-agent.sinks.k3.type = file_roll
stats-agent.sinks.k3.sink.directory = /tmp/flume/stats-contents
stats-agent.sinks.k3.sink.rollInterval = 0
stats-agent.sinks.k3.sink.batchSize = 10
stats-agent.sinks.k3.sink.pathManager.extension = log
stats-agent.sinks.k3.sink.pathManager.prefix = stats-contents-
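
To help narrow down exactly which events are dropped, the file_roll output could be compared offline against the source logs. Below is a minimal sketch in Python. It assumes that the two interceptors behave like a plain regex search followed by one substitution, and that the file_roll sink writes one event body per line. The function names are hypothetical and not part of Flume.

```python
import re
from collections import Counter

# Patterns copied from the interceptor configuration above.
FILTER_RE = re.compile(r"contentType=stats-contents")
STRIP_RE = re.compile(r"^.*savedPath=")


def expected_events(source_lines):
    """Replay the regex_filter and search_replace steps on raw log lines."""
    events = []
    for line in source_lines:
        line = line.rstrip("\n")
        # excludeEvents = false: keep only lines that match the filter.
        if FILTER_RE.search(line):
            # Replace everything up to and including "savedPath=" with "".
            events.append(STRIP_RE.sub("", line, count=1))
    return events


def missing_events(source_lines, sink_lines):
    """Return expected event bodies that do not appear in the sink output.

    Counters are used so that duplicate event bodies are compared by count.
    """
    expected = Counter(expected_events(source_lines))
    seen = Counter(line.rstrip("\n") for line in sink_lines)
    return sorted((expected - seen).elements())
```

Feeding it the lines of the tailed log files and the lines of the files under /tmp/flume/stats-contents would list the event bodies that never reached the sink. Those can then be matched back to a source file and timestamp.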