Hi Guys,

So I disabled puppet on one of the boxes and yes, there have been no replays.
So essentially, to summarize:

- Puppet was re-writing the flume config file [expanding a template] every 30 mins.
- This caused Flume to reload the configuration.
- As a result, the exec source was getting re-executed, which caused the replay of events since the logFile had no changes. [From Flume's point of view these are still new events :)]

*@Mike - Your suggestion totally makes sense and I'm going to try that now in a test environment. Do you recommend it for production use?*

Sagar

On Mon, Mar 4, 2013 at 7:10 PM, Mike Percy <mpe...@apache.org> wrote:

> Sagar,
> Just try "tail -F" on the same file over and over on the command line. It
> will display the last few lines each time.
>
> If you want to avoid this, try "tail -F -n 0 filename" and you should not
> see this. Every time you reload your configuration file, the specified
> command is re-executed by the source.
>
> Regards,
> Mike
>
>
> On Mon, Mar 4, 2013 at 4:13 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>
>> Flume will reload the configuration file every time it is modified.
>> Since puppet rewrites it, Flume reloads it. The events are probably
>> replayed because the transactions were incomplete or something like that.
>> The File Channel will not replay events that have been completely
>> persisted to HDFS with the transaction closed. If puppet does not
>> rewrite the config file, do you see this issue?
>>
>> --
>> Hari Shreedharan
>>
>> On Monday, March 4, 2013 at 3:06 PM, Sagar Mehta wrote:
>>
>> I think we found the issue; not sure if this is the root cause, but it
>> looks highly correlated.
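Mike's point about tail's default behavior is easy to reproduce on the command line (a sketch assuming GNU coreutils; `timeout` is only there to stop the follow after a second, and the temp log file stands in for the collector's logFile):

```shell
# Create a log file with 10 "old" events, like an idle collector's logFile
LOG=$(mktemp)
printf 'event-%d\n' 1 2 3 4 5 6 7 8 9 10 > "$LOG"

# Default behavior: tail first emits the last 10 lines, so every time the
# exec source re-runs this command, those 10 old events are re-ingested
timeout 1 tail -F "$LOG" || true

# With "-n 0", tail starts at the current end of the file, so a restart
# emits nothing until genuinely new lines are appended
timeout 1 tail -F -n 0 "$LOG" || true

rm -f "$LOG"
```

This matches the symptom in the thread: 10 events per replay, identical contents, once per config reload.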
>>
>> So we manage configs using puppet, which currently runs in cron mode
>> with the following configuration:
>>
>> ## puppetrun Cron Job
>> 20,50 * * * * root sleep $((RANDOM\%60)) > /dev/null 2>&1; puppet agent --onetime --no-daemonize --logdest syslog > /dev/null 2>&1
>>
>> *Note the times at which puppet is run along with the time-stamps in
>> the listing below.*
>>
>> Also, after combing through the flume logs, we noticed Flume is
>> reloading the configuration after every puppet run:
>>
>> sagar@drspock ~/temp $ cat flume.log.2013-03-03 | egrep -i "reloading" | head -5
>> 2013-03-03 00:20:44,174 [conf-file-poller-0] INFO org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 00:51:14,374 [conf-file-poller-0] INFO org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 01:21:15,072 [conf-file-poller-0] INFO org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 01:51:15,778 [conf-file-poller-0] INFO org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 02:20:46,481 [conf-file-poller-0] INFO org.apache.flume.conf.properties.PropertiesFileConfigurationProvider - Reloading configuration file:/opt/flume/conf/hdfs.conf
>>
>> The way our current setup works, the flume config file,
>> namely /opt/flume/conf/hdfs.conf, is re-written after every puppet run
>> due to variable interpolation in the template.
>>
>> *We are still not sure what is causing Flume to reload the config file,
>> and even if the file is reloaded, why are the same events getting
>> replayed? [The state should be saved somewhere on disk - that's what
>> the file channel is for, I thought.]*
>>
>> Any pointers/insights appreciated.
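Flume's conf-file poller appears to key on the file's modification time rather than its content, so a template expansion that writes identical bytes still registers as a change. A minimal sketch of that effect (assuming GNU `stat`; the path is a throwaway temp file, not the real hdfs.conf):

```shell
# Rewriting a file with identical content still bumps its mtime,
# which is what an mtime-based config poller keys on
CONF=$(mktemp)
echo "collector110.sources = source1" > "$CONF"
BEFORE=$(stat -c %Y "$CONF")    # GNU stat; BSD/macOS would use: stat -f %m
sleep 1
echo "collector110.sources = source1" > "$CONF"   # same bytes, new mtime
AFTER=$(stat -c %Y "$CONF")
if [ "$AFTER" -gt "$BEFORE" ]; then
    echo "mtime changed -> a poller would treat this as a modified config"
fi
rm -f "$CONF"
```

This is consistent with the reload log entries lining up with the puppet cron schedule even when the rendered template did not meaningfully change.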
>>
>> Sagar
>>
>>
>> On Mon, Mar 4, 2013 at 2:42 PM, Sagar Mehta <sagarme...@gmail.com> wrote:
>>
>> Guys,
>>
>> Yes, this issue was also seen with the memory channel. In fact, when we
>> moved to the file-based channel, we initially thought this issue
>> wouldn't occur since it stores checkpoints.
>>
>> Anyway, below are all the files for collector110 [whose source didn't
>> receive any events] and you can see all the replays below. I have
>> attached the corresponding flume log file for the same day.
>>
>> hadoop@jobtracker301:/home/smehta$ hls /ngpipes-raw-logs/2013-03-03/*/collector110* | head -5
>> -rw-r--r-- 3 hadoop supergroup 1594 2013-03-03 00:20 /ngpipes-raw-logs/2013-03-03/0000/collector110.ngpipes.sac.ngmoco.com.1362270044367.gz
>> -rw-r--r-- 3 hadoop supergroup 1594 2013-03-03 00:51 /ngpipes-raw-logs/2013-03-03/0000/collector110.ngpipes.sac.ngmoco.com.1362271875065.gz
>> -rw-r--r-- 3 hadoop supergroup 1594 2013-03-03 01:21 /ngpipes-raw-logs/2013-03-03/0100/collector110.ngpipes.sac.ngmoco.com.1362273675770.gz
>> -rw-r--r-- 3 hadoop supergroup 1594 2013-03-03 01:51 /ngpipes-raw-logs/2013-03-03/0100/collector110.ngpipes.sac.ngmoco.com.1362275476474.gz
>> -rw-r--r-- 3 hadoop supergroup 1594 2013-03-03 02:20 /ngpipes-raw-logs/2013-03-03/0200/collector110.ngpipes.sac.ngmoco.com.1362277246704.gz
>>
>> Also, in the attached flume log you can see the replay stuff I'm talking
>> about - please note the source received no events during this time.
>>
>> sagar@drspock ~/temp $ cat flume.log.2013-03-03 | egrep -i "Queue Size after replay" | head
>> 2013-03-03 00:20:44,355 [lifecycleSupervisor-1-3] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel1]
>> 2013-03-03 00:20:44,356 [lifecycleSupervisor-1-4] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel2]
>> 2013-03-03 00:51:14,571 [lifecycleSupervisor-1-7] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 0 [channel=channel2]
>> 2013-03-03 00:51:14,577 [lifecycleSupervisor-1-1] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel1]
>> 2013-03-03 01:21:15,276 [lifecycleSupervisor-1-8] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 0 [channel=channel2]
>> 2013-03-03 01:21:15,281 [lifecycleSupervisor-1-7] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel1]
>> 2013-03-03 01:51:15,979 [lifecycleSupervisor-1-9] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 0 [channel=channel2]
>> 2013-03-03 01:51:15,985 [lifecycleSupervisor-1-5] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel1]
>> 2013-03-03 02:20:46,697 [lifecycleSupervisor-1-2] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel1]
>> 2013-03-03 02:20:46,697 [lifecycleSupervisor-1-8] INFO org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10 [channel=channel2]
>>
>> As for the contents of the files, yes, they are exactly the same 10
>> lines of events replayed over and over - I checked that.
>>
>> Let me know if you guys have any insights into this or if this is a bug
>> in Flume NG.
>>
>> Sagar
>>
>>
>> On Thu, Feb 28, 2013 at 2:59 PM, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>>
>> Can you also send the flume agent logs? Did you check the contents of
>> the files?
>>
>> --
>> Hari Shreedharan
>>
>> On Thursday, February 28, 2013 at 2:43 PM, Roshan Naik wrote:
>>
>> Would you be able to verify if the same problem can be reproduced by
>> using the memory channel instead in a test setup?
>>
>>
>> On Wed, Feb 27, 2013 at 11:37 AM, Sagar Mehta <sagarme...@gmail.com> wrote:
>>
>> Hi Guys,
>>
>> I'm using Flume NG and it is working pretty well, except for a weird
>> situation I observed lately. In essence, I'm using an exec source to do
>> tail -F on a logfile, with two HDFS sinks and a file channel.
>>
>> However, I have observed that when the source [the logfile of a
>> jetty-based collector] is idle - that is, no new events are pushed to
>> the logFile - Flume NG seems to replay the same set of events.
>>
>> For example, collector110 received no events for 2 subsequent hours,
>> and below are the corresponding Flume-written files at the HDFS sink:
>>
>> hadoop@jobtracker301:/home/hadoop/sagar$ hls /ngpipes-raw-logs/2013-02-27/1400/collector110*
>> -rw-r--r-- 3 hadoop supergroup 441 2013-02-27 14:20 /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
>> -rw-r--r-- 3 hadoop supergroup 441 2013-02-27 14:50 /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
>>
>> hadoop@jobtracker301:/home/hadoop/sagar$ hls /ngpipes-raw-logs/2013-02-27/1500/collector110*
>> -rw-r--r-- 3 hadoop supergroup 441 2013-02-27 15:20 /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
>> -rw-r--r-- 3 hadoop supergroup 441 2013-02-27 15:50 /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
>>
>> hadoop@jobtracker301:/home/hadoop/sagar$ md5sum *
>> c7360ef5c8deaee3ce9f4c92e9d9be63
>> collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
>> c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
>> c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
>> c7360ef5c8deaee3ce9f4c92e9d9be63  collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
>>
>> As you can see above, the md5sums match.
>>
>> I'm using a file channel, which has checkpoints, so I'm not sure what
>> is going on. Btw, it looks like the difference in timestamps between
>> the two replays is exactly 30 mins.
>>
>> *Is this a known bug or am I missing something?*
>>
>> *Below is my Flume config file:*
>>
>> smehta@collector110:/opt/flume/conf$ cat hdfs.conf
>> # An hdfs sink to write events to the hdfs on the test cluster
>> # A memory based channel to connect the above source and sink
>>
>> # Name the components on this agent
>> collector110.sources = source1
>> collector110.sinks = sink1 sink2
>> collector110.channels = channel1 channel2
>>
>> # Configure the source
>> collector110.sources.source1.type = exec
>> collector110.sources.source1.command = tail -F /opt/jetty/logFile.log
>>
>> # Configure the interceptors
>> collector110.sources.source1.interceptors = TimestampInterceptor HostInterceptor
>>
>> # We use the Timestamp interceptor to get timestamps of when flume receives events
>> # This is used for figuring out the bucket to which an event goes
>> collector110.sources.source1.interceptors.TimestampInterceptor.type = timestamp
>>
>> # We use the Host interceptor to populate the host header with the fully
>> # qualified domain name of the collector.
>> # That way we know which file in the sink represents which collector.
>> collector110.sources.source1.interceptors.HostInterceptor.type = org.apache.flume.interceptor.HostInterceptor$Builder
>> collector110.sources.source1.interceptors.HostInterceptor.preserveExisting = false
>> collector110.sources.source1.interceptors.HostInterceptor.useIP = false
>> collector110.sources.source1.interceptors.HostInterceptor.hostHeader = host
>>
>> # Configure the sink
>> collector110.sinks.sink1.type = hdfs
>>
>> # Configure the bucketing
>> collector110.sinks.sink1.hdfs.path = hdfs://namenode3001.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>>
>> # Prefix the file with the source so that we know where the events in the file came from
>> collector110.sinks.sink1.hdfs.filePrefix = %{host}
>>
>> # We roll the flume output file based on time interval - currently every 5 minutes
>> collector110.sinks.sink1.hdfs.rollSize = 0
>> collector110.sinks.sink1.hdfs.rollCount = 0
>> collector110.sinks.sink1.hdfs.rollInterval = 300
>>
>> # gzip compression related settings
>> collector110.sinks.sink1.hdfs.codeC = gzip
>> collector110.sinks.sink1.hdfs.fileType = CompressedStream
>> collector110.sinks.sink1.hdfs.fileSuffix = .gz
>>
>> # Configure the sink
>> collector110.sinks.sink2.type = hdfs
>>
>> # Configure the bucketing
>> collector110.sinks.sink2.hdfs.path = hdfs://namenode5001.ngpipes.sac.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>>
>> # Prefix the file with the source so that we know where the events in the file came from
>> collector110.sinks.sink2.hdfs.filePrefix = %{host}
>>
>> # We roll the flume output file based on time interval - currently every 5 minutes
>> collector110.sinks.sink2.hdfs.rollSize = 0
>> collector110.sinks.sink2.hdfs.rollCount = 0
>> collector110.sinks.sink2.hdfs.rollInterval = 300
>>
>> # gzip compression related settings
>> collector110.sinks.sink2.hdfs.codeC = gzip
>> collector110.sinks.sink2.hdfs.fileType = CompressedStream
>> collector110.sinks.sink2.hdfs.fileSuffix = .gz
>>
>> # Configure the channels that connect the source to the sinks
>>
>> # Use a channel which buffers events in the filesystem
>> collector110.channels.channel1.type = file
>> collector110.channels.channel1.checkpointDir = /data/flume_data/channel1/checkpoint
>> collector110.channels.channel1.dataDirs = /data/flume_data/channel1/data
>>
>> # Use a channel which buffers events in the filesystem
>> collector110.channels.channel2.type = file
>> collector110.channels.channel2.checkpointDir = /data/flume_data/channel2/checkpoint
>> collector110.channels.channel2.dataDirs = /data/flume_data/channel2/data
>>
>> # Bind the source and sinks to the channels configured above
>> collector110.sources.source1.channels = channel1 channel2
>> collector110.sinks.sink1.channel = channel1
>> collector110.sinks.sink2.channel = channel2
>>
>> Sagar
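If Mike's suggestion pans out, the change to the config above is a one-liner in the exec source (a sketch, not verified in production; note that "-n 0" also skips any lines written while the agent itself was down, which may or may not be acceptable):

```
# Start tailing from the end of the file so that a config reload (and the
# resulting re-execution of this command) does not re-emit the last 10 lines
collector110.sources.source1.command = tail -F -n 0 /opt/jetty/logFile.log
```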