What version of Flume are you using?
On Tue, Aug 12, 2014 at 1:51 PM, Mangtani, Kushal <[email protected]> wrote:

> Bumping this up, to make sure someone answers this.
>
> P.S.: let me know if I need to post these questions on a separate thread.
>
> Thanks,
> Kushal Mangtani
>
> ------------------------------
> *From:* Mangtani, Kushal
> *Sent:* Friday, August 08, 2014 12:39 PM
> *To:* [email protected]
> *Subject:* RE: File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
>
> Hello Flume team,
>
> I have recently seen a bug/weird behaviour in File Channel. I am using FileChannel in my prod env, so please save me from hiccups in prod. Recently, my file channel became full.
> The only ways of fixing this were:
>
> 1. Restart the Flume process.
> 2. Tweak the transactionCapacity of the file channel (a hedged sketch of these settings appears further below in this thread).
>
> I went with 1). However, after doing so, my Flume process was stuck and the logs showed:
>
> 08 Aug 2014 19:03:54,014 INFO [lifecycleSupervisor-1-4] (org.apache.flume.channel.file.LogFile$SequentialReader.next:597) - File position exceeds the threshold: 1623195647, position: 1623195649
> 08 Aug 2014 19:03:54,015 INFO [lifecycleSupervisor-1-4] (org.apache.flume.channel.file.LogFile$SequentialReader.next:608) - Encountered EOF at 1623195649 in /usr/lib/flume-ng/datastore/channel1/logs/log-5802
>
> It looks like, for some reason, the file pointer was at a position greater than the file size. Ultimately, I had to delete the logs, checkpoint and backup-checkpoint directories for my Flume process to process events again.
> So the whole purpose of FileChannel, i.e. better durability at the cost of average performance, was defeated here.
>
> Questions:
>
> 1. Is there something I could have done to prevent this data loss?
> 2. Also, I believe Flume NG is a push-pull mechanism, where sources push events to channels and sinks pull events from channels, in contrast to Flume OG (a push-only mechanism). Correct me if I'm wrong. Was there a reason for this push-pull architecture in Flume land?
>
> Thanks,
> Kushal Mangtani
>
> ------------------------------
> *From:* Hari Shreedharan [[email protected]]
> *Sent:* Friday, February 28, 2014 11:38 AM
> *To:* [email protected]
> *Subject:* Re: File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
>
> It is currently in trunk, so it will be in Flume 1.5.
>
> Thanks,
> Hari
>
> On Friday, February 28, 2014 at 11:30 AM, Mangtani, Kushal wrote:
>
> Hari,
>
> Thanks for the feedback. This was really helpful. I am going to use provisioned IO for a while to make sure the exception does not come back.
>
> Also, from the comments section of the Jira ticket given below, I noticed that you were able to identify the reason for the exception, perhaps that old logs are never deleted. Are you going to put a patch into Flume 1.5 so that this exception is resolved?
>
> -Kushal Mangtani
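For reference, a minimal sketch of the file-channel settings touched on above (capacity, transactionCapacity, and the checkpoint/data directories), assuming a hypothetical collector agent named "collector" and channel "c1"; the directory paths are guessed from the log path quoted in the message above and the sizes are illustrative, not values taken from this deployment. A file channel reports itself full once "capacity" events are queued (or the disk backing dataDirs fills up), so both numbers are deployment-specific.

    # Hypothetical collector agent; names, paths and sizes are illustrative only.
    collector.channels = c1
    collector.channels.c1.type = file

    # Where the channel persists its checkpoint and data (log-N) files.
    collector.channels.c1.checkpointDir = /usr/lib/flume-ng/datastore/channel1/checkpoint
    collector.channels.c1.dataDirs = /usr/lib/flume-ng/datastore/channel1/logs

    # Maximum number of events the channel holds before it reports itself full,
    # and the maximum number of events a single put/take transaction may carry.
    collector.channels.c1.capacity = 1000000
    collector.channels.c1.transactionCapacity = 10000

A persistently full channel is usually a symptom of the sink draining more slowly than the source writes, so raising capacity only buys time unless the sink side is also sped up.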
> *From:* Hari Shreedharan [[email protected]]
> *Sent:* Thursday, February 27, 2014 11:19 AM
> *To:* [email protected]
> *Subject:* Re: File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
>
> See https://issues.apache.org/jira/browse/FLUME-2307
>
> This jira removed the write-timeout, but that only makes sure that there is no transaction in limbo. The real reason, like I said, is slow IO. Try using provisioned IO for better throughput.
>
> Thanks,
> Hari
>
> On Thursday, February 27, 2014 at 10:48 AM, Mangtani, Kushal wrote:
>
> Hari,
>
> Thanks for the prompt reply. The current file channel's write-timeout = 30 sec. EBS drive current capacity = 200 GB. The rate of writes is 60 events/min, where each event is approx. 40 KB.
>
> I am thinking of increasing the file channel write-timeout to 60 sec (see the hedged sketch after this exchange). What do you suggest?
> Also, one strange thing I noticed: all the Flume collectors also get the same exception. However, all have a separate EBS drive. Any inputs?
>
> Thanks,
> Kushal Mangtani
>
> *From:* Hari Shreedharan [[email protected]]
> *Sent:* Thursday, February 27, 2014 10:35 AM
> *To:* [email protected]
> *Subject:* Re: File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
>
> For now, increase the file channel's write-timeout parameter to around 30 or so (basically the file channel is timing out while writing to disk). But the basic problem you are seeing is that your EBS instance is very slow and IO is taking too long. You either need to increase your EBS IO capacity, or reduce the rate of writes.
>
> Thanks,
> Hari
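For concreteness, a hedged sketch of the timeout knobs discussed in this exchange, again assuming a hypothetical "collector" agent; the values are illustrative, not recommendations, and write-timeout itself was removed by FLUME-2307 (Flume 1.5+), so this only applies to 1.4-era file channels.

    # Hypothetical collector agent; values shown only to illustrate the knobs
    # discussed above. write-timeout was removed in Flume 1.5 (FLUME-2307).
    collector.channels.c2.type = file

    # Seconds a transaction may wait for the log write lock before failing
    # with "Failed to obtain lock for writing to the log..."
    collector.channels.c2.write-timeout = 60

    # Seconds a put waits for free space when the channel is full.
    collector.channels.c2.keep-alive = 30

Raising write-timeout only hides the symptom; as noted above, the underlying fix is faster IO (e.g. provisioned IOPS on the EBS volume) or a lower write rate.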
> On Thursday, February 27, 2014 at 10:28 AM, Mangtani, Kushal wrote:
>
> *From:* Mangtani, Kushal
> *Sent:* Wednesday, February 26, 2014 4:51 PM
> *To:* '[email protected]'; '[email protected]'
> *Cc:* Rangnekar, Rohit; '[email protected]'
> *Subject:* File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
>
> Hi,
>
> I'm using the Flume-NG 1.4 cdh4.4 tarball for collecting aggregated logs.
> I am running a two-tier (agent, collector) Flume configuration with custom plugins. There are approximately 20 agent machines (receiving data) and 6 collector Flume machines (writing to HDFS), all running independently. However, I have been facing some File Channel exceptions on the collector side. The agents appear to be working fine.
>
> Error stacktrace:
> org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
>         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
>         at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         ...
>
> And I keep on getting the same error.
>
> P.S.: This same exception is repeated on most of the Flume collector machines, but not at the same time. There is usually a difference of a couple of hours or more.
>
> 1. HDFS sinks are writing in the Amazon EC2 cloud.
> 2. The data dir and checkpoint dir of the file channel on every Flume collector instance are mounted on a separate Hadoop EBS drive. This makes sure that two separate collectors do not overlap their log and checkpoint dirs. There is a symbolic link, i.e. /usr/lib/flume-ng/datasource -> /hadoop/ebs/mnt-1.
> 3. Flume works fine for a couple of days, and all the agents and collectors are initialized properly without exceptions.
>
> Questions:
> Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]": according to the documentation, such an exception occurs only if two processes are accessing the same file/directory. However, each channel is configured separately, so no two channels should access the same dir. Hence, this exception does not seem to indicate anything in my setup. Please correct me if I'm wrong.
> Also, hdfs.callTimeout applies to calling HDFS for open and write operations; if there is no response within that duration, it times out, and if it times out, it closes the file (see the sink sketch after this message). Please correct me if I'm wrong. Also, is there a way to specify the number of retries before it closes the file?
>
> Your inputs/suggestions will be thoroughly appreciated.
>
> Regards,
> Kushal Mangtani
> Software Engineer
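On the hdfs.callTimeout question, a minimal sketch of the sink side, assuming a hypothetical sink named "k1" on the same collector agent; the path and values are placeholders. hdfs.callTimeout bounds each individual HDFS call (open, write, flush, close) made by the sink, which is separate from the file channel's log write lock timeout, so the two are tuned independently.

    # Hypothetical HDFS sink draining the file channel c2; name, path and
    # values are illustrative assumptions only.
    collector.sinks = k1
    collector.sinks.k1.type = hdfs
    collector.sinks.k1.channel = c2
    collector.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events

    # Milliseconds allowed for each HDFS open/write/flush/close call before
    # the sink gives up on that call (the default is 10000).
    collector.sinks.k1.hdfs.callTimeout = 60000

Raising hdfs.callTimeout can mask slow HDFS or EBS IO in the same way a larger write-timeout does, so the earlier caveat about fixing the underlying IO applies here as well.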
