It is currently in trunk, so it will be in Flume 1.5.
Thanks,
Hari

On Friday, February 28, 2014 at 11:30 AM, Mangtani, Kushal wrote:

> Hari,
>
> Thanks for the feedback. This was really helpful. I am going to use
> provisioned IO for a while to make sure the exception does not come back.
>
> Also, from the comments section of the Jira ticket given below, I noticed
> that you were able to identify the reason for the exception: perhaps old
> logs are never deleted. Are you guys going to put a patch into Flume 1.5 so
> that this exception is resolved?
>
> -Kushal Mangtani
>
> From: Hari Shreedharan [mailto:[email protected]]
> Sent: Thursday, February 27, 2014 11:19 AM
> To: [email protected]
> Subject: Re: File Channel Exception "Failed to obtain lock for writing to
> the log. Try increasing the log write timeout value"
>
> See https://issues.apache.org/jira/browse/FLUME-2307
>
> This jira removed the write-timeout, but that only makes sure that there is
> no transaction in limbo. The real reason, like I said, is slow IO. Try using
> provisioned IO for better throughput.
>
> Thanks,
> Hari
>
> On Thursday, February 27, 2014 at 10:48 AM, Mangtani, Kushal wrote:
>
> > Hari,
> >
> > Thanks for the prompt reply. The current file channel's write-timeout is
> > 30 sec. The EBS drive's current capacity is 200 GB. The rate of writes is
> > 60 events/min, where each event is approx. 40 KB.
> >
> > I am thinking of increasing the file channel write-timeout to 60 sec.
> > What do you suggest?
> >
> > Also, one strange thing I noticed: all the flume collectors get the same
> > exception. However, all have a separate EBS drive. Any inputs?
> >
> > Thanks,
> > Kushal Mangtani
> >
> > From: Hari Shreedharan [mailto:[email protected]]
> > Sent: Thursday, February 27, 2014 10:35 AM
> > To: [email protected]
> > Subject: Re: File Channel Exception "Failed to obtain lock for writing to
> > the log. Try increasing the log write timeout value"
> >
> > For now, increase the file channel's write-timeout parameter to around 30
> > or so (basically the file channel is timing out while writing to disk).
> > But the basic problem you are seeing is that your EBS instance is very
> > slow and IO is taking too long. You either need to increase your EBS IO
> > capacity, or reduce the rate of writes.
> >
> > Thanks,
> > Hari
> >
> > On Thursday, February 27, 2014 at 10:28 AM, Mangtani, Kushal wrote:
> >
> > > From: Mangtani, Kushal
> > > Sent: Wednesday, February 26, 2014 4:51 PM
> > > To: '[email protected]'; '[email protected]'
> > > Cc: Rangnekar, Rohit; '[email protected]'
> > > Subject: File Channel Exception "Failed to obtain lock for writing to
> > > the log. Try increasing the log write timeout value"
> > >
> > > Hi,
> > >
> > > I'm using the Flume-NG 1.4 CDH4.4 tarball for collecting aggregated logs.
> > > I am running a 2-tier (agent, collector) Flume configuration with
> > > custom plugins. There are approximately 20 agent machines (receiving
> > > data) and 6 collector machines (writing to HDFS), all running
> > > independently. However, I have been facing some File Channel exceptions
> > > on the collector side. The agents appear to be working fine.
> > >
> > > Error stacktrace:
> > >
> > > org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
> > >     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> > >     at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
> > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >     .....
> > >
> > > And I keep getting the same error.
> > >
> > > P.S.: This same exception is repeated on most of the flume collector
> > > machines, but not at the same time. There is usually a difference of a
> > > couple of hours or more.
> > >
> > > 1. The HDFS sinks write to an Amazon EC2 cloud instance.
> > > 2. The data dir and checkpoint dir of the file channel in each flume
> > >    collector instance are mounted on a separate hadoop EBS drive. This
> > >    makes sure that two separate collectors do not overlap their log and
> > >    checkpoint dirs. There is a symbolic link, i.e.
> > >    /usr/lib/flume-ng/datasource -> /hadoop/ebs/mnt-1.
> > > 3. Flume works fine for a couple of days, and all the agents and
> > >    collectors are initialized properly without exceptions.
> > >
> > > Questions:
> > >
> > > Exception "Failed to obtain lock for writing to the log. Try increasing
> > > the log write timeout value. [channel=c2]": according to the
> > > documentation, such an exception occurs only if two processes are
> > > accessing the same file/directory. However, each channel is configured
> > > separately, so no two channels should access the same dir. Hence, this
> > > exception does not indicate anything. Please correct me if I'm wrong.
> > >
> > > Also, hdfs.callTimeout indicates the timeout for calling HDFS for
> > > open/write operations. If there is no response within that duration, it
> > > times out; and if it times out, it closes the file. Please correct me if
> > > I'm wrong. Also, is there a way to specify the number of retries before
> > > it closes the file?
> > >
> > > Your inputs/suggestions will be thoroughly appreciated.
> > >
> > > Regards,
> > > Kushal Mangtani
> > > Software Engineer
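For reference, a minimal sketch of the collector-side file channel setup Hari's advice points at, in Flume agent properties format. The agent name "collector", the directory paths, and the 60-second value are illustrative assumptions, not the poster's actual configuration; checkpointDir, dataDirs, and write-timeout are the Flume 1.4 file channel properties (write-timeout is the setting discussed above and was removed by FLUME-2307).

    # Hypothetical collector agent; names and paths are illustrative.
    collector.channels = c2
    collector.channels.c2.type = file
    # Checkpoint and data dirs on the dedicated EBS mount described above,
    # so no two collectors share the same directories.
    collector.channels.c2.checkpointDir = /hadoop/ebs/mnt-1/checkpoint
    collector.channels.c2.dataDirs = /hadoop/ebs/mnt-1/data
    # Seconds to wait for the log write lock before the channel throws
    # "Failed to obtain lock for writing to the log" (removed in FLUME-2307).
    collector.channels.c2.write-timeout = 60

Raising write-timeout only hides the symptom; as noted above, the underlying issue is slow EBS IO, so provisioned IOPS or a lower write rate is the real fix.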
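Likewise, a hedged sketch of the HDFS sink property raised in the questions. The sink name and HDFS path are made up for illustration; hdfs.callTimeout is the Flume HDFS sink setting, expressed in milliseconds, that bounds how long the sink waits on HDFS open/write/close calls.

    collector.sinks = hdfsSink
    collector.sinks.hdfsSink.type = hdfs
    collector.sinks.hdfsSink.channel = c2
    collector.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
    # Milliseconds to wait for HDFS operations before the call times out.
    collector.sinks.hdfsSink.hdfs.callTimeout = 30000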
