It is currently in trunk, so it will be in Flume 1.5.
Thanks,
Hari

On Friday, February 28, 2014 at 11:30 AM, Mangtani, Kushal wrote:

> Hari,
>
> Thanks for the feedback. This was really helpful. I am going to use
> provisioned IO for a while to make sure the exception does not come back.
>
> Also, from the comments section of the Jira ticket given below, I noticed
> that you were able to identify the reason for the exception: perhaps old
> logs are never deleted. Are you guys going to put a patch into Flume 1.5 so
> that this exception is resolved?
>
> -Kushal Mangtani
>
> From: Hari Shreedharan [mailto:[email protected]]
> Sent: Thursday, February 27, 2014 11:19 AM
> To: [email protected]
> Subject: Re: File Channel Exception "Failed to obtain lock for writing to
> the log. Try increasing the log write timeout value"
>
> See https://issues.apache.org/jira/browse/FLUME-2307
>
> This jira removed the write-timeout, but that only makes sure that there is
> no transaction in limbo. The real reason, like I said, is slow IO. Try using
> provisioned IO for better throughput.
>
> Thanks,
> Hari
>
> On Thursday, February 27, 2014 at 10:48 AM, Mangtani, Kushal wrote:
>
> > Hari,
> >
> > Thanks for the prompt reply. The current file channel's write-timeout is
> > 30 sec. The EBS drive's current capacity is 200 GB. The rate of writes is
> > 60 events/min, where each event is approx. 40 KB.
> >
> > I am thinking of increasing the file channel write-timeout to 60 sec.
> > What do you suggest?
> >
> > Also, one strange thing I noticed: all the flume collectors get the same
> > exception. However, all have a separate EBS drive. Any inputs?
> >
> > Thanks,
> > Kushal Mangtani
> >
> > From: Hari Shreedharan [mailto:[email protected]]
> > Sent: Thursday, February 27, 2014 10:35 AM
> > To: [email protected]
> > Subject: Re: File Channel Exception "Failed to obtain lock for writing to
> > the log. Try increasing the log write timeout value"
> >
> > For now, increase the file channel's write-timeout parameter to around 30
> > or so (basically the file channel is timing out while writing to disk).
> > But the basic problem you are seeing is that your EBS instance is very
> > slow and IO is taking too long. You either need to increase your EBS IO
> > capacity, or reduce the rate of writes.
> >
> > Thanks,
> > Hari
> >
> > On Thursday, February 27, 2014 at 10:28 AM, Mangtani, Kushal wrote:
> >
> > > From: Mangtani, Kushal
> > > Sent: Wednesday, February 26, 2014 4:51 PM
> > > To: '[email protected]'; '[email protected]'
> > > Cc: Rangnekar, Rohit; '[email protected]'
> > > Subject: File Channel Exception "Failed to obtain lock for writing to
> > > the log. Try increasing the log write timeout value"
> > >
> > > Hi,
> > >
> > > I'm using the Flume-NG 1.4 CDH4.4 tarball for collecting aggregated logs.
> > > I am running a 2-tier (agent, collector) Flume configuration with
> > > custom plugins. There are approximately 20 agent machines (receiving
> > > data) and 6 collector machines (writing to HDFS), all running
> > > independently. However, I have been facing some File Channel exceptions
> > > on the collector side. The agents appear to be working fine.
> > >
> > > Error stacktrace:
> > >
> > > org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
> > >     at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> > >     at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> > >     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
> > >     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> > >     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> > >     .....
> > >
> > > And I keep getting the same error.
> > >
> > > P.S.: This same exception is repeated on most of the flume collector
> > > machines, but not at the same time. There is usually a difference of a
> > > couple of hours or more.
> > >
> > > 1. The HDFS sinks write to an Amazon EC2 cloud instance.
> > > 2. The data dir and checkpoint dir of the file channel in each flume
> > >    collector instance are mounted on a separate hadoop EBS drive. This
> > >    makes sure that two separate collectors do not overlap their log and
> > >    checkpoint dirs. There is a symbolic link, i.e.
> > >    /usr/lib/flume-ng/datasource -> /hadoop/ebs/mnt-1.
> > > 3. Flume works fine for a couple of days, and all the agents and
> > >    collectors are initialized properly without exceptions.
> > >
> > > Questions:
> > >
> > > Exception "Failed to obtain lock for writing to the log. Try increasing
> > > the log write timeout value. [channel=c2]": according to the
> > > documentation, such an exception occurs only if two processes are
> > > accessing the same file/directory. However, each channel is configured
> > > separately, so no two channels should access the same dir. Hence, this
> > > exception does not indicate anything. Please correct me if I'm wrong.
> > >
> > > Also, hdfs.callTimeout indicates the timeout for calling HDFS for
> > > open/write operations. If there is no response within that duration, it
> > > times out; and if it times out, it closes the file. Please correct me if
> > > I'm wrong. Also, is there a way to specify the number of retries before
> > > it closes the file?
> > >
> > > Your inputs/suggestions will be thoroughly appreciated.
> > >
> > > Regards,
> > > Kushal Mangtani
> > > Software Engineer
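For reference, a minimal sketch of the collector-side file channel setup Hari's advice points at, in Flume agent properties format. The agent name "collector", the directory paths, and the 60-second value are illustrative assumptions, not the poster's actual configuration; checkpointDir, dataDirs, and write-timeout are the Flume 1.4 file channel properties (write-timeout is the setting discussed above and was removed by FLUME-2307).

    # Hypothetical collector agent; names and paths are illustrative.
    collector.channels = c2
    collector.channels.c2.type = file
    # Checkpoint and data dirs on the dedicated EBS mount described above,
    # so no two collectors share the same directories.
    collector.channels.c2.checkpointDir = /hadoop/ebs/mnt-1/checkpoint
    collector.channels.c2.dataDirs = /hadoop/ebs/mnt-1/data
    # Seconds to wait for the log write lock before the channel throws
    # "Failed to obtain lock for writing to the log" (removed in FLUME-2307).
    collector.channels.c2.write-timeout = 60

Raising write-timeout only hides the symptom; as noted above, the underlying issue is slow EBS IO, so provisioned IOPS or a lower write rate is the real fix.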
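Likewise, a hedged sketch of the HDFS sink property raised in the questions. The sink name and HDFS path are made up for illustration; hdfs.callTimeout is the Flume HDFS sink setting, expressed in milliseconds, that bounds how long the sink waits on HDFS open/write/close calls.

    collector.sinks = hdfsSink
    collector.sinks.hdfsSink.type = hdfs
    collector.sinks.hdfsSink.channel = c2
    collector.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
    # Milliseconds to wait for HDFS operations before the call times out.
    collector.sinks.hdfsSink.hdfs.callTimeout = 30000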
