See https://issues.apache.org/jira/browse/FLUME-2307  

This jira removed the write-timeout, but that only ensures there is no 
transaction left in limbo. The real reason, as I said, is slow IO. Try using 
provisioned IOPS for better throughput.  


Thanks,
Hari


On Thursday, February 27, 2014 at 10:48 AM, Mangtani, Kushal wrote:

> Hari,
>   
> Thanks for the prompt reply. The current file channel write-timeout is 30 
> sec, and the EBS drive capacity is 200 GB. The rate of writes is 60 
> events/min, where each event is approx. 40 KB (about 2.4 MB/min, or roughly 
> 40 KB/s on average).
>   
> I am thinking of increasing the file channel write-timeout to 60 sec, along 
> the lines of the snippet below. What do you suggest?
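>   
> Something like this in the collector's properties file (the agent name 
> "collector" is illustrative; c2 is the channel name from the exception):
>   
>     collector.channels.c2.write-timeout = 60
>   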
> Also, one strange thing I noticed: all the flume collectors get the same 
> exception, even though each has a separate EBS drive. Any inputs?
>   
> Thanks,
> Kushal Mangtani
>   
> From: Hari Shreedharan [mailto:[email protected]]  
> Sent: Thursday, February 27, 2014 10:35 AM
> To: [email protected]
> Subject: Re: File Channel Exception "Failed to obtain lock for writing to the 
> log. Try increasing the log write timeout value"  
>   
> For now, increase the file channel's write-timeout parameter to around 30 or 
> so (basically the file channel is timing out while writing to disk). But the 
> basic problem you are seeing is that your EBS instance is very slow and IO is 
> taking too long. You either need to increase your EBS IO capacity or reduce 
> the rate of writes.
>   
> Thanks,
> Hari
>   
> On Thursday, February 27, 2014 at 10:28 AM, Mangtani, Kushal wrote:
> >   
> > From: Mangtani, Kushal  
> > Sent: Wednesday, February 26, 2014 4:51 PM
> > To: '[email protected]'; '[email protected]'
> > Cc: Rangnekar, Rohit; '[email protected]'
> > Subject: File Channel Exception "Failed to obtain lock for writing to the 
> > log. Try increasing the log write timeout value"
> >   
> > Hi,
> >   
> > I'm using the Flume-NG 1.4 (CDH 4.4) tarball for collecting aggregated logs.
> >   
> > I am running a 2-tier (agent, collector) Flume configuration with custom 
> > plugins. There are approximately 20 agent machines (receiving data) and 6 
> > collector machines (writing to HDFS), all running independently. However, 
> > I have been facing some file channel exceptions on the collector side; the 
> > agents appear to be working fine.
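> >   
> > A simplified sketch of each collector's pipeline, for context (component 
> > names, the bind address/port, and the HDFS path are illustrative; c2 is 
> > the channel named in the exception below):
> >   
> >     collector.sources = avroSrc
> >     collector.channels = c2
> >     collector.sinks = hdfsSink
> >   
> >     # Avro source receiving events from the agent tier
> >     collector.sources.avroSrc.type = avro
> >     collector.sources.avroSrc.bind = 0.0.0.0
> >     collector.sources.avroSrc.port = 4545
> >     collector.sources.avroSrc.channels = c2
> >   
> >     # Durable file channel (its dirs live on the dedicated EBS mount, see below)
> >     collector.channels.c2.type = file
> >   
> >     # HDFS sink draining the channel
> >     collector.sinks.hdfsSink.type = hdfs
> >     collector.sinks.hdfsSink.channel = c2
> >     collector.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events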
> >   
> > Error stacktrace:
> >   
> >     org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
> >         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> >         at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
> >         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> >         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> >         …
> >   
> > And I keep on getting the same error.
> >   
> > P.S.: This same exception is repeated on most of the flume collector 
> > machines, but not at the same time. There is usually a difference of a 
> > couple of hours or more.
> >   
> > 1. HDFS sinks are written to the Amazon EC2 cloud instance.
> >   
> > 2. The datadir and checkpoint dir of the file channel in every flume 
> > collector instance are mounted on a separate hadoop EBS drive. This makes 
> > sure that no two collectors overlap their log and checkpoint dirs. There 
> > is a symbolic link, i.e. /usr/lib/flume-ng/datasource → /hadoop/ebs/mnt-1 
> > (see the sketch after this list).
> >   
> > 3. Flume works fine for a couple of days, and all the agents and 
> > collectors are initialized properly without exceptions.
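> >   
> > Concretely, each collector's channel directories resolve to its own EBS 
> > mount through that symlink, roughly like this (the subdirectory names are 
> > illustrative):
> >   
> >     collector.channels.c2.checkpointDir = /usr/lib/flume-ng/datasource/checkpoint
> >     collector.channels.c2.dataDirs = /usr/lib/flume-ng/datasource/data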
> >   
> > Questions:
> >   
> > Exception "Failed to obtain lock for writing to the log. Try increasing 
> > the log write timeout value. [channel=c2]": according to the 
> > documentation, such an exception occurs only if two processes are 
> > accessing the same file/directory. However, each channel is configured 
> > separately, so no two channels should access the same dir. Hence, this 
> > exception does not indicate anything. Please correct me if I'm wrong.
> >   
> > Also, hdfs.callTimeout indicates the timeout for calling HDFS for open 
> > and write operations: if there is no response within that duration, the 
> > call times out, and when it times out the sink closes the file. Please 
> > correct me if I'm wrong. Also, is there a way to specify the number of 
> > retries before it closes the file?
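> >   
> > For reference, the setting I mean, with an illustrative sink name (the 
> > value is in milliseconds):
> >   
> >     collector.sinks.hdfsSink.hdfs.callTimeout = 30000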
> >   
> > Your inputs/suggestions will be greatly appreciated.  
> >   
> > Regards
> > Kushal Mangtani
> > Software Engineer