Sorry, forgot to mention: we are using Hadoop 1.2.0.
On 26 November 2013 11:07, Snehal Nagmote <[email protected]> wrote:

> Hello All,
>
> We are using the HDFS sink with Flume, and it runs into an HDFS IOException
> very often.
>
> I am using Apache Flume 1.4.0 (HDP). We have a two-tier topology and the
> collector is not on a datanode. The collector fails often and throws
> java.io.IOException: DFSOutputStream is closed
>
> java.io.IOException: DFSOutputStream is closed
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4097)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4084)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:117)
>     at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:356)
>     at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:353)
>     at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
>     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
>     at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
>     at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> This is how the configuration looks:
>
> agent.sinks.hdfs-sink.type = hdfs
> agent.sinks.hdfs-sink.hdfs.filePrefix = %Y%m%d%H-events-1
> agent.sinks.hdfs-sink.hdfs.path = hdfs://bi-hdnn01.sjc.kixeye.com:8020/flume/logs/%Y%m%d/%H/
> agent.sinks.hdfs-sink.hdfs.fileSuffix = .done
> agent.sinks.hdfs-sink.hdfs.fileType = DataStream
> agent.sinks.hdfs-sink.hdfs.writeFormat = Text
> agent.sinks.hdfs-sink.hdfs.rollInterval = 0
> agent.sinks.hdfs-sink.hdfs.rollSize = 0
> agent.sinks.hdfs-sink.hdfs.rollCount = 0
> agent.sinks.hdfs-sink.hdfs.batchSize = 10000
> agent.sinks.hdfs-sink.hdfs.threadsPoolSize = 10000
> agent.sinks.hdfs-sink.hdfs.rollTimerPoolSize = 10
> agent.sinks.hdfs-sink.hdfs.callTimeout = 500000
>
> Earlier I was using rollInterval = 30. I changed it to 0 because of the
> above exception, and then I started seeing a new exception:
>
> Failed to renew lease for [DFSClient_NONMAPREDUCE_1307546979_31] for 30
> seconds. Will retry shortly ...
> java.io.IOException: Call to bi-hdnn01.sjc.kixeye.com/10.54.208.14:8020
> failed on local exception: java.io.IOException:
>
> Caused by: java.io.IOException: Connection reset by peer
>
> Because of these exceptions, our production downstream process gets a lot
> slower and needs frequent restarts, and the upstream process fills the
> channels. Does anyone know what could be the cause and how we can avoid it?
>
> Any thoughts would be really helpful; it has been extremely difficult to
> debug this.
>
> Thanks,
> Snehal
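
One observation on the config above: with rollInterval, rollSize, and rollCount all set to 0 and no idleTimeout, the sink never closes its HDFS files, so each DFSOutputStream and its NameNode lease stay open indefinitely; once anything disturbs that long-lived stream, the next sync fails with "DFSOutputStream is closed". Below is a minimal sketch of a more conservative sink configuration, assuming the same agent and sink names as above; the roll, timeout, and pool-size values are illustrative assumptions, not tested recommendations for this cluster:

# Hedged sketch: re-enable bounded file rolling so streams are closed
# and leases released periodically. Values are illustrative assumptions.
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://bi-hdnn01.sjc.kixeye.com:8020/flume/logs/%Y%m%d/%H/
agent.sinks.hdfs-sink.hdfs.filePrefix = %Y%m%d%H-events-1
agent.sinks.hdfs-sink.hdfs.fileSuffix = .done
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
# Roll every 10 minutes or every ~128 MB, whichever comes first,
# instead of never rolling at all.
agent.sinks.hdfs-sink.hdfs.rollInterval = 600
agent.sinks.hdfs-sink.hdfs.rollSize = 134217728
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.batchSize = 1000
# threadsPoolSize defaults to 10; 10000 I/O threads is far more than
# one sink needs and only adds contention.
agent.sinks.hdfs-sink.hdfs.threadsPoolSize = 10
agent.sinks.hdfs-sink.hdfs.rollTimerPoolSize = 1
# callTimeout is in milliseconds; keep it bounded so a hung HDFS call
# fails fast instead of stalling the sink for 500 seconds.
agent.sinks.hdfs-sink.hdfs.callTimeout = 60000
# Close files that receive no events for 5 minutes, so abandoned
# hourly buckets release their leases.
agent.sinks.hdfs-sink.hdfs.idleTimeout = 300

The trade-off is more, smaller files on HDFS, but bounded file lifetimes mean leases are released regularly rather than held until an error or an agent restart.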
