Hello all, we are using the HDFS sink with Flume, and it runs into HDFS IOExceptions very often.
I am using Apache Flume from HDP 1.4.0. We have a two-tier topology, and the collector tier is not on a datanode. The collector fails often with:

java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4097)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4084)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:117)
    at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:356)
    at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:353)
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
    at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

This is how the configuration looks:

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.filePrefix = %Y%m%d%H-events-1
agent.sinks.hdfs-sink.hdfs.path = hdfs://bi-hdnn01.sjc.kixeye.com:8020/flume/logs/%Y%m%d/%H/
agent.sinks.hdfs-sink.hdfs.fileSuffix = .done
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.batchSize = 10000
agent.sinks.hdfs-sink.hdfs.threadsPoolSize = 10000
agent.sinks.hdfs-sink.hdfs.rollTimerPoolSize = 10
agent.sinks.hdfs-sink.hdfs.callTimeout = 500000

Earlier I was using rollInterval = 30. I changed it to 0 because of the above exception, and then I started seeing a new exception:

Failed to renew lease for [DFSClient_NONMAPREDUCE_1307546979_31] for 30 seconds. Will retry shortly ...
java.io.IOException: Call to bi-hdnn01.sjc.kixeye.com/10.54.208.14:8020 failed on local exception: java.io.IOException: Connection reset by peer
Caused by: java.io.IOException: Connection reset by peer

Because of these exceptions, our production downstream process slows down considerably and needs frequent restarts, and the upstream process fills the channels. Does anyone know what the cause could be and how we can avoid it? Any thoughts would be really helpful; it has been extremely difficult to debug.

Thanks,
Snehal
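P.S. For completeness, this is roughly what the sink looked like before the change that triggered the lease-renewal errors: the same properties as above, but with time-based rolling enabled via rollInterval = 30 while size- and count-based rolling stayed disabled (a sketch from memory, not the exact file):

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://bi-hdnn01.sjc.kixeye.com:8020/flume/logs/%Y%m%d/%H/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
# roll a new file every 30 seconds; size/count rolling off
agent.sinks.hdfs-sink.hdfs.rollInterval = 30
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.batchSize = 10000
agent.sinks.hdfs-sink.hdfs.callTimeout = 500000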
