In fact, looking at your error, the timeout looks like the hdfs.callTimeout, so that's where I'd focus. Is your HDFS cluster particularly underperforming? 10s to respond to a call is pretty slow.

-- Chris Horrocks
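As a rough sketch, raising that timeout (and setting the close-retry parameters explicitly) would look something like the lines below in the sink configuration quoted further down. The parameter names come from the Flume user guide linked later in this thread; the values are only illustrative, not tested recommendations.

# call timeout in milliseconds; the 10000 ms in the error quoted below is the default
hadoop2.sinks.hdfs2.hdfs.callTimeout = 60000
# retry a failed close/rename a few times instead of leaving the file open
hadoop2.sinks.hdfs2.hdfs.closeTries = 3
# seconds between consecutive close attempts
hadoop2.sinks.hdfs2.hdfs.retryInterval = 180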
On Wed, Jul 20, 2016 at 9:25 am, Chris Horrocks <chris@hor.rocks> wrote:

You could look at tuning either hdfs.idleTimeout, hdfs.callTimeout, or hdfs.retryInterval, which can all be found at:
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

-- Chris Horrocks

On Wed, Jul 20, 2016 at 9:01 am, no jihun <jees...@gmail.com> wrote:

@Chris If you meant hdfs.callTimeout, I am testing that now and can increase the value. When the timeout occurs during close, will the close never be retried? (As in the logs above.)

2016-07-20 16:50 GMT+09:00 Chris Horrocks <chris@hor.rocks>:

Have you tried increasing the HDFS sink timeouts?

-- Chris Horrocks

On Wed, Jul 20, 2016 at 8:03 am, no jihun <jees...@gmail.com> wrote:

Hi. I found some files on HDFS left in the OPEN_FOR_WRITE state. This is Flume's log for one such file:

18 7 2016 16:12:02,765 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating 1468825922758.avro.tmp
18 7 2016 16:22:39,812 INFO [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter 1468825922758.avro.tmp at 1468826559812
18 7 2016 16:22:39,812 INFO [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing 1468825922758.avro.tmp
18 7 2016 16:22:49,813 WARN [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (1468825922758.avro.tmp). Exception follows.
java.io.IOException: Callable timed out after 10000 ms on file: 1468825922758.avro.tmp
18 7 2016 16:22:49,816 INFO [hdfs-hdfs2-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming 1468825922758.avro.tmp to 1468825922758.avro

It seems the close was never retried; Flume simply renamed the file while it was still open.

Two days later I found that file with this command:

hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" | grep "2016/07/18" | sed 's/\/data\/flume\// \/data\/flume\//g' | grep -v ".avro.tmp" | sed -n 's/.*\(\/data\/flume\/.*avro\).*/\1/p'

So I ran recoverLease on it:

hdfs debug recoverLease -path 1468825922758.avro -retries 3
recoverLease returned false. Retrying in 5000 ms...
Retry #1
recoverLease SUCCEEDED on 1468825922758.avro

My HDFS sink configuration:

hadoop2.sinks.hdfs2.type = hdfs
hadoop2.sinks.hdfs2.channel = fileCh1
hadoop2.sinks.hdfs2.hdfs.fileType = DataStream
hadoop2.sinks.hdfs2.serializer = ....
hadoop2.sinks.hdfs2.serializer.compressionCodec = snappy
hadoop2.sinks.hdfs2.hdfs.filePrefix = %{type}_%Y-%m-%d_%{host}
hadoop2.sinks.hdfs2.hdfs.fileSuffix = .avro
hadoop2.sinks.hdfs2.hdfs.rollInterval = 3700
#hadoop2.sinks.hdfs2.hdfs.rollSize = 67000000
hadoop2.sinks.hdfs2.hdfs.rollSize = 800000000
hadoop2.sinks.hdfs2.hdfs.rollCount = 0
hadoop2.sinks.hdfs2.hdfs.batchSize = 10000
hadoop2.sinks.hdfs2.hdfs.idleTimeout = 300

hdfs.closeTries and hdfs.retryInterval are both not set.

My questions: why was '1468825922758.avro' left OPEN_FOR_WRITE even though it was renamed to .avro successfully? Is this expected behavior? And what should I do to eliminate these anomalous OPENFORWRITE files?

Regards,
Jihun.

--
----------------------------------------------
Jihun No ( 노지훈 )
----------------------------------------------
Twitter : @nozisim
Facebook : nozisim
Website : http://jeesim2.godohosting.com
---------------------------------------------------------------------------------
Market Apps : android market products (https://market.android.com/developer?pub=%EB%85%B8%EC%A7%80%ED%9B%88)
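For anyone cleaning up after the same issue, here is a minimal sketch of how the manual steps above (fsck for OPENFORWRITE files, then recoverLease on each) might be scripted. It assumes the same /data/flume layout and .avro suffix from this thread; the path pattern and retry count are illustrative.

#!/bin/bash
# List .avro files still reported OPENFORWRITE under /data/flume and ask the
# NameNode to recover the lease on each one.
hdfs fsck /data/flume -openforwrite 2>/dev/null \
  | grep "OPENFORWRITE" \
  | grep -v ".avro.tmp" \
  | grep -o '/data/flume/[^ ]*\.avro' \
  | while read -r path; do
      echo "Recovering lease on ${path}"
      hdfs debug recoverLease -path "${path}" -retries 3
    done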