> In fact, looking at your error, the timeout looks like hdfs.callTimeout,
> so that's where I'd focus. Is your HDFS cluster particularly
> underperforming? 10s to respond to a call is pretty slow.
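>
> As a rough starting point (the values here are only guesses to experiment
> with, not tested recommendations), that could look like:
>
>   hadoop2.sinks.hdfs2.hdfs.callTimeout = 60000
>   hadoop2.sinks.hdfs2.hdfs.retryInterval = 180
>   hadoop2.sinks.hdfs2.hdfs.closeTries = 0
>
> With hdfs.closeTries = 0 the sink keeps retrying a failed close/rename
> instead of giving up after the first attempt.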
You are right. At that time the HDFS disks were fully utilized by MapReduce
jobs. I expected that even if Flume failed to close a file once, the close
would be retried a while later, when the disks were less utilized, and the
file would then be closed successfully.

2016-07-20 17:36 GMT+09:00 no jihun <jees...@gmail.com>:

> I know about idleTimeout, rollSize, and rollCount (which control rolling
> over to a new file).
>
> I didn't set callTimeout, so the default of 10s applies. closeTries and
> retryInterval are not set either.
>
> So I would expect that even if a close fails once, it will be retried
> after 180s (the default retryInterval). But as you can see in the logs
> above, the close retry never happens.
>
> Am I wrong?
>
> 2016-07-20 17:25 GMT+09:00 Chris Horrocks <chris@hor.rocks>:
>
>> You could look at tuning either hdfs.idleTimeout, hdfs.callTimeout, or
>> hdfs.retryInterval, which are all documented at:
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>>
>> --
>> Chris Horrocks
>>
>> On Wed, Jul 20, 2016 at 9:01 am, no jihun <jees...@gmail.com> wrote:
>>
>> @chris If you meant hdfs.callTimeout, I am testing that now.
>>
>> I can increase the value, but when a timeout occurs during close, will
>> the close never be retried (as in the logs above)?
>>
>> 2016-07-20 16:50 GMT+09:00 Chris Horrocks <chris@hor.rocks>:
>>
>>> Have you tried increasing the HDFS sink timeouts?
>>>
>>> --
>>> Chris Horrocks
>>>
>>> On Wed, Jul 20, 2016 at 8:03 am, no jihun <jees...@gmail.com> wrote:
>>>
>>> Hi.
>>>
>>> I found some files on HDFS left in OPEN_FOR_WRITE state.
>>>
>>> *This is Flume's log for one such file:*
>>>
>>>> 18 7 2016 16:12:02,765 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>>>   (org.apache.flume.sink.hdfs.BucketWriter.open:234)
>>>>   - Creating 1468825922758.avro.tmp
>>>> 18 7 2016 16:22:39,812 INFO [hdfs-hdfs2-roll-timer-0]
>>>>   (org.apache.flume.sink.hdfs.BucketWriter$5.call:429)
>>>>   - Closing idle bucketWriter 1468825922758.avro.tmp at 1468826559812
>>>> 18 7 2016 16:22:39,812 INFO [hdfs-hdfs2-roll-timer-0]
>>>>   (org.apache.flume.sink.hdfs.BucketWriter.close:363)
>>>>   - Closing 1468825922758.avro.tmp
>>>> 18 7 2016 16:22:49,813 WARN [hdfs-hdfs2-roll-timer-0]
>>>>   (org.apache.flume.sink.hdfs.BucketWriter.close:370)
>>>>   - failed to close() HDFSWriter for file (1468825922758.avro.tmp).
>>>>   Exception follows.
>>>>   java.io.IOException: Callable timed out after 10000 ms on file:
>>>>   1468825922758.avro.tmp
>>>> 18 7 2016 16:22:49,816 INFO [hdfs-hdfs2-call-runner-7]
>>>>   (org.apache.flume.sink.hdfs.BucketWriter$8.call:629)
>>>>   - Renaming 1468825922758.avro.tmp to 1468825922758.avro
>>>
>>> - It seems the close was never retried.
>>> - Flume just renamed the file, which was still open.
>>>
>>> *Two days later I found the file with this command:*
>>>
>>>> hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" \
>>>>   | grep "2016/07/18" | grep -v ".avro.tmp" \
>>>>   | sed -n 's|.*\(/data/flume/.*avro\).*|\1|p'
>>>
>>> *So I ran recoverLease:*
>>>
>>>> hdfs debug recoverLease -path 1468825922758.avro -retries 3
>>>> recoverLease returned false.
>>>> Retrying in 5000 ms...
>>>> Retry #1
>>>> recoverLease SUCCEEDED on 1468825922758.avro
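>>>
>>> To recover every stale file at once, a loop along these lines should
>>> work (just a sketch; it assumes the fsck pipeline above prints one
>>> absolute /data/flume/...avro path per line):
>>>
>>>> hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" \
>>>>   | grep -v ".avro.tmp" \
>>>>   | sed -n 's|.*\(/data/flume/.*avro\).*|\1|p' \
>>>>   | while read -r path; do
>>>>       hdfs debug recoverLease -path "$path" -retries 3
>>>>     done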
>>> *My HDFS sink configuration:*
>>>
>>>> hadoop2.sinks.hdfs2.type = hdfs
>>>> hadoop2.sinks.hdfs2.channel = fileCh1
>>>> hadoop2.sinks.hdfs2.hdfs.fileType = DataStream
>>>> hadoop2.sinks.hdfs2.serializer = ....
>>>> hadoop2.sinks.hdfs2.serializer.compressionCodec = snappy
>>>> hadoop2.sinks.hdfs2.hdfs.filePrefix = %{type}_%Y-%m-%d_%{host}
>>>> hadoop2.sinks.hdfs2.hdfs.fileSuffix = .avro
>>>> hadoop2.sinks.hdfs2.hdfs.rollInterval = 3700
>>>> #hadoop2.sinks.hdfs2.hdfs.rollSize = 67000000
>>>> hadoop2.sinks.hdfs2.hdfs.rollSize = 800000000
>>>> hadoop2.sinks.hdfs2.hdfs.rollCount = 0
>>>> hadoop2.sinks.hdfs2.hdfs.batchSize = 10000
>>>> hadoop2.sinks.hdfs2.hdfs.idleTimeout = 300
>>>
>>> hdfs.closeTries and hdfs.retryInterval are both left unset.
>>>
>>> *My question is:*
>>> Why was '1468825922758.avro' left OPEN_FOR_WRITE even though it was
>>> renamed to .avro successfully? Is this expected behavior? And what
>>> should I do to eliminate these anomalous OPENFORWRITE files?
>>>
>>> Regards,
>>> Jihun.