In fact, looking at your error, the timeout looks like the hdfs.callTimeout, so 
that's where I'd focus. Is your HDFS cluster particularly underperforming? 10s to 
respond to a call is pretty slow.
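
For example, something along these lines in your sink configuration (the 60000 ms 
value is purely illustrative, not a recommendation; callTimeout is in milliseconds):

hadoop2.sinks.hdfs2.hdfs.callTimeout = 60000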


--
Chris Horrocks


On Wed, Jul 20, 2016 at 9:25 am, Chris Horrocks <'chris@hor.rocks'> wrote:

You could look at tuning either hdfs.idleTimeout, hdfs.callTimeout, or 
hdfs.retryInterval which can all be found at: 
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
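
For example (values are only illustrative; per the guide, callTimeout is in 
milliseconds while idleTimeout and retryInterval are in seconds):

hadoop2.sinks.hdfs2.hdfs.idleTimeout = 300
hadoop2.sinks.hdfs2.hdfs.callTimeout = 60000
hadoop2.sinks.hdfs2.hdfs.retryInterval = 180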


--
Chris Horrocks


On Wed, Jul 20, 2016 at 9:01 am, no jihun <'jees...@gmail.com'> wrote:

@chris If you meant hdfs.callTimeout,
I am running a test on that now.

I can increase the value.
But when a timeout occurs during close, will it never be retried? (as in the logs above)

2016-07-20 16:50 GMT+09:00 Chris Horrocks <chris@hor.rocks>:

Have you tried increasing the HDFS sink timeouts?


--
Chris Horrocks



On Wed, Jul 20, 2016 at 8:03 am, no jihun <'jees...@gmail.com'> wrote:
Hi.



I found some files on HDFS left in OPEN_FOR_WRITE state.


This is Flume's log for that file:




18 7 2016 16:12:02,765 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating 1468825922758.avro.tmp
18 7 2016 16:22:39,812 INFO [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter 1468825922758.avro.tmp at 1468826559812
18 7 2016 16:22:39,812 INFO [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing 1468825922758.avro.tmp
18 7 2016 16:22:49,813 WARN [hdfs-hdfs2-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (1468825922758.avro.tmp). Exception follows.
java.io.IOException: Callable timed out after 10000 ms on file: 1468825922758.avro.tmp
18 7 2016 16:22:49,816 INFO [hdfs-hdfs2-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming 1468825922758.avro.tmp to 1468825922758.avro


- it seems the close was never retried
- Flume just renamed the file while it was still open




Two days later I found that file with this command:


hdfs fsck /data/flume -openforwrite | grep "OPENFORWRITE" | grep "2016/07/18" | 
sed 's/\/data\/flume\// \/data\/flume\//g' | grep -v ".avro.tmp" | sed -n 
's/.*\(\/data\/flume\/.*avro\).*/\1/p'






So I ran recoverLease on it:


hdfs debug recoverLease -path 1468825922758.avro -retries 3
recoverLease returned false.
Retrying in 5000 ms...
Retry #1
recoverLease SUCCEEDED on 1468825922758.avro
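
In case there are more files like this, something along these lines could run 
recoverLease over all of them (an untested sketch, reusing the same filters as the 
fsck command above):

# untested sketch: find renamed .avro files under /data/flume that are still
# open for write, then attempt lease recovery on each one
hdfs fsck /data/flume -openforwrite 2>/dev/null \
  | grep "OPENFORWRITE" \
  | grep -v ".avro.tmp" \
  | grep -o '/data/flume/[^ ]*\.avro' \
  | while read -r f; do
      hdfs debug recoverLease -path "$f" -retries 3
    done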


My HDFS sink configuration:


hadoop2.sinks.hdfs2.type = hdfs
hadoop2.sinks.hdfs2.channel = fileCh1
hadoop2.sinks.hdfs2.hdfs.fileType = DataStream
hadoop2.sinks.hdfs2.serializer = ....
hadoop2.sinks.hdfs2.serializer.compressionCodec = snappy
hadoop2.sinks.hdfs2.hdfs.filePrefix = %{type}_%Y-%m-%d_%{host}
hadoop2.sinks.hdfs2.hdfs.fileSuffix = .avro
hadoop2.sinks.hdfs2.hdfs.rollInterval = 3700
#hadoop2.sinks.hdfs2.hdfs.rollSize = 67000000
hadoop2.sinks.hdfs2.hdfs.rollSize = 800000000
hadoop2.sinks.hdfs2.hdfs.rollCount = 0
hadoop2.sinks.hdfs2.hdfs.batchSize = 10000
hadoop2.sinks.hdfs2.hdfs.idleTimeout = 300

hdfs.closeTries and hdfs.retryInterval are both left unset.


My question is:
why was '1468825922758.avro' left OPEN_FOR_WRITE even though it was renamed to .avro 
successfully?


Is this expected behavior? And what should I do to eliminate these anomalous 
OPENFORWRITE files?


Regards,
Jihun.



--

----------------------------------------------
Jihun No ( 노지훈 )
----------------------------------------------
Twitter : @nozisim
Facebook : nozisim
Website : http://jeesim2.godohosting.com
---------------------------------------------------------------------------------
Market Apps : android market products 
(https://market.android.com/developer?pub=%EB%85%B8%EC%A7%80%ED%9B%88)
