Hi Bijoy,
This happens because of a short callTimeout: the HDFS cluster does not complete
the call within the time for which the HDFS sink in Flume waits for it
to complete.
Flume then retries the entire transaction, and events that were already written
as part of the previously failed transaction are written to HDFS again as part
of the retried transaction, which is what produces the duplicate files.
Increase the timeout value; for reference, I use 150000 ms in my production environment.
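For example, the setting is the hdfs.callTimeout property on the HDFS sink. Assuming hypothetical agent and sink names "agent1" and "hdfs-sink" (substitute your own), the relevant part of the configuration would look roughly like this:

    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.hdfs.path = hdfs:///staging/test/
    # raise the per-call timeout so slow HDFS operations do not trigger a retry
    agent1.sinks.hdfs-sink.hdfs.callTimeout = 150000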
Thanks
Anand.
On 08/06/2015 02:38 PM, Bijoy Deb wrote:
Hi,
I have a Flume process that transfers multiple files (around 10 files
of 400GB each) per day from a specific source directory to an HDFS
sink. I am facing an issue when there is an HDFS IO error while Flume
is in the process of copying the files from source to sink. The issue
is that Flume is copying the same file twice to the sink, with 2 different
timestamps, resulting in duplication of data in my downstream
processes, which is not what I want.
Can anyone kindly let me know if this is a known issue with Flume, and
if yes, is there any workaround for it?
Relevant details:
Flume version: 1.3.1
1. Source/Spool dir File location: /test/part1/2015072110_layer2_1.gz
2. HDFS sink/destination: hdfs:///staging/test/
3. Files dumped in sink by Flume:
/staging/test/2015072110_layer2_1.1437634754144.gz
/staging/test/2015072110_layer2_1.1437634754145.gz
4. Flume agent logs:
(SpoolingFileReader.java:170)] File is
processed....************************/test/part1/2015072110_layer2_1.gz
2015-07-23 02:59:09,392 (pool-14-thread-1) [INFO -
com.flume.spool.zip.SpoolingFileReader.retireCurrentFile(SpoolingFileReader.java:270)]
Preparing to move file /test/part1/2015072110_layer2_1.gz to
/test/part1/2015072110_layer2_1.gz.COMPLETED
2015-07-23 02:59:09,395 (pool-14-thread-1) [INFO -
com.flume.spool.zip.SpoolingFileReader.readEvents(SpoolingFileReader.java:176)]
flag was set as true
2015-07-23 02:59:14,808 (hdfs-c1s1-call-runner-8) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:208)]
Creating /staging/test/2015072110_layer2_1.1437634754144.gz.tmp
2015-07-23 02:59:32,144
(SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN -
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:456)]
HDFS IO error
java.io.IOException: Callable timed out after 18000 ms
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:352)
at org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:727)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:430)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:853)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:212)
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:345)
... 5 more
2015-07-23 02:59:37,269 (hdfs-c1s1-call-runner-9) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:427)]
Renaming /staging/test/2015072110_layer2_1.1437634754144.gz.tmp to
/staging/test/2015072110_layer2_1.1437634754144.gz
2015-07-23 02:59:38,513 (hdfs-c1s1-call-runner-9) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:208)]
Creating /staging/test/2015072110_layer2_1.1437634754145.gz.tmp
2015-07-23 02:59:56,333 (hdfs-c1s1-roll-timer-0) [INFO -
org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:322)]
Closing idle bucketWriter /staging/test/2015072110_layer2_1
2015-07-23 02:59:56,340 (hdfs-c1s1-roll-timer-0) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:427)]
Renaming /staging/test/2015072110_layer2_1.1437634754145.gz.tmp to
/staging/test/2015072110_layer2_1.1437634754145.gz
Thanks,
Bijoy