Hi Bijoy,
This happens because of a short callTimeout: the HDFS cluster does not complete
the call within the time for which the HDFS sink in Flume waits for it
to complete.
Flume then retries the entire transaction, and events that were already written
as part of the previously failed transaction are written to HDFS again as part
of the retried transaction, which is what produces the duplicate files.
Increase the timeout value; for reference, I use 150000 ms in my production environment.
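For example, the setting is the hdfs.callTimeout property on the HDFS sink. Assuming hypothetical agent and sink names "agent1" and "hdfs-sink" (substitute your own), the relevant part of the configuration would look roughly like this:

    agent1.sinks.hdfs-sink.type = hdfs
    agent1.sinks.hdfs-sink.hdfs.path = hdfs:///staging/test/
    # raise the per-call timeout so slow HDFS operations do not trigger a retry
    agent1.sinks.hdfs-sink.hdfs.callTimeout = 150000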
Thanks
Anand.
On 08/06/2015 02:38 PM, Bijoy Deb wrote:
Hi,
I have a Flume process that transfers multiple files (around 10 files
of 400GB each) per day from a specific source directory to an HDFS
sink. I am facing an issue when there is an HDFS IO error while Flume
is in the process of copying the files from source to sink. The issue
is that Flume is copying the same file twice to the sink, with 2 different
timestamps, resulting in duplication of data in my downstream
processes, which is not what I want.
Can anyone kindly let me know if this is a known issue with Flume, and
if yes, is there any workaround for it?
Relevant details:
Flume version: 1.3.1
1. Source/Spool dir File location: /test/part1/2015072110_layer2_1.gz
2. HDFS sink/destination: hdfs:///staging/test/
3. Files dumped in sink by Flume:
/staging/test/2015072110_layer2_1.1437634754144.gz
/staging/test/2015072110_layer2_1.1437634754145.gz
4. Flume agent logs:
(SpoolingFileReader.java:170)] File is
processed....************************/test/part1/2015072110_layer2_1.gz
2015-07-23 02:59:09,392 (pool-14-thread-1) [INFO -
com.flume.spool.zip.SpoolingFileReader.retireCurrentFile(SpoolingFileReader.java:270)]
Preparing to move file /test/part1/2015072110_layer2_1.gz to
/test/part1/2015072110_layer2_1.gz.COMPLETED
2015-07-23 02:59:09,395 (pool-14-thread-1) [INFO -
com.flume.spool.zip.SpoolingFileReader.readEvents(SpoolingFileReader.java:176)]
flag was set as true
2015-07-23 02:59:14,808 (hdfs-c1s1-call-runner-8) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:208)]
Creating /staging/test/2015072110_layer2_1.1437634754144.gz.tmp
2015-07-23 02:59:32,144
(SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN -
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:456)]
HDFS IO error
java.io.IOException: Callable timed out after 18000 ms
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:352)
at org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:727)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:430)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:853)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:212)
at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:345)
... 5 more
2015-07-23 02:59:37,269 (hdfs-c1s1-call-runner-9) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:427)]
Renaming /staging/test/2015072110_layer2_1.1437634754144.gz.tmp to
/staging/test/2015072110_layer2_1.1437634754144.gz
2015-07-23 02:59:38,513 (hdfs-c1s1-call-runner-9) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:208)]
Creating /staging/test/2015072110_layer2_1.1437634754145.gz.tmp
2015-07-23 02:59:56,333 (hdfs-c1s1-roll-timer-0) [INFO -
org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:322)]
Closing idle bucketWriter /staging/test/2015072110_layer2_1
2015-07-23 02:59:56,340 (hdfs-c1s1-roll-timer-0) [INFO -
org.apache.flume.sink.hdfs.BucketWriter.renameBucket(BucketWriter.java:427)]
Renaming /staging/test/2015072110_layer2_1.1437634754145.gz.tmp to
/staging/test/2015072110_layer2_1.1437634754145.gz
Thanks,
Bijoy