I am running into the following problem:
30 Oct 2014 18:43:26,375 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463) - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:201)
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
    ... 6 more
30 Oct 2014 18:43:27,717 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
30 Oct 2014 18:43:46,971 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 10
The following is my configuration. The source is just a script that runs a curl command to download files from S3.
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the source: STACK_S3
a1.sources.r1.type = exec
a1.sources.r1.command = ./conf/FlumeAgent.1.sh
a1.sources.r1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm
a1.sinks.k1.hdfs.filePrefix = dm-1-20
a1.sinks.k1.hdfs.fileSuffix = .ds
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer = TEXT
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.batchSize = 10
I previously had the HDFS batch size at its default (100), but the issue still occurred. Does anyone know which parameters I should change to make this error go away?
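(One guess: the "Callable timed out after 10000 ms" matches the default of the sink's hdfs.callTimeout property, which is the number of milliseconds allowed for HDFS open/write/flush/close calls. If the problem is just a slow NameNode or DataNode, I could presumably raise it, along the lines of:

a1.sinks.k1.hdfs.callTimeout = 60000

where 60000 is just an example value. But I am not sure whether that would fix the root cause or only mask it.)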
No data is lost, but I end up with a zero-byte file.
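(Possibly related: since rollInterval, rollSize, and rollCount are all 0, files never roll on their own, so I suspect the writer whose open timed out leaves its empty .tmp file behind. If so, maybe closing idle files would at least clean that up, e.g.:

a1.sinks.k1.hdfs.idleTimeout = 60

where idleTimeout is in seconds and defaults to 0, i.e. disabled. Again, just a guess on my part.)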
Thanks,
Ed