Please check whether your sinks, i.e. the HDFS data nodes that were receiving the writes, have any bad blocks. Secondly, I think you should also set the HDFS roll interval or roll size to a higher value. The reason this problem happens is that the Flume sink is no longer able to write to the data pipeline that was initially presented by HDFS. The solution in this case should be for HDFS to initialize a new pipeline and present it to Flume. The current workaround is to restart the Flume process, which initializes a new HDFS pipeline and lets the sink push the backlogged events. There is a fix for this incorporated in Flume 1.5 (I haven't tested it yet), but if you are on anything older, the only way to make this work is to restart the Flume process.
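To check for bad blocks on the sink path, something like the following should list any corrupt or under-replicated blocks (assuming the hdfs CLI is available on the agent host):

hdfs fsck /tmp/dm -files -blocks -locations

For the roll settings, the sketch below is roughly what I have in mind; the values are only placeholders to tune for your load. Raising hdfs.callTimeout above its 10000 ms default may also help, since that is the timeout showing up in your stack trace:

# roll every 5 minutes or at ~128 MB, whichever comes first (placeholder values)
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# HDFS call timeout in ms; the default is 10000, which matches your log
a1.sinks.k1.hdfs.callTimeout = 60000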
On Oct 30, 2014 11:54 AM, "Ed Judge" <[email protected]> wrote:
> I am running into the following problem.
>
> 30 Oct 2014 18:43:26,375 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463) - HDFS IO error
> java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
>         at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask.get(FutureTask.java:201)
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
>         ... 6 more
> 30 Oct 2014 18:43:27,717 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
> 30 Oct 2014 18:43:46,971 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 10
>
> The following is my configuration. The source is just a script running a curl command and downloading files from S3.
>
> # Name the components on this agent
> a1.sources = r1
> a1.sinks = k1
> a1.channels = c1
>
> # Configure the source: STACK_S3
> a1.sources.r1.type = exec
> a1.sources.r1.command = ./conf/FlumeAgent.1.sh
> a1.sources.r1.channels = c1
>
> # Use a channel which buffers events in memory
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000000
> a1.channels.c1.transactionCapacity = 100
>
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm
> a1.sinks.k1.hdfs.filePrefix = dm-1-20
> a1.sinks.k1.hdfs.fileSuffix = .ds
> a1.sinks.k1.hdfs.rollInterval = 0
> a1.sinks.k1.hdfs.rollSize = 0
> a1.sinks.k1.hdfs.rollCount = 0
> a1.sinks.k1.hdfs.fileType = DataStream
> a1.sinks.k1.serializer = TEXT
> a1.sinks.k1.channel = c1
> a1.sinks.k1.hdfs.minBlockReplicas = 1
> a1.sinks.k1.hdfs.batchSize = 10
>
> I had the HDFS batch size at the default (100) but this issue was still happening. Does anyone know what parameters I should change to make this error go away?
> No data is lost but I end up with a 0 byte file.
>
> Thanks,
> Ed
