Please check whether your sinks, i.e. the HDFS data nodes that were receiving the writes, have any bad blocks. Secondly, I think you should also set the HDFS roll interval or roll size to a higher value. The reason this problem happens is that the Flume sink is no longer able to write to the data pipeline that was initially presented by HDFS. The solution in this case should be for HDFS to initialize a new pipeline and present it to Flume. The current workaround is to restart the Flume process, which initializes a new HDFS pipeline and lets the sink push the backlogged events. There is a fix for this incorporated in Flume 1.5 (I haven't tested it yet), but if you are on anything older, the only way to make this work is to restart the Flume process.
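To check for bad blocks on the sink path, something like the following should list any corrupt or under-replicated blocks (assuming the hdfs CLI is available on the agent host):

hdfs fsck /tmp/dm -files -blocks -locations

For the roll settings, the sketch below is roughly what I have in mind; the values are only placeholders to tune for your load. Raising hdfs.callTimeout above its 10000 ms default may also help, since that is the timeout showing up in your stack trace:

# roll every 5 minutes or at ~128 MB, whichever comes first (placeholder values)
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# HDFS call timeout in ms; the default is 10000, which matches your log
a1.sinks.k1.hdfs.callTimeout = 60000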
On Oct 30, 2014 11:54 AM, "Ed Judge" <[email protected]> wrote:
> I am running into the following problem.
>
> 30 Oct 2014 18:43:26,375 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463) - HDFS IO error
> java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
>         at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask.get(FutureTask.java:201)
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
>         ... 6 more
> 30 Oct 2014 18:43:27,717 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
> 30 Oct 2014 18:43:46,971 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 10
>
> The following is my configuration. The source is just a script running a curl command and downloading files from S3.
>
> # Name the components on this agent
> a1.sources = r1
> a1.sinks = k1
> a1.channels = c1
>
> # Configure the source: STACK_S3
> a1.sources.r1.type = exec
> a1.sources.r1.command = ./conf/FlumeAgent.1.sh
> a1.sources.r1.channels = c1
>
> # Use a channel which buffers events in memory
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 1000000
> a1.channels.c1.transactionCapacity = 100
>
> # Describe the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm
> a1.sinks.k1.hdfs.filePrefix = dm-1-20
> a1.sinks.k1.hdfs.fileSuffix = .ds
> a1.sinks.k1.hdfs.rollInterval = 0
> a1.sinks.k1.hdfs.rollSize = 0
> a1.sinks.k1.hdfs.rollCount = 0
> a1.sinks.k1.hdfs.fileType = DataStream
> a1.sinks.k1.serializer = TEXT
> a1.sinks.k1.channel = c1
> a1.sinks.k1.hdfs.minBlockReplicas = 1
> a1.sinks.k1.hdfs.batchSize = 10
>
> I had the HDFS batch size at the default (100) but this issue was still happening. Does anyone know what parameters I should change to make this error go away?
> No data is lost but I end up with a 0 byte file.
>
> Thanks,
> Ed
