How do I check for bad blocks? I am now seeing this quite regularly. My Hadoop setup is unusual in that I have a single local datanode, and in addition I am running the Flume instance inside a Docker container. I have looked at the Hadoop logs and don't see anything but INFO messages. What could be taking more than 10 seconds?

Thanks,
Ed
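A quick way to check for bad or missing blocks, assuming the standard HDFS CLI is available on the namenode host, is hdfs fsck and hdfs dfsadmin. A minimal sketch (the /tmp/dm path is taken from the sink configuration quoted below):

    # Overall filesystem health, listing any corrupt blocks
    hdfs fsck / -list-corruptfileblocks

    # Per-file block detail for the directory the sink writes to
    hdfs fsck /tmp/dm -files -blocks -locations

    # Datanode status: live/dead nodes, capacity, last contact
    hdfs dfsadmin -report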
On Oct 30, 2014, at 9:14 PM, Ed Judge <[email protected]> wrote:

> I have been using 1.5 all along. I end up with a 0 length file, which is a little concerning. Not to mention that the timeout is adding 10 seconds to the overall transfer. Is this normal, or is there something I can do to prevent the timeout?
>
> Thanks,
> Ed.
>
> Sent from my iPhone
>
> On Oct 30, 2014, at 5:58 PM, Asim Zafir <[email protected]> wrote:
>
>> Ed,
>>
>> Are you saying you resolved the problem with 1.5.0, or do you still have an issue?
>>
>> Thanks,
>>
>> Asim Zafir.
>>
>> On Thu, Oct 30, 2014 at 1:47 PM, Ed Judge <[email protected]> wrote:
>> Thanks for the replies. We are using 1.5.0.
>> My observation is that Flume retries automatically (without my intervention) and that no data is lost.
>> The impact is a) a delay of 10 seconds due to the timeout and b) a zero length file.
>>
>> -Ed
>>
>> On Oct 30, 2014, at 3:46 PM, Asim Zafir <[email protected]> wrote:
>>
>>> Please check that your sinks, i.e. the HDFS data nodes that were receiving the writes, do not have any bad blocks. Secondly, I think you should also set the HDFS roll interval or size to a higher value. The reason this problem happens is that the Flume sink is not able to write to the data pipeline that was initially presented by HDFS. The solution in this case should be for HDFS to initialize a new pipeline and present it to Flume. The current workaround is to restart the Flume process, which then initializes a new HDFS pipeline, enabling the sink to push the backlogged events. There is a fix for this incorporated in Flume 1.5 (I haven't tested it yet), but if you are on anything older the only way to make this work is to restart the Flume process.
>>>
>>> On Oct 30, 2014 11:54 AM, "Ed Judge" <[email protected]> wrote:
>>> I am running into the following problem.
>>>
>>> 30 Oct 2014 18:43:26,375 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:463) - HDFS IO error
>>> java.io.IOException: Callable timed out after 10000 ms on file: hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596209.ds.tmp
>>>     at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:732)
>>>     at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:262)
>>>     at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:554)
>>>     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:426)
>>>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.util.concurrent.TimeoutException
>>>     at java.util.concurrent.FutureTask.get(FutureTask.java:201)
>>>     at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:725)
>>>     ... 6 more
>>> 30 Oct 2014 18:43:27,717 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating hdfs://localhost:9000/tmp/dm/dm-1-19.1414694596210.ds.tmp
>>> 30 Oct 2014 18:43:46,971 INFO [agent-shutdown-hook] (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping lifecycle supervisor 10
>>>
>>> The following is my configuration. The source is just a script running a curl command and downloading files from S3.
>>>
>>> # Name the components on this agent
>>> a1.sources = r1
>>> a1.sinks = k1
>>> a1.channels = c1
>>>
>>> # Configure the source: STACK_S3
>>> a1.sources.r1.type = exec
>>> a1.sources.r1.command = ./conf/FlumeAgent.1.sh
>>> a1.sources.r1.channels = c1
>>>
>>> # Use a channel which buffers events in memory
>>> a1.channels.c1.type = memory
>>> a1.channels.c1.capacity = 1000000
>>> a1.channels.c1.transactionCapacity = 100
>>>
>>> # Describe the sink
>>> a1.sinks.k1.type = hdfs
>>> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/tmp/dm
>>> a1.sinks.k1.hdfs.filePrefix = dm-1-20
>>> a1.sinks.k1.hdfs.fileSuffix = .ds
>>> a1.sinks.k1.hdfs.rollInterval = 0
>>> a1.sinks.k1.hdfs.rollSize = 0
>>> a1.sinks.k1.hdfs.rollCount = 0
>>> a1.sinks.k1.hdfs.fileType = DataStream
>>> a1.sinks.k1.serializer = TEXT
>>> a1.sinks.k1.channel = c1
>>> a1.sinks.k1.hdfs.minBlockReplicas = 1
>>> a1.sinks.k1.hdfs.batchSize = 10
>>>
>>> I had the HDFS batch size at the default (100), but this issue was still happening. Does anyone know what parameters I should change to make this error go away? No data is lost, but I end up with a 0 byte file.
>>>
>>> Thanks,
>>> Ed
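One knob that maps directly to the "10000 ms" in the stack trace is the HDFS sink's call timeout (hdfs.callTimeout, which defaults to 10000 ms), and Asim's suggestion above amounts to giving the roll settings non-zero values so files are actually closed. A possible mitigation sketch, not a confirmed fix; the values below are illustrative only:

    # Allow slow HDFS open/append/close calls more time before the sink gives up
    a1.sinks.k1.hdfs.callTimeout = 60000
    # Roll by time (seconds) and size (bytes) instead of never rolling,
    # so stuck .tmp files get closed and zero-byte files are less likely to linger
    a1.sinks.k1.hdfs.rollInterval = 600
    a1.sinks.k1.hdfs.rollSize = 134217728
    a1.sinks.k1.hdfs.rollCount = 0
    # Close bucket writers that have been idle for a while (seconds)
    a1.sinks.k1.hdfs.idleTimeout = 60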
