This is due to the HFile.Reader handling being wrapped in a try-finally
block: the reader is always closed, but there is no check of whether the
reader operation hit an exception, and that outcome should determine what
to do next.
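
For illustration, here is a rough sketch of the shape of the problem. It is
not the actual 0.90 Store code; openReader() and verifyHFile() are
hypothetical stand-ins for the HFile.Reader handling. If the verification
failure is only logged (or never consulted), control falls through the
finally block and the rename still runs, so the client's retry then finds
the source file gone.

import java.io.Closeable;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only -- not the actual HBase 0.90 Store code.
// openReader()/verifyHFile() stand in for the HFile.Reader handling.
public class BulkLoadSketch {

  interface ReaderStub extends Closeable {}    // placeholder for HFile.Reader

  ReaderStub openReader(FileSystem fs, Path p) throws IOException {
    throw new IOException("placeholder");      // real code opens an HFile.Reader
  }

  void verifyHFile(ReaderStub r) throws IOException {
    // placeholder: read file info, check first/last keys against the region
  }

  void bulkLoadHFile(FileSystem fs, Path src, Path dst) throws IOException {
    ReaderStub reader = null;
    boolean verified = false;
    try {
      reader = openReader(fs, src);            // may hit HDFS socket timeouts
      verifyHFile(reader);
      verified = true;
    } catch (IOException ioe) {
      // If the failure is only logged here (or the exception is never
      // consulted), control falls through and the rename below still runs,
      // which matches the behavior in the logs.
    } finally {
      if (reader != null) {
        reader.close();                        // always runs
      }
    }
    if (!verified) {
      // The missing check: abort instead of renaming a file we could not read.
      throw new IOException("Verification of " + src + " failed; not renaming");
    }
    fs.rename(src, dst);                       // move the hfile only once it is known good
  }
}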

Please file a JIRA.

Thanks, Adam.

On Thu, Jun 23, 2011 at 4:40 PM, Adam Phelps <[email protected]> wrote:

> (As a note, this is with CDH3u0 which is based on HBase 0.90.1)
>
> We've been seeing intermittent failures of calls to LoadIncrementalHFiles.
>  When this happens the node that made the call will see a
> FileNotFoundException such as this:
>
> 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to
> s8.XXX/67.215.90.38:60020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
> to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020]
> 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException:
> java.io.FileNotFoundException: File does not exist:
> /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> 2011-06-23 15:47:34.379573500   at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
> 2011-06-23 15:47:34.379573500   at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)
>
> Over on the regionserver that was loading this we see that it attempted to
> load and hit a 60 second timeout:
>
> 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store:
> Validating hfile at
> hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> for inclusion in store handling region
> domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
> ...
> 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
> connect to /67.215.90.38:50010, add to deadNodes and continue
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/67.215.90.38:42199 remote=/67.215.90.38:50010]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readShort(DataInputStream.java:295)
>
> We suspect this particular problem is a resource contention issue on our
> side.  However, the loading process proceeds to rename the file despite the
> failure:
>
> 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store:
> Renaming bulk load file
> hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> to
> hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533
>
> And then the LoadIncrementalHFiles tries to load the hfile again:
>
> 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store:
> Validating hfile at
> hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> for inclusion in store handling region
> domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
>
> 2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 147 on 60020, call
> bulkLoadHFile(hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256,
> [B@4224508b, [B@5e23f799) from 67.215.90.51:51856: error:
> java.io.FileNotFoundException: File does not exist:
> /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
>
> This eventually leads to the load command failing.  It feels to me like a
> bug that HBase goes ahead with renaming the hfile despite the failure, and
> that LoadIncrementalHFiles then retries against the now-missing source
> file.  I figured I'd ask here before opening a ticket for this.
>
> - Adam
>
