Todd Lipcon created HDFS-4417:
---------------------------------

             Summary: HDFS-347: fix case where local reads get disabled incorrectly
                 Key: HDFS-4417
                 URL: https://issues.apache.org/jira/browse/HDFS-4417
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
following case:
- a workload runs and puts a number of local sockets in the PeerCache
- the workload abates for a while, causing the sockets to go "stale" (i.e. the DN 
side disconnects after the keepalive timeout)
- the workload starts again

In this case, the newBlockReader call failed on the stale local socket 
retrieved from the cache, and the client incorrectly disabled local sockets on 
that host. This is similar to an earlier bug, HDFS-3376, but not quite the same.
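
To illustrate the distinction, here is a minimal sketch in Java. It is a 
hypothetical model, not the actual DFSClient or PeerCache code: the class 
StaleSocketSketch, its fields, and the returned strings are all invented for 
illustration. The idea it shows is that a failure on a socket pulled from the 
cache only means that socket was stale, so local reads should be disabled only 
when a freshly opened local socket fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model for illustration only; none of these names exist in HDFS.
final class StaleSocketSketch {
    /** A cached local socket; stale means the DataNode already closed its end. */
    static final class CachedLocalSocket {
        final boolean stale;
        CachedLocalSocket(boolean stale) { this.stale = stale; }
    }

    private final Deque<CachedLocalSocket> cache = new ArrayDeque<>();
    private final boolean freshLocalSocketWorks; // assumption: simulates the host's real capability
    private boolean localReadsDisabled = false;

    StaleSocketSketch(boolean freshLocalSocketWorks) {
        this.freshLocalSocketWorks = freshLocalSocketWorks;
    }

    void addToCache(CachedLocalSocket s) { cache.addLast(s); }

    boolean isLocalReadsDisabled() { return localReadsDisabled; }

    /** Returns a label for the kind of connection that ended up serving the read. */
    String openBlockReader() {
        CachedLocalSocket s;
        while ((s = cache.pollFirst()) != null) {
            if (!s.stale) {
                return "cached-local";
            }
            // Stale cached socket: drop it and keep looking.
            // This failure is NOT evidence that local reads are unsupported.
        }
        if (!localReadsDisabled) {
            if (freshLocalSocketWorks) {
                return "fresh-local";
            }
            // Only a failure on a brand-new local socket disables local reads.
            localReadsDisabled = true;
        }
        return "tcp";
    }
}
```

With two stale sockets in the cache and a host that does support local reads, 
openBlockReader() in this sketch discards both, opens a fresh local socket, and 
leaves local reads enabled.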

The next issue we ran into is that, once this happened, the client never tried 
local sockets again, because the cache held lots of TCP sockets. Since it always 
managed to get a cached TCP socket to the local node, it didn't bother trying a 
local read again.
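
One possible selection order that avoids this starvation is sketched below in 
Java. This is an assumption about how the problem could be addressed, not a 
description of the actual patch; PeerSelectionSketch and its members are 
hypothetical names. The sketch prefers a local socket (cached or fresh) whenever 
local reads are enabled, and only falls back to cached TCP sockets otherwise. 
For simplicity it assumes that opening a fresh local socket succeeds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these names exist in HDFS.
final class PeerSelectionSketch {
    enum Kind { LOCAL, TCP }

    private final List<Kind> cached = new ArrayList<>();
    private boolean localReadsDisabled = false;

    void addCached(Kind k) { cached.add(k); }

    void disableLocalReads() { localReadsDisabled = true; }

    /** Returns a label for the peer chosen for the next read. */
    String choosePeer() {
        if (!localReadsDisabled) {
            // Prefer a cached local socket; otherwise open a fresh one
            // instead of settling for whatever TCP socket is in the cache.
            if (cached.remove(Kind.LOCAL)) {
                return "cached-local";
            }
            return "fresh-local";
        }
        // Local reads are disabled: reuse a cached TCP socket if there is one.
        if (cached.remove(Kind.TCP)) {
            return "cached-tcp";
        }
        return "fresh-tcp";
    }
}
```

In this sketch, a cache full of TCP sockets no longer prevents a local read: 
choosePeer() returns a fresh local socket as long as local reads are enabled.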

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
