[ 
https://issues.apache.org/jira/browse/HDFS-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reassigned HDFS-16155:
----------------------------------

    Assignee: Bryan Beaudreault

> Allow configurable exponential backoff in DFSInputStream refetchLocations
> -------------------------------------------------------------------------
>
>                 Key: HDFS-16155
>                 URL: https://issues.apache.org/jira/browse/HDFS-16155
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The retry policy in 
> [DFSInputStream#refetchLocations|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1018-L1040]
>  was first written many years ago. It allows configuration of the base time 
> window, but subsequent retries double in an un-configurable way. This retry 
> strategy makes sense in some clusters as it's very conservative and will 
> avoid DDOSing the namenode in certain systemic failure modes – for example, 
> if a  file is being read by a large hadoop job and the underlying blocks are 
> moved by the balancer. In this case, enough datanodes would be added to the 
> deadNodes list and all hadoop tasks would simultaneously try to refetch the 
> blocks. The 3s doubling with random factor helps break up that stampeding 
> herd.
> However, not all cluster use-cases are created equal, so there are other 
> cases where a more aggressive initial backoff is preferred. For example in a 
> low-latency single reader scenario. In this case, if the balancer moves 
> enough blocks, the reader hits this 3s backoff which is way too long for a 
> low latency use-case.
> One could configure the the window very low (10ms), but then you can hit 
> other systemic failure modes which would result in readers DDOSing the 
> namenode again. For example, if blocks went missing due to truly dead 
> datanodes. In this case, many readers might be refetching locations for 
> different files with retry backoffs like 10ms, 20ms, 40ms, etc. It takes a 
> while to backoff enough to avoid impacting the namenode with that strategy.
> I suggest adding a configurable multiplier to the backoff strategy so that 
> operators can tune this as they see fit for their use-case. In the above low 
> latency case, one could set the base very low (say 2ms) and the multiplier 
> very high (say 50). This gives an aggressive first retry that very quickly 
> backs off.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to