[jira] [Comment Edited] (HDFS-15162) Optimize frequency of regular block reports

JiangHua Zhu (Jira) Wed, 27 Jan 2021 22:22:20 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-15162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273266#comment-17273266
 ]


JiangHua Zhu edited comment on HDFS-15162 at 1/28/21, 6:21 AM:
---------------------------------------------------------------

[~ayushtkn], I saw your comment and agree with it. When the DN cannot connect 
to the NN normally, it means that the NN is under pressure or that the 
connection is failing somewhere in between.
 I ran into this problem recently. The DN retried its connection to the NN 
many times (50 in this case) and then threw an exception. The log is as 
follows:
 2021-01-01 17:55:21,099 [15993307503]-INFO [clusterxxxx lifeline to xxxx/xxxx:port:Client$Connection@948]-Retrying connect to server: xxxx/xxxx:port. Already tried 49 time(s) ; retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
 2021-01-01 17:55:21,100 [15993307504]-WARN [clusterxxxx lifeline to xxxx/xxxx:port:BPServiceActor$LifelineSender@1008]-IOException in LifelineSender for Block pool xxxx (Datanode Uuid xxxx) service to xxxx/xxxx: port
 java.net.ConnectException: Call From xxxx/xxxx to xxxx:port failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
 at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
 at org.apache.hadoop.ipc.Client.call(Client.java:1453)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
 at com.sun.proxy.$Proxy21.sendLifeline(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.DatanodeLifelineProtocolClientSideTranslatorPB.sendLifeline(DatanodeLifelineProtocolClientSideTranslatorPB.java:100)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.sendLifeline(BPServiceActor.java:1074)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.sendLifelineIfDue(BPServiceActor.java:1058)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.run(BPServiceActor.java:1003)

A full block report (FBR) should not be triggered in this situation.


was (Author: jianghuazhu):
[~ayushtkn] , I noticed your opinion.
I agree with what you said. When the DN connects to the NN abnormally, it means 
that the NN is under pressure or the midway connection fails.
Recently I encountered a problem. When DN connected to NN, after frequent 
retries many times (for example, 50 times), an exception broke out. The log is 
as follows:
2021-01-01 17:55:21,099 [15993307503]-INFO [clusterxxxx lifeline to 
xxxx/xxxx:port:Client$Connection@948]-Retrying connect to server: 
xxxx/xxxx:port. Already tried 49 time(s) ; retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2021-01-12 17:55:21,100 [15993307504]-WARN [clusterxxxx lifeline to 
xxxx/xxxx:port:BPServiceActor$LifelineSender@1008]-IOException in 
LifelineSender for Block pool xxxx (Datanode Uuid xxxx) service to xxxx/xxxx: 
port
java.net.ConnectException: Call From xxxx/xxxx to xxxx:port failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1453)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy21.sendLifeline(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeLifelineProtocolClientSideTranslatorPB.sendLifeline(DatanodeLifelineProtocolClientSideTranslatorPB.java:100)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.sendLifeline(BPServiceActor.java:1074)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.sendLifelineIfDue(BPServiceActor.java:1058)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$LifelineSender.run(BPServiceActor.java:1003)

FBR should not be triggered at this time.

> Optimize frequency of regular block reports
> -------------------------------------------
>
>                 Key: HDFS-15162
>                 URL: https://issues.apache.org/jira/browse/HDFS-15162
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Critical
>
> Avoid sending block reports at the regular interval unless there has been a 
> failover, a DiskError, or an exception encountered in connecting to the 
> Namenode.
> This JIRA intends to limit regular block reports so that they are sent only 
> in the above scenarios and during re-registration of the datanode, to 
> eliminate the overhead of processing BlockReports at the Namenode in huge 
> clusters.
> *Eg.* If a block report was sent at 0000 hours and the next was scheduled at 
> 0600 hours, then if none of the above scenarios has occurred, it will skip 
> sending the BR and reschedule it for 1200 hours. If something of that sort 
> happens between 0600 and 1200 hours, it will send the BR normally.
> *NOTE*: This would be optional and can be turned off by default. Would add a 
> configuration to enable this.
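
The scheduling rule in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions: the class RegularBlockReportSketch and its methods are hypothetical names, not part of Hadoop or of any patch on this issue, and treating the first report (registration) as always sent is an assumption made to match the 0000 hours example.

```java
/**
 * Hypothetical sketch of "skip the regular block report unless something
 * abnormal happened". Not the actual BPServiceActor logic.
 */
final class RegularBlockReportSketch {
    private final long intervalMs;   // regular block report interval
    private long nextReportMs;       // next scheduled slot
    private boolean abnormalEvent;   // failover, disk error, NN connection exception, re-registration

    RegularBlockReportSketch(long intervalMs, long firstReportMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        this.intervalMs = intervalMs;
        this.nextReportMs = firstReportMs;
        this.abnormalEvent = true;   // assumption: the first report after registration is always sent
    }

    /** Record that one of the scenarios from the description occurred. */
    void markAbnormalEvent() {
        abnormalEvent = true;
    }

    /**
     * Returns true if a block report should be sent at nowMs. When a slot is
     * due, the schedule always moves to the next slot, whether or not the
     * report is actually sent.
     */
    boolean shouldSendAt(long nowMs) {
        if (nowMs < nextReportMs) {
            return false;            // slot not due yet
        }
        while (nextReportMs <= nowMs) {
            nextReportMs += intervalMs;
        }
        boolean send = abnormalEvent;
        abnormalEvent = false;       // the flag only covers one slot
        return send;
    }
}
```

With a 6-hour interval this follows the example in the description: the report at 0000 is sent, the slot at 0600 is skipped if nothing happened, and an event recorded between 0600 and 1200 causes the report at 1200 to be sent.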



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
