Hello Tom. Sorry for my lack of answers; I don't know why Gmail puts your mail into spam -_-.
To answer you:
- The metrics callQueueLength, avgQueueTime, avgProcessingTime and the GC
  metrics are all OK.
- The threads are plenty sufficient (I can see the metrics for them too,
  and I am below 200, the number I have configured for the 8020 RPC
  server).

Did you see my other answers about this problem? I would be interested to
have your opinion on them!

Best regards.
T@le

On Tue, Feb 15, 2022 at 02:16, tom lee <tomlees...@gmail.com> wrote:

> It might be helpful to analyze namenode metrics and logs.
>
> What about some key metrics? Examples are callQueueLength, avgQueueTime,
> avgProcessingTime and GC metrics.
>
> In addition, is the number of threads (dfs.namenode.service.handler.count)
> in the namenode sufficient?
>
> Hopefully this will help.
>
> Best regards.
> Tom
>
> Tale Hive <tale2.h...@gmail.com> wrote on Mon, Feb 14, 2022 at 23:57:
>
>> Hello.
>>
>> I encounter a strange problem with my namenode. I have the following
>> architecture:
>> - Two namenodes in HA
>> - 600 datanodes
>> - HDP 3.1.4
>> - 150 million files and folders
>>
>> Sometimes, when I query the namenode with the hdfs client, I get a
>> timeout error like this:
>>
>> hdfs dfs -ls -d /user/myuser
>>
>> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
>> org.apache.hadoop.net.ConnectTimeoutException: Call From
>> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
>> failed on socket timeout exception:
>> org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
>> while waiting for channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending
>> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>> For more details see: http://wiki.apache.org/hadoop/SocketTimeout,
>> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
>> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
>> attempts. Trying to failover after sleeping for 2694ms.
>>
>> I checked the heap of the namenode and there is no problem (I have 75 GB
>> of max heap and I'm around 50 GB used).
>> I checked the client RPC threads of the namenode and I'm at 200, which
>> follows the recommendations from the Hadoop Operations book.
>> I have the service RPC enabled to prevent any problem that could come
>> from the datanodes or the ZKFC.
>> General resources seem OK: CPU usage is fine, and so are memory, network
>> and IO.
>> No firewall is enabled on my namenodes or my client.
>>
>> I was wondering what could cause this problem, please?
>>
>> Thank you in advance for your help!
>>
>> Best regards.
>>
>> T@le
>
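PS: for anyone hitting the same issue, the RPC metrics Tom mentioned can be read from the NameNode JMX endpoint (http://<namenode-host>:9870/jmx on Hadoop 3, 50070 on older releases), under the Hadoop:service=NameNode,name=RpcActivityForPort8020 bean. A minimal sketch of pulling them out of that JSON; the sample response and all its values below are made up for illustration:

```python
import json

# Hypothetical JMX response, shaped like what the NameNode serves at
# /jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020.
# The numbers are invented for this example.
SAMPLE_JMX = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=RpcActivityForPort8020",
      "CallQueueLength": 12,
      "RpcQueueTimeAvgTime": 0.85,
      "RpcProcessingTimeAvgTime": 1.4
    }
  ]
}
"""

def rpc_metrics(jmx_json: str) -> dict:
    """Extract the RPC health metrics discussed in this thread."""
    beans = json.loads(jmx_json)["beans"]
    rpc = next(b for b in beans
               if b["name"].endswith("RpcActivityForPort8020"))
    return {
        "callQueueLength": rpc["CallQueueLength"],
        "avgQueueTime": rpc["RpcQueueTimeAvgTime"],
        "avgProcessingTime": rpc["RpcProcessingTimeAvgTime"],
    }

if __name__ == "__main__":
    print(rpc_metrics(SAMPLE_JMX))
```

In practice you would fetch the URL with curl or urllib instead of using the hard-coded sample, and watch callQueueLength and avgQueueTime over time rather than as a single snapshot.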
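PS2: the "20000 millis timeout while waiting for channel to be ready for connect" in the trace is the client-side connect timeout (ipc.client.connect.timeout, which defaults to 20 s if I'm not mistaken), so the client never even completed the TCP handshake. A quick way to check whether port 8020 answers a plain TCP connect within that window, independently of the hdfs client; the hostname below is a placeholder:

```python
import socket

def port_reachable(host: str, port: int, timeout_s: float = 20.0) -> bool:
    """Attempt a plain TCP connect with the same 20 s budget the hdfs
    client reported; returns False on timeout or connection refusal."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder: replace with the active namenode hostname.
    # 8020 is the client RPC port from the stack trace.
    print(port_reachable("active-namenode-hostname", 8020, timeout_s=2.0))
```

If this intermittently returns False from the client host while the namenode looks healthy, the problem is likely on the network path (or a SYN backlog / listen-queue overflow on the namenode side) rather than in the namenode's RPC handlers.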