Hello Tom.

Sorry for the late reply; I don't know why Gmail puts your mail into
spam -_-.

To answer your questions:

   - The metrics callQueueLength, avgQueueTime, avgProcessingTime and the GC
   metrics are all OK.
   - The threads are more than sufficient (I can see the metrics for them too,
   and I stay below 200, the number I have configured for the 8020 RPC server).
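
In case it helps, this is roughly how I read those RPC counters from the
namenode's JMX servlet (a sketch, not our exact tooling; the placeholder
hostname, the HTTP port 50070 and the bean name assume typical HDP 3
defaults, so adjust them for your cluster):

```python
import json
import urllib.request

# Hypothetical NameNode address: HDP 3 usually serves the /jmx servlet on the
# NameNode HTTP port (50070 on HDP, 9870 on vanilla Hadoop 3).
JMX_URL = ("http://active-namenode-hostname:50070/jmx"
           "?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020")

def extract_rpc_metrics(jmx_payload):
    """Pull the RPC health counters discussed above out of a /jmx response."""
    bean = jmx_payload["beans"][0]
    return {
        "CallQueueLength": bean.get("CallQueueLength"),
        "RpcQueueTimeAvgTime": bean.get("RpcQueueTimeAvgTime"),
        "RpcProcessingTimeAvgTime": bean.get("RpcProcessingTimeAvgTime"),
    }

def fetch_rpc_metrics(url=JMX_URL):
    """Fetch and decode the JMX bean; call this against a live namenode."""
    with urllib.request.urlopen(url) as resp:
        return extract_rpc_metrics(json.load(resp))
```

The same bean can also be fetched with a plain curl against the /jmx URL and
read by eye.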

Did you see my other answers about this problem?
I would be interested in your opinion on them!

Best regards.

T@le


On Tue, Feb 15, 2022 at 02:16, tom lee <tomlees...@gmail.com> wrote:

> It might be helpful to analyze namenode metrics and logs.
>
> What about some key metrics? Examples are callQueueLength, avgQueueTime,
> avgProcessingTime and GC metrics.
>
> In addition, is the number of threads (dfs.namenode.service.handler.count)
> in the namenode sufficient?
>
> Hopefully this will help.
>
> Best regards.
> Tom
>
> Tale Hive <tale2.h...@gmail.com> wrote on Mon, Feb 14, 2022 at 23:57:
>
>> Hello.
>>
>> I encounter a strange problem with my namenode. I have the following
>> architecture:
>> - Two namenodes in HA
>> - 600 datanodes
>> - HDP 3.1.4
>> - 150 million files and folders
>>
>> Sometimes, when I query the namenode with the hdfs client, I get a
>> timeout error like this:
>> hdfs dfs -ls -d /user/myuser
>>
>> 22/02/14 15:07:44 INFO retry.RetryInvocationHandler:
>> org.apache.hadoop.net.ConnectTimeoutException: Call From
>> <my-client-hostname>/<my-client-ip> to <active-namenode-hostname>:8020
>> failed on socket timeout exception:
>>   org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout
>> while waiting for channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending
>> remote=<active-namenode-hostname>/<active-namenode-ip>:8020];
>>   For more details see:  http://wiki.apache.org/hadoop/SocketTimeout,
>> while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over
>> <active-namenode-hostname>/<active-namenode-ip>:8020 after 2 failover
>> attempts. Trying to failover after sleeping for 2694ms.
>>
>> I checked the heap of the namenode and there is no problem (I have 75 GB
>> of max heap and around 50 GB used).
>> I checked the client RPC threads of the namenode and I'm at 200, which
>> follows the recommendations from the Hadoop Operations book.
>> I have the service RPC port enabled, to keep datanode and ZKFC traffic
>> off the client RPC port.
>> General resources seem OK: CPU usage is fine, and the same goes for
>> memory, network and IO.
>> No firewall is enabled on my namenodes or on my client.
>>
>> What could be causing this problem, please?
>>
>> Thank you in advance for your help!
>>
>> Best regards.
>>
>> T@le
>>
>
