Client failures due to failover are handled seamlessly by retries,
so you need not worry about that.
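For reference, a minimal client-side sketch of that retry/failover setup in
hdfs-site.xml; the nameservice id ("nameservice", taken from your config
further down) and the attempt count are only illustrative, adjust to your
cluster:

<property>
  <name>dfs.client.failover.proxy.provider.nameservice</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
  <!-- illustrative value; 15 is the shipped default number of failover attempts -->
  <name>dfs.client.failover.max.attempts</name>
  <value>15</value>
</property>

With this proxy provider in place, a client call that fails because the active
namenode changed is retried against the other namenode instead of surfacing as
an error to the application.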

And by increasing ha.health-monitor.rpc-timeout.ms to a slightly larger
value, you are just avoiding unnecessary failovers when the namenode is busy
processing other client/service requests. This takes effect only when the
namenode is busy and unable to process the ZKFC RPC calls in time. At other
times, when the active namenode shuts down for some reason, failover will be
instant and will not wait for the configured timeout.
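For example, a minimal core-site.xml sketch; the 90-second value below is only
an assumption for illustration (the shipped default is 45000 ms), so pick
whatever fits your cluster, and restart the ZKFCs for it to take effect:

<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <!-- assumed example value: 90 seconds; default is 45000 ms -->
  <value>90000</value>
</property>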

On Thu, Apr 27, 2017 at 5:46 PM, <gu.yiz...@zte.com.cn> wrote:

> 1. Is service-rpc configured in namenode?
>
> *Not yet. I was considering configuring servicerpc, but I was also thinking
> about the possible disadvantages.*
>
> *In the case where a failover happens because of too many waiting RPCs, if
> ZKFC gets normal responses from the other port instead, is it possible that
> the clients get a lot of failures?*
>
>
> 2. ha.health-monitor.rpc-timeout.ms - Also consider increasing the ZKFC RPC
> call timeout to the namenode.
>
> *The same worry: is it possible that the clients get a lot of failures?*
>
>
> Thanks very much,
>
> Doris
>
>
>
> ---------------------------------------------------------------------------
>
>
> 1. Is service-rpc configured in namenode?
> (dfs.namenode.servicerpc-address - this will create another RPC server
> listening on another port (say 8021) to handle all service (non-client)
> requests, so the default rpc address (say 8020) will handle only client
> requests.)
>
> By doing this, you would be able to decouple client and service requests.
> Here, service requests correspond to RPC calls from the DN, ZKFC, etc.
> So when the cluster is too busy because of too many client operations, ZKFC
> requests will be processed by a different RPC server and hence need not wait
> in the same queue as client requests.
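>
> As a rough illustration, a hypothetical hdfs-site.xml sketch for an HA pair;
> the nameservice id ("nameservice"), the hostnames and port 8021 are
> assumptions, substitute your own:
>
> <property>
>   <name>dfs.namenode.servicerpc-address.nameservice.nn1</name>
>   <!-- assumed host/port for the service RPC endpoint on nn1 -->
>   <value>nn1:8021</value>
> </property>
>
> <property>
>   <name>dfs.namenode.servicerpc-address.nameservice.nn2</name>
>   <!-- assumed host/port for the service RPC endpoint on nn2 -->
>   <value>nn2:8021</value>
> </property>
>
> Once this is in place, DN heartbeats and ZKFC health checks go to the service
> port (8021 here) while client calls stay on the default port (8020 here).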
>
> 2. ha.health-monitor.rpc-timeout.ms - Also consider increasing the ZKFC RPC
> call timeout to the namenode.
>
> By default this is 45 secs. You can consider increasing it to 1 or 2 mins
> depending upon your cluster usage.
>
> Thanks,
> Chackra
>
> On Wed, Apr 26, 2017 at 11:50 AM,  <gu.yiz...@zte.com.cn> wrote:
>
>>
>> *Hi All,*
>>
>>     HDFS HA (based on QJM), 5 journalnodes, Apache 2.5.0 on Red Hat 6.5
>> with JDK 1.7.
>>
>>     We put 1 PB+ of data into HDFS, with an FSImage of about 10 GB, then kept
>> making more requests to this HDFS; the namenodes fail over frequently. I want
>> to know the following:
>>
>>
>>  *   1. The ANN (active namenode) downloading fsimage.ckpt_* from the SNN
>> (standby namenode) leads to very high disk IO; at the same time, ZKFC fails
>> to monitor the health of the ANN due to a timeout. Is there any relationship
>> between the high disk IO and the ZKFC monitor request timeout? Every failover
>> happened during a checkpoint download, but not every checkpoint download led
>> to a failover.*
>>
>>
>>
>> 2017-03-15 09:27:05,750 WARN org.apache.hadoop.ha.HealthMonitor:
>> Transport-level exception trying to monitor health of NameNode at
>> nn1/ip:8020: Call From nn1/ip to nn1:8020 failed on socket timeout
>> exception: java.net.SocketTimeoutException: 45000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/ip:48536
>> remote=nn1/ip:8020]; For more details see:  http://wiki.apache.org/hadoop
>> /SocketTimeout
>>
>> 2017-03-15 09:27:05,750 INFO org.apache.hadoop.ha.HealthMonitor:
>> Entering state SERVICE_NOT_RESPONDING
>>
>>
>> *    2. Due to SERVICE_NOT_RESPONDING, the other ZKFC fences the old
>> ANN (sshfence is configured). Before it is restarted by my additional
>> monitor, the old ANN log sometimes looks like this. What is "Rescan of
>> postponedMisreplicatedBlocks"? Does it have any relationship with the
>> failover?*
>>
>> 2017-03-15 04:36:00,866 INFO org.apache.hadoop.hdfs.server.
>> blockmanagement.CacheReplicationMonitor: Rescanning after 30000
>> milliseconds
>>
>> 2017-03-15 04:36:00,931 INFO org.apache.hadoop.hdfs.server.
>> blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0
>> block(s) in 65 millisecond(s).
>>
>> 2017-03-15 04:36:01,127 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 23 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:04,145 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 17 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:07,159 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 14 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:10,173 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 14 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:13,188 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 14 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:16,211 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 23 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:19,234 INFO 
>> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
>> Rescan of postponedMisreplicatedBlocks completed in 22 msecs. 247361 blocks
>> are left. 0 blocks are removed.
>>
>> 2017-03-15 04:36:28,994 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
>> STARTUP_MSG:
>>
>>
>>     *3. I configured two dfs.namenode.name.dir directories and
>> one dfs.journalnode.edits.dir (which shares a disk with the namenode dirs).
>> Is this suitable, or does it have any disadvantages?*
>>
>>
>> <property>
>>
>> <name>dfs.namenode.name.dir.nameservice.nn1</name>
>>
>> <value>/data1/hdfs/dfs/name,/data2/hdfs/dfs/name</value>
>>
>> </property>
>>
>> <property>
>>
>> <name>dfs.namenode.name.dir.nameservice.nn2</name>
>>
>> <value>/data1/hdfs/dfs/name,/data2/hdfs/dfs/name</value>
>>
>> </property>
>>
>>
>> <property>
>>
>> <name>dfs.journalnode.edits.dir</name>
>>
>> <value>/data1/hdfs/dfs/journal</value>
>>
>> </property>
>>
>>
>>
>>    * 4. I am interested in the design of checkpoint and edit log
>> transmission. Are there any explanations, known issues, or documents?*
>>
>>
>> *Thanks in advance,*
>>
>> *Doris*
>>
>
>
>
