[ 
https://issues.apache.org/jira/browse/IOTDB-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609813#comment-17609813
 ] 

Jinrui Zhang commented on IOTDB-4526:
-------------------------------------

The exception does not cause the DataRegion migration to fail. 

According to the log, the data region migration eventually succeeded, although it 
took a long time.

 

The following PR fixes the exception: 
https://github.com/apache/iotdb/pull/7449/files

> [ remove datanode ] ERROR o.a.i.d.s.t.i.DataNodeInternalRPCServiceImpl:806 - 
> change region DataRegion[xx] leader failed
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: IOTDB-4526
>                 URL: https://issues.apache.org/jira/browse/IOTDB-4526
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: 0.14.0-SNAPSHOT
>            Reporter: 刘珍
>            Assignee: Jinrui Zhang
>            Priority: Major
>         Attachments: image-2022-09-27-09-54-40-156.png, more_dev.conf, 
> screenshot-1.png
>
>
> m_0924_04d9a4a
> schemaregion : ratis
> dataregion :multiLeader
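> Both use 3 replicas, with 3 ConfigNodes and 3 DataNodes (3C3D). After the 
> benchmark write completed (50,000 devices, 600 sensors per device, 10,000 
> points per sensor), two nodes, ip75 and ip76, were added and ip72 was then 
> removed. This produced {color:#DE350B}change region DataRegion[xx] leader 
> failed{color}. The data/consensus/data_region folder on the new peer is large 
> ({color:#DE350B}need to confirm whether this is normal{color}), and new 
> compaction operations still run after ip72 enters the removing state 
> ({color:#DE350B}need to confirm whether they are necessary{color}).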
> ip72 datanode error:
> 2022-09-27 09:33:37,558 [pool-21-IoTDB-DataNodeInternalRPC-Processor-150] 
> ERROR o.a.i.d.s.t.i.DataNodeInternalRPCServiceImpl:806 -{color:#DE350B} 
> change region DataRegion[13] leader failed{color}
> 2022-09-27 09:33:37,562 [pool-21-IoTDB-DataNodeInternalRPC-Processor-150] 
> ERROR o.a.t.ProcessFunction:47 - Internal error processing changeRegionLeader
> {color:#DE350B}java.lang.NullPointerException: null{color}
>         at 
> org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.transferLeader(DataNodeInternalRPCServiceImpl.java:808)
>         at 
> org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.changeRegionLeader(DataNodeInternalRPCServiceImpl.java:790)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$changeRegionLeader.getResult(IDataNodeRPCService.java:3212)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$changeRegionLeader.getResult(IDataNodeRPCService.java:3192)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>         at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> ip72 (leader) confignode error:
> 2022-09-27 09:33:37,585 [ProcExecWorker-8] ERROR 
> o.a.i.c.c.s.d.SyncDataNodeClientPool:147 - Change regions leader error on 
> Date node: TEndPoint(ip:192.168.10.72, port:9003)
> org.apache.thrift.TException: Error in calling method changeRegionLeader
>         at 
> org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.intercept(SyncThriftClientWithErrorHandler.java:94)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1.changeRegionLeader(<generated>)
>         at 
> org.apache.iotdb.confignode.client.sync.datanode.SyncDataNodeClientPool.changeRegionLeader(SyncDataNodeClientPool.java:141)
>         at 
> org.apache.iotdb.confignode.procedure.env.DataNodeRemoveHandler.changeRegionLeader(DataNodeRemoveHandler.java:540)
>         at 
> org.apache.iotdb.confignode.procedure.impl.RegionMigrateProcedure.executeFromState(RegionMigrateProcedure.java:104)
>         at 
> org.apache.iotdb.confignode.procedure.impl.RegionMigrateProcedure.executeFromState(RegionMigrateProcedure.java:46)
>         at 
> org.apache.iotdb.confignode.procedure.StateMachineProcedure.execute(StateMachineProcedure.java:185)
>         at 
> org.apache.iotdb.confignode.procedure.Procedure.doExecute(Procedure.java:365)
>         at 
> org.apache.iotdb.confignode.procedure.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:414)
>         at 
> org.apache.iotdb.confignode.procedure.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:373)
>         at 
> org.apache.iotdb.confignode.procedure.ProcedureExecutor.access$300(ProcedureExecutor.java:50)
>         at 
> org.apache.iotdb.confignode.procedure.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:741)
> Caused by: org.apache.thrift.TException: Error in calling method 
> recv_changeRegionLeader
>         at 
> org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.intercept(SyncThriftClientWithErrorHandler.java:94)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1.recv_changeRegionLeader(<generated>)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Client.changeRegionLeader(IDataNodeRPCService.java:741)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1.CGLIB$changeRegionLeader$133(<generated>)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1$$FastClassByCGLIB$$bb86de5d.invoke(<generated>)
>         at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
>         at 
> org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.intercept(SyncThriftClientWithErrorHandler.java:55)
>         ... 11 common frames omitted
> Caused by: org.apache.thrift.TException: Error in calling method receiveBase
>         at 
> org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.intercept(SyncThriftClientWithErrorHandler.java:94)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1.receiveBase(<generated>)
>         at 
> org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Client.recv_changeRegionLeader(IDataNodeRPCService.java:754)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1.CGLIB$recv_changeRegionLeader$59(<generated>)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1$$FastClassByCGLIB$$bb86de5d.invoke(<generated>)
>         at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
>         at 
> org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.intercept(SyncThriftClientWithErrorHandler.java:55)
>         ... 17 common frames omitted
> Caused by: org.apache.thrift.TApplicationException: Internal error processing 
> changeRegionLeader
>         at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1.CGLIB$receiveBase$139(<generated>)
>         at 
> org.apache.iotdb.commons.client.sync.SyncDataNodeInternalServiceClient$$EnhancerByCGLIB$$986af3c1$$FastClassByCGLIB$$bb86de5d.invoke(<generated>)
>         at net.sf.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
>         at 
> org.apache.iotdb.commons.client.sync.SyncThriftClientWithErrorHandler.intercept(SyncThriftClientWithErrorHandler.java:55)
>         ... 23 common frames omitted
> Test environment:
> 1. 192.168.10.72/73/74/75/76   48CPU 384GB
> Cluster configuration parameters:
> ConfigNode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> time_partition_interval_for_routing=8640000
> schema_replication_factor=3
> data_replication_factor=3
> connection_timeout_ms=3600000
> DataNode configuration:
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> connection_timeout_ms=36000000
> max_connection_for_internal_service=300
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=3600000
> seq_memtable_flush_check_interval_in_ms=600000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=3600000
> unseq_memtable_flush_check_interval_in_ms=600000
> max_waiting_time_when_insert_blocked=3600000
> query_timeout_threshold=36000000
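> Start 3 ConfigNodes (3C): ip72, ip73, ip74
> Start 3 DataNodes (3D): ip72, ip73, ip74
> 2. Run the benchmark write and wait for it to complete.
> The configuration file is attached.
> 3. Start the datanode service on ip75 and ip76.
> 4. Remove (scale in) ip72.
> 5. Check the logs of the removed node ip72, the new peer ip75, and the 
> confignode leader ip72.
> 6. The consensus folder on the new peer ip75 is large: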
>  !image-2022-09-27-09-54-40-156.png! 
> Sizes of the folders under data on ip72:
>  !screenshot-1.png! 
> 7. After ip72 was set to the removing state, new compactions were still executed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)