刘珍 created IOTDB-5244:
-------------------------
Summary: [ratis][remove datanode]installSnapshot failed
Key: IOTDB-5244
URL: https://issues.apache.org/jira/browse/IOTDB-5244
Project: Apache IoTDB
Issue Type: Bug
Components: mpp-cluster
Affects Versions: master branch, 1.0.0
Reporter: 刘珍
Assignee: Song Ziyang
rel/1.0 1216_c92440f

1. Start a 3C5D cluster (3 ConfigNodes, 5 DataNodes) with 3 replicas. The config, schema and data regions all use the Ratis protocol.
2. Write data with BM (the benchmark tool) until it completes. The configuration is in the attachment.
3. On the node to be removed (ip73), run stop-datanode.sh, start it, run stop-datanode.sh again, and start it again. Then remove the node (scale in).
4. The DataNode on ip68 reports the following error:

2022-12-19 20:25:22,705 [grpc-default-executor-4936] ERROR o.a.r.s.i.SnapshotInstallationHandler:96 - 5@group-00010000001E: installSnapshot failed
org.apache.ratis.io.CorruptedFileException: File /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource (exist? false, length=0) is corrupted: MD5 mismatch for snapshot-158230 installation. Renamed temporary snapshot file /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource to /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource.corrupt20221219-202522_690
	at org.apache.ratis.server.storage.SnapshotManager.installSnapshot(SnapshotManager.java:155)
	at org.apache.ratis.server.impl.ServerState.installSnapshot(ServerState.java:480)
	at org.apache.ratis.server.impl.SnapshotInstallationHandler.checkAndInstallSnapshot(SnapshotInstallationHandler.java:181)
	at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:120)
	at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:94)
	at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1517)
	at org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:640)
	at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:242)
	at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:239)
	at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:124)
	at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
	at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Test environment

1. Machines
   192.168.10.62/66/68: 3 ConfigNodes, 72 CPUs, 256 GB
   192.168.10.62/66/68/64/73: 5 DataNodes; machine 73: 48 CPUs, 384 GB

2. Database configuration parameters
   COMMON configuration
     schema_replication_factor=3
     data_replication_factor=3
     data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
     query_timeout_threshold=3600000
   ConfigNode configuration
     cn_connection_timeout_ms=120000
     MAX_HEAP_SIZE="8G"
   DataNode configuration
     MAX_HEAP_SIZE="192G"
     MAX_DIRECT_MEMORY_SIZE="32G"
     dn_max_connection_for_internal_service=300

3. The BM configuration is in the attachment. Writing completed.

4. On ip73: run stop-datanode.sh, clear the cache, start the DataNode, run stop-datanode.sh, start the DataNode again, then remove the node (scale in). Check the node status and logs.
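The exception says the receiving DataNode's MD5 of a transferred snapshot file did not match the digest it expected, so Ratis rejected the installation. As a rough illustration of that kind of check, here is a minimal Java sketch, assuming a whole-file MD5 compared against an expected digest supplied by the sender. The class `SnapshotMd5Check` and its methods are hypothetical names for illustration; this is not Ratis's actual `SnapshotManager` code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not the Ratis SnapshotManager implementation.
final class SnapshotMd5Check {

    // Computes the MD5 digest of a fully received snapshot file (reads it into memory).
    static byte[] md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        return md.digest(Files.readAllBytes(file));
    }

    // Returns true only if the file exists and its MD5 equals the expected digest.
    static boolean verify(Path file, byte[] expectedMd5)
            throws IOException, NoSuchAlgorithmException {
        if (!Files.isRegularFile(file)) {
            return false; // missing file: nothing to hash, treat as a failed check
        }
        return MessageDigest.isEqual(md5Of(file), expectedMd5);
    }
}
```

On a mismatch, the log in this report shows Ratis renaming the temporary file with a `.corrupt<timestamp>` suffix and throwing `CorruptedFileException`. A real receiver would also hash large files incrementally (for example with `MessageDigest.update` over a stream) rather than loading them with `Files.readAllBytes`.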