[ https://issues.apache.org/jira/browse/IOTDB-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651923#comment-17651923 ]

Song Ziyang commented on IOTDB-5244:
------------------------------------

A corrupted MD5 is a low-probability event that can occur at runtime. When it happens, the system retries the snapshot transfer, so it does not affect normal operation. This error log can be ignored.

> [ratis][remove datanode]installSnapshot failed
> ----------------------------------------------
>
>                 Key: IOTDB-5244
>                 URL: https://issues.apache.org/jira/browse/IOTDB-5244
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: mpp-cluster
>    Affects Versions: master branch, 1.0.0
>            Reporter: 刘珍
>            Assignee: Song Ziyang
>            Priority: Major
>         Attachments: iotdb_5244.conf
>
>
> rel/1.0 1216_c92440f
> 1. Start a 3-replica cluster with 3 ConfigNodes and 5 DataNodes (3C5D). The config, schema, and data regions all use the Ratis protocol.
> 2. Write data with BM until the write completes.
> The configuration is in the attachment.
> 3. On the node to be removed (ip73), run stop-datanode.sh, start the DataNode,
> run stop-datanode.sh again, and start the DataNode again.
> Then remove the DataNode (scale in).
> 4. The DataNode on ip68 reports an error:
> 2022-12-19 20:25:22,705 [grpc-default-executor-4936] ERROR o.a.r.s.i.SnapshotInstallationHandler:96 - 5@group-00010000001E: installSnapshot failed
> org.apache.ratis.io.CorruptedFileException: File /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource (exist? false, length=0) is corrupted: MD5 mismatch for snapshot-158230 installation. Renamed temporary snapshot file /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource to /data/liuzhen_test/master_1216_d426f7a/data/datanode/data/snapshot/.tmp.group-00010000001E/snapshot-c01d9ca8-3f9a-4b02-9fb4-fa680eae89e0/66_158230/sequence/root.test.g_3/30/2538/1671443330163-38-0-0.tsfile.resource.corrupt20221219-202522_690
>     at org.apache.ratis.server.storage.SnapshotManager.installSnapshot(SnapshotManager.java:155)
>     at org.apache.ratis.server.impl.ServerState.installSnapshot(ServerState.java:480)
>     at org.apache.ratis.server.impl.SnapshotInstallationHandler.checkAndInstallSnapshot(SnapshotInstallationHandler.java:181)
>     at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:120)
>     at org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:94)
>     at org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1517)
>     at org.apache.ratis.server.impl.RaftServerProxy.installSnapshot(RaftServerProxy.java:640)
>     at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:242)
>     at org.apache.ratis.grpc.server.GrpcServerProtocolService$2.process(GrpcServerProtocolService.java:239)
>     at org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onNext(GrpcServerProtocolService.java:124)
>     at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
>     at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
>     at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:332)
>     at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:315)
>     at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:834)
>     at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>     at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> Test environment
> 1. 192.168.10.62/66/68: 3 ConfigNodes, 72 CPUs, 256 GB
> 192.168.10.62/66/68/64/73: 5 DataNodes
> Machine 73: 48 CPUs, 384 GB
> 2. Database configuration parameters
> COMMON configuration
> schema_replication_factor=3
> data_replication_factor=3
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> query_timeout_threshold=3600000
> ConfigNode configuration
> cn_connection_timeout_ms=120000
> MAX_HEAP_SIZE="8G"
> DataNode configuration
> MAX_HEAP_SIZE="192G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> dn_max_connection_for_internal_service=300
> 3. The BM configuration is in the attachment.
> The write completes.
> 4. On ip73:
> stop-datanode.sh
> clear the cache, start the DataNode
> stop-datanode.sh
> start the DataNode
> Remove the DataNode (scale in), then check the node status and logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
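
For readers who want to see why an MD5 mismatch followed by a retry is harmless, as the comment on this issue says, here is a minimal Java sketch of the general verify-then-retry pattern. It is not the Ratis or IoTDB implementation. The class name SnapshotRetrySketch, the methods md5Hex and installWithRetry, the in-memory "transfer", and the limit of 3 attempts are all hypothetical and chosen only for illustration. The only real API used is java.security.MessageDigest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Supplier;

// Hypothetical illustration only; not code from Apache Ratis or Apache IoTDB.
final class SnapshotRetrySketch {

  /** Returns the MD5 digest of the given bytes as lowercase hex. */
  static String md5Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /**
   * Repeats the (assumed) transfer until the received bytes match the
   * expected digest. Returns the 1-based attempt that succeeded, or -1
   * if every attempt produced a mismatch.
   */
  static int installWithRetry(Supplier<byte[]> transfer, String expectedMd5, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      byte[] received = transfer.get();
      if (md5Hex(received).equals(expectedMd5)) {
        return attempt;
      }
      // A mismatch is logged and the transfer is simply tried again.
      System.err.println("MD5 mismatch on attempt " + attempt + ", retrying");
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] good = "snapshot-bytes".getBytes(StandardCharsets.UTF_8);
    String expected = md5Hex(good);
    int[] calls = {0};
    // Simulated flaky transfer: the first call yields an empty file
    // (compare "length=0" in the log), later calls yield the right bytes.
    Supplier<byte[]> flaky = () -> calls[0]++ == 0 ? new byte[0] : good;
    System.out.println("installed on attempt " + installWithRetry(flaky, expected, 3));
    // prints "installed on attempt 2"
  }
}
```

In this sketch the first attempt logs a mismatch and the second succeeds, which mirrors the behaviour described in the comment: the error is reported once, the transfer is retried, and the end state is correct.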