[ https://issues.apache.org/jira/browse/IOTDB-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623646#comment-17623646 ]
Gaofei Cao commented on IOTDB-4631:
-----------------------------------

Executing remove-datanode.sh concurrently led to this problem; we need to add a lock to resolve it.

> [ remove datanode ] The number of nodes that can be removed is not determined
> -----------------------------------------------------------------------------
>
> Key: IOTDB-4631
> URL: https://issues.apache.org/jira/browse/IOTDB-4631
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 0.14.0-SNAPSHOT
> Reporter: 刘珍
> Assignee: Gaofei Cao
> Priority: Major
> Attachments: more_dev.conf, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> m_1012_d7ed1c1
> 3 replicas, 3C5D (3 ConfigNodes, 5 DataNodes), multiLeader
> The number of DataNodes that can be removed is not checked. All 3 DataNode removals succeeded, leaving only 2 available DataNodes, which is fewer than the replication factor (3).
> The data on the nodes targeted by the 2nd and 3rd removal operations was not migrated. After the removals, querying 50,000 devices (select count(s_0),count(s_599) from root.** align by device;) returned only 1000 records.
> Cluster status after removing 3 nodes:
> !screenshot-2.png!
> !screenshot-1.png!
> Test environment
> 1. 192.168.10.72/73/74/75/76: 5 physical machines, 48 CPUs, 384 GB
> 3C: 72/73/74
> 5D: 72/73/74/75/76
> bm (the benchmark tool) runs on ip1
> ConfigNode
> MAX_HEAP_SIZE="8G"
> schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
> data_region_consensus_protocol_class=org.apache.iotdb.consensus.multileader.MultiLeaderConsensus
> schema_replication_factor=3
> data_replication_factor=3
> DataNode
> MAX_HEAP_SIZE="256G"
> MAX_DIRECT_MEMORY_SIZE="32G"
> max_connection_for_internal_service=300
> enable_timed_flush_seq_memtable=true
> seq_memtable_flush_interval_in_ms=3600000
> seq_memtable_flush_check_interval_in_ms=600000
> enable_timed_flush_unseq_memtable=true
> unseq_memtable_flush_interval_in_ms=3600000
> unseq_memtable_flush_check_interval_in_ms=600000
> query_timeout_threshold=36000000
> Start the 3C5D cluster.
> 2. bm finishes writing data.
> For the configuration, see the attachment.
> 3. Run the removal script.
> The script runs on ip72 and removes 3 DataNodes. The 3rd removal should have been rejected with an error, but the removal was actually carried out and the process exited (its data was not migrated).
> liuzhen@fit-72:/data/mpp_test/m_1012_d7ed1c1/datanode$ cat rm.sh
> #!/bin/bash
> ./sbin/remove-datanode.sh 192.168.10.72:6667 > rm_ip72.out
> sleep 2h
> ./sbin/remove-datanode.sh 192.168.10.73:6667 > rm_ip73.out
> ./sbin/remove-datanode.sh 5 > rm_ip74.out
> Log on ip74:
> 2022-10-13 00:23:37,887 [pool-22-IoTDB-DataNodeInternalRPC-Processor-67] INFO o.a.i.c.conf.CommonConfig:315 -{color:#DE350B}* Change system status to Removing! The current Node is being removed from cluster!*{color}
> In addition, the data on ip73 was not migrated either. Data on ip73 after its removal:
> !screenshot-3.png!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
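The issue raises two points: the removable count is never checked, and removals should be serialized with a lock. The following is a minimal operator-side sketch of both ideas. It assumes removals are launched from a single host and that the caller supplies the current DataNode count. The function names `max_removable` and `guarded_remove` and the lock file path are hypothetical. None of this is part of IoTDB, and it does not represent the actual fix, which would presumably live in the cluster itself. The rule it encodes is that at least `data_replication_factor` DataNodes must remain, so with N running DataNodes at most N - R can be removed.

```shell
#!/usr/bin/env bash
# Hypothetical operator-side guard, not part of IoTDB. It only illustrates
# the two ideas in this issue: check the removable count, serialize removals.

# Largest number of DataNodes that can be removed while at least
# REPLICATION DataNodes remain.
max_removable() {
  local running=$1 replication=$2
  if (( running > replication )); then
    echo $(( running - replication ))
  else
    echo 0
  fi
}

# guarded_remove RUNNING REPLICATION CMD [ARGS...]
# RUNNING is supplied by the caller (for example, read from the cluster
# status); this sketch does not query IoTDB for it.
guarded_remove() {
  local running=$1 replication=$2
  shift 2
  if (( $(max_removable "$running" "$replication") < 1 )); then
    echo "refusing removal: $running DataNodes, replication factor $replication" >&2
    return 1
  fi
  # Hold an exclusive flock(1) lock on a (hypothetical) lock file while the
  # command runs, so a second guarded removal on this host waits for it.
  (
    flock -x 9
    "$@"
  ) 9>"${LOCK_FILE:-/tmp/iotdb-remove-datanode.lock}"
}
```

For the reported cluster (5 DataNodes, data_replication_factor=3), `max_removable 5 3` yields 2. Calling `guarded_remove 5 3 ./sbin/remove-datanode.sh 192.168.10.72:6667`, then `guarded_remove 4 3 ...`, would allow the first two removals and refuse the third (`guarded_remove 3 3 ...`). The lock only covers the script invocation, though. If remove-datanode.sh returns before the data migration finishes, the server-side removals are still not serialized, which is why the lock likely has to be taken inside the cluster.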