Jinrui Zhang created IOTDB-5928:
-----------------------------------

             Summary: DeadLock between TTL and Compaction
                 Key: IOTDB-5928
                 URL: https://issues.apache.org/jira/browse/IOTDB-5928
             Project: Apache IoTDB
          Issue Type: Improvement
            Reporter: Jinrui Zhang
            Assignee: Jinrui Zhang
             Fix For: 1.1.1

h4. Version
{panel}
Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1)
{panel}

h4. Steps to reproduce
{panel}
Problem description:
TTL and compaction run concurrently and deadlock; data can no longer be written (no error message is reported).

The test procedure is as follows:

1. Test version: Enterprise version 1.1.1-SNAPSHOT (Build: a8387f1)
Start a 3-replica 3C5D cluster (3 ConfigNodes, 5 DataNodes). Configuration parameters, taking ip74 as an example:

liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/confignode-env.sh
MAX_HEAP_SIZE="8G"

liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-confignode.properties
cn_internal_address=192.168.10.74
cn_target_config_node_list=192.168.10.72:10710
cn_connection_timeout_ms=120000
cn_metric_reporter_list=PROMETHEUS
cn_metric_level=IMPORTANT
cn_metric_prometheus_reporter_port=9081

liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/datanode-env.sh
MAX_HEAP_SIZE="256G"
MAX_DIRECT_MEMORY_SIZE="32G"

liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-datanode.properties
dn_rpc_address=192.168.10.74
dn_internal_address=192.168.10.74
dn_target_config_node_list=192.168.10.72:10710,192.168.10.73:10710,192.168.10.74:10710
dn_connection_timeout_ms=120000
dn_metric_reporter_list=PROMETHEUS
dn_metric_level=IMPORTANT

liuzhen@fit-74:/data/mpp_test/t_rc4_0516_a8387f1$ conf/iotdb-common.properties
schema_replication_factor=3
data_replication_factor=3
series_slot_num=1000
schema_region_group_extension_policy=CUSTOM
default_schema_region_group_num_per_database=10
data_region_group_extension_policy=CUSTOM
default_data_region_group_num_per_database=20
disk_space_warning_threshold=0.01
query_timeout_threshold=36000000
iot_consensus_throttle_threshold_in_byte=536870912000

*2. Start the benchmark read/write workload; see the attached configuration file 0517_rc4_lt.conf*

*3. Start the TTL script; see the attached set_ttl.sh*
{*}Every 48 hours{*}, the script first sets the cluster to READONLY, then sets a TTL to delete all TsFiles (TsFiles that have not been flushed or sealed are not deleted), unsets the TTL, and sets the cluster back to RUNNING. ({*}The benchmark client keeps reading and writing throughout{*})

!image-2023-05-25-15-07-11-025.png!

*4. {color:#de350b}After running for 4 days, a deadlock occurs and data can no longer be written.{color}*
Monitoring shows write point per second at 0.

!image-2023-05-25-15-02-55-676.png!
{panel}

h4. Bug symptom
{panel}
TTL and compaction deadlock when running concurrently
{panel}

h4. Expected result
{panel}
No deadlock
{panel}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)