QiangShaowei created IOTDB-6167:
-----------------------------------

             Summary: DataNode can't register to cluster when fetching system 
configuration throws NPE
                 Key: IOTDB-6167
                 URL: https://issues.apache.org/jira/browse/IOTDB-6167
             Project: Apache IoTDB
          Issue Type: Bug
          Components: Core/Cluster
            Reporter: QiangShaowei
             Fix For: master branch


In rare circumstances, DataNode (DN) registration fails.

Cause: when a DN registers for the first time, it fetches the system 
configuration from the ConfigNode (CN). If the CN has an error or its leader 
is not ready, the fetched configuration is null. The DN uses it without a null 
check, so a NullPointerException (NPE) aborts the registration process and the 
'SYSTEM_PROPERTIES.deleteOnExit();' call is skipped.

When the DN is started again, system.properties already exists, so the start is 
not treated as a first registration. However, the nodeId is -1, so the restart 
fails.
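
The failure mode described above can be illustrated with a minimal Java sketch. 
This is not the IoTDB code and not necessarily how the issue is fixed upstream: 
the class RegistrationSketch, the type GlobalConfig, its clusterName field and 
the method registerFirstTime are all hypothetical names chosen for 
illustration. The sketch assumes the fix consists of two parts: checking the 
fetched configuration for null, and removing the properties file on every 
failed first registration so that the next start is treated as a first 
registration again.

```java
import java.io.File;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the IoTDB code base.
class RegistrationSketch {

    // Stand-in for the system configuration returned by the ConfigNode.
    static final class GlobalConfig {
        final String clusterName;

        GlobalConfig(String clusterName) {
            this.clusterName = clusterName;
        }
    }

    // Fetches the configuration and guarantees that a failed first
    // registration does not leave a stale properties file behind.
    static GlobalConfig registerFirstTime(Supplier<GlobalConfig> fetcher,
                                          File systemProperties) {
        boolean registered = false;
        try {
            GlobalConfig config = fetcher.get();
            if (config == null) {
                // Explicit check instead of an NPE further down the call chain.
                throw new IllegalStateException(
                    "ConfigNode returned no system configuration; "
                        + "the leader may not be ready yet");
            }
            registered = true;
            return config;
        } finally {
            // Runs on every exit path, including unexpected exceptions.
            if (!registered && systemProperties.exists()
                    && !systemProperties.delete()) {
                // Fall back to JVM-exit cleanup if immediate deletion fails.
                systemProperties.deleteOnExit();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File props = File.createTempFile("system", ".properties");
        try {
            // Simulate a ConfigNode whose leader is not ready.
            registerFirstTime(() -> null, props);
        } catch (IllegalStateException e) {
            System.out.println("Registration aborted: " + e.getMessage());
        }
        System.out.println("Stale file left behind: " + props.exists());
    }
}
```

Putting the cleanup in a finally block means it no longer depends on the 
registration code reaching a particular statement, which is the property that 
the skipped deleteOnExit() call lacked.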

 

DN log info:

 

2023-09-20 21:45:29,041 | INFO  | [main] | Successfully update ConfigNode: 
[TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, port:22259), 
TEndPoint(ip:120.12.0.167, port:22259)]. | 
org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96) 
2023-09-20 21:45:29,042 | INFO  | [main] | Pulling system configurations from 
the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode 
(DataNode.java:238) 
2023-09-20 21:45:29,550 | ERROR | [main] | Failed to execute system command | 
org.apache.iotdb.commons.ServerCommandLine (ServerCommandLine.java:69) 
java.lang.NullPointerException: null
    at org.apache.iotdb.db.conf.IoTDBDescriptor.loadGlobalConfig(IoTDBDescriptor.java:1930)
    at org.apache.iotdb.db.service.DataNode.pullAndCheckSystemConfigurations(DataNode.java:275)
    at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:164)
    at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
    at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
    at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
    at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
2023-09-20 21:46:02,198 | INFO  | [main] | Start to read config file 
file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:164) 
2023-09-20 21:46:02,221 | INFO  | [main] | Start to read config file 
file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-datanode.properties
 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:181) 
2023-09-20 21:46:02,247 | INFO  | [main] | initial allocateMemoryForRead = 
644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor 
(IoTDBDescriptor.java:1583) 
2023-09-20 21:46:02,247 | INFO  | [main] | initial allocateMemoryForWrite = 
644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor 
(IoTDBDescriptor.java:1584) 
2023-09-20 21:46:02,248 | INFO  | [main] | initial allocateMemoryForSchema = 
214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor 
(IoTDBDescriptor.java:1585) 
2023-09-20 21:46:02,248 | INFO  | [main] | initial allocateMemoryForConsensus = 
214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor 
(IoTDBDescriptor.java:1586) 
2023-09-20 21:46:02,248 | INFO  | [main] | allocateMemoryForSchemaRegion = 
107374182 | org.apache.iotdb.db.conf.IoTDBDescriptor 
(IoTDBDescriptor.java:1710) 
2023-09-20 21:46:02,250 | INFO  | [main] | allocateMemoryForSchemaCache = 
64424509 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1713) 
2023-09-20 21:46:02,250 | INFO  | [main] | allocateMemoryForPartitionCache = 
21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1717) 
2023-09-20 21:46:02,250 | INFO  | [main] | allocateMemoryForLastCache = 
21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1720) 
2023-09-20 21:46:02,257 | INFO  | [main] | try loading iotdb-common.properties 
from 
/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
 | org.apache.iotdb.tsfile.common.conf.TSFileDescriptor 
(TSFileDescriptor.java:135) 
2023-09-20 21:46:02,388 | INFO  | [main] | IoTDB enable memory control: true | 
org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:383) 
2023-09-20 21:46:02,492 | INFO  | [main] | IoTDB-DataNode environment 
variables: 
    
IOTDB_HOME=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/install/FusionInsight-IoTDB-1.1.0/iotdb;
    IOTDB_CONF=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc;
    IOTDB_DATA_HOME=null; | org.apache.iotdb.db.service.DataNode 
(DataNode.java:150) 
2023-09-20 21:46:02,777 | INFO  | [main] | new single scheduled thread pool: 
Stateful-Trigger-Information-Updater | 
org.apache.iotdb.commons.concurrent.IoTDBThreadPoolFactory 
(IoTDBThreadPoolFactory.java:192) 
2023-09-20 21:46:02,781 | INFO  | [main] | Running mode -s | 
org.apache.iotdb.db.service.DataNodeServerCommandLine 
(DataNodeServerCommandLine.java:96) 
2023-09-20 21:46:02,790 | INFO  | [main] | Starting IoTDB 
1.1.0-h0.cbu.mrs.330.r3 (Build: 89ddf14-dev) | 
org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:174) 
2023-09-20 21:46:02,815 | WARN  | [main] | Failed to copy file from 
/srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp to 
/srv/BigData/data1/iotdb/iotdbserver/data/system.properties | 
org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:421) 
2023-09-20 21:46:02,822 | INFO  | [main] | Start JMX remotely: JMX is enabled 
to receive remote connection on port 22258 | 
org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:80) 
2023-09-20 21:46:02,823 | INFO  | [main] | JDK version is 8. | 
org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:49) 
2023-09-20 21:46:02,832 | INFO  | [main] | Successfully update ConfigNode: 
[TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, port:22259), 
TEndPoint(ip:120.12.0.167, port:22259)]. | 
org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96) 
2023-09-20 21:46:02,835 | INFO  | [main] | Pulling system configurations from 
the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode 
(DataNode.java:238) 
2023-09-20 21:46:03,514 | WARN  | [main] | Failed to connect to ConfigNode 
TEndPoint(ip:120.12.0.167, port:22259) from DataNode TEndPoint(ip:120.12.0.167, 
port:22260), because the current node is not leader, try next node | 
org.apache.iotdb.db.client.ConfigNodeClient (ConfigNodeClient.java:308) 
2023-09-20 21:46:04,760 | INFO  | [main] | Create system.properties.tmp 
/srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp. | 
org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:537) 
2023-09-20 21:46:04,764 | INFO  | [main] | Successfully pull system 
configurations from ConfigNode-leader. | org.apache.iotdb.db.service.DataNode 
(DataNode.java:306) 
2023-09-20 21:46:04,764 | INFO  | [main] | Sending restart request to 
ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:405) 
2023-09-20 21:46:04,807 | ERROR | [main] | Fail to start server | 
org.apache.iotdb.db.service.DataNode (DataNode.java:189) 
org.apache.iotdb.commons.exception.StartupException: Reject 
DataNode restart. Because the nodeId of the current DataNode is -1. Possible 
solutions are as follows:
    1. Delete "data" dir and retry.
    at org.apache.iotdb.db.service.DataNode.sendRestartRequestToConfigNode(DataNode.java:452)
    at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:171)
    at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
    at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
    at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
    at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
2023-09-20 21:46:04,808 | INFO  | [main] | Deactivating IoTDB DataNode... | 
org.apache.iotdb.db.service.DataNode (DataNode.java:864) 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)