[ https://issues.apache.org/jira/browse/IOTDB-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
QiangShaowei reassigned IOTDB-6167:
-----------------------------------

    Assignee: QiangShaowei

> DataNode can't register to cluster when fetch system configuration throws NPE
> -----------------------------------------------------------------------------
>
> Key: IOTDB-6167
> URL: https://issues.apache.org/jira/browse/IOTDB-6167
> Project: Apache IoTDB
> Issue Type: Bug
> Components: Core/Cluster
> Reporter: QiangShaowei
> Assignee: QiangShaowei
> Priority: Major
> Fix For: master branch
>
>
> In some rare, special circumstances, DataNode (DN) registration fails.
> Cause: when a DN registers for the first time, it pulls the system configuration from the
> ConfigNode (CN). If the CN happens to hit an error, or its leader is not ready yet, the fetched
> configuration is null. The DN uses it without checking, so a NullPointerException (NPE) aborts
> the registration process and 'SYSTEM_PROPERTIES.deleteOnExit();' is skipped.
> When the DN is started again, system.properties already exists, so the start is not treated as
> a first start; but the nodeId is -1, so the restart fails.
>
> DN log info:
>
> 2023-09-20 21:45:29,041 | INFO | [main] | Successfully update ConfigNode: [TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, port:22259), TEndPoint(ip:120.12.0.167, port:22259)]. | org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
> 2023-09-20 21:45:29,042 | INFO | [main] | Pulling system configurations from the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:238)
> 2023-09-20 21:45:29,550 | ERROR | [main] | Failed to execute system command | org.apache.iotdb.commons.ServerCommandLine (ServerCommandLine.java:69)
> java.lang.NullPointerException: null
>     at org.apache.iotdb.db.conf.IoTDBDescriptor.loadGlobalConfig(IoTDBDescriptor.java:1930)
>     at org.apache.iotdb.db.service.DataNode.pullAndCheckSystemConfigurations(DataNode.java:275)
>     at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:164)
>     at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
>     at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
>     at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
>     at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
> 2023-09-20 21:46:02,198 | INFO | [main] | Start to read config file file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:164)
> 2023-09-20 21:46:02,221 | INFO | [main] | Start to read config file file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-datanode.properties | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:181)
> 2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForRead = 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1583)
> 2023-09-20 21:46:02,247 | INFO | [main] | initial allocateMemoryForWrite = 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1584)
> 2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForSchema = 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1585)
> 2023-09-20 21:46:02,248 | INFO | [main] | initial allocateMemoryForConsensus = 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1586)
> 2023-09-20 21:46:02,248 | INFO | [main] | allocateMemoryForSchemaRegion = 107374182 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1710)
> 2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForSchemaCache = 64424509 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1713)
> 2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForPartitionCache = 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1717)
> 2023-09-20 21:46:02,250 | INFO | [main] | allocateMemoryForLastCache = 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:1720)
> 2023-09-20 21:46:02,257 | INFO | [main] | try loading iotdb-common.properties from /opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties | org.apache.iotdb.tsfile.common.conf.TSFileDescriptor (TSFileDescriptor.java:135)
> 2023-09-20 21:46:02,388 | INFO | [main] | IoTDB enable memory control: true | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:383)
> 2023-09-20 21:46:02,492 | INFO | [main] | IoTDB-DataNode environment variables:
> IOTDB_HOME=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/install/FusionInsight-IoTDB-1.1.0/iotdb; IOTDB_CONF=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc; IOTDB_DATA_HOME=null; | org.apache.iotdb.db.service.DataNode (DataNode.java:150)
> 2023-09-20 21:46:02,777 | INFO | [main] | new single scheduled thread pool: Stateful-Trigger-Information-Updater | org.apache.iotdb.commons.concurrent.IoTDBThreadPoolFactory (IoTDBThreadPoolFactory.java:192)
> 2023-09-20 21:46:02,781 | INFO | [main] | Running mode -s | org.apache.iotdb.db.service.DataNodeServerCommandLine (DataNodeServerCommandLine.java:96)
> 2023-09-20 21:46:02,790 | INFO | [main] | Starting IoTDB 1.1.0-h0.cbu.mrs.330.r3 (Build: 89ddf14-dev) | org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:174)
> 2023-09-20 21:46:02,815 | WARN | [main] | Failed to copy file from /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp to /srv/BigData/data1/iotdb/iotdbserver/data/system.properties | org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:421)
> 2023-09-20 21:46:02,822 | INFO | [main] | Start JMX remotely: JMX is enabled to receive remote connection on port 22258 | org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:80)
> 2023-09-20 21:46:02,823 | INFO | [main] | JDK version is 8. | org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:49)
> 2023-09-20 21:46:02,832 | INFO | [main] | Successfully update ConfigNode: [TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, port:22259), TEndPoint(ip:120.12.0.167, port:22259)]. | org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96)
> 2023-09-20 21:46:02,835 | INFO | [main] | Pulling system configurations from the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:238)
> 2023-09-20 21:46:03,514 | WARN | [main] | Failed to connect to ConfigNode TEndPoint(ip:120.12.0.167, port:22259) from DataNode TEndPoint(ip:120.12.0.167, port:22260), because the current node is not leader, try next node | org.apache.iotdb.db.client.ConfigNodeClient (ConfigNodeClient.java:308)
> 2023-09-20 21:46:04,760 | INFO | [main] | Create system.properties.tmp /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp. | org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:537)
> 2023-09-20 21:46:04,764 | INFO | [main] | Successfully pull system configurations from ConfigNode-leader. | org.apache.iotdb.db.service.DataNode (DataNode.java:306)
> 2023-09-20 21:46:04,764 | INFO | [main] | Sending restart request to ConfigNode-leader... | org.apache.iotdb.db.service.DataNode (DataNode.java:405)
> 2023-09-20 21:46:04,807 | ERROR | [main] | Fail to start server | org.apache.iotdb.db.service.DataNode (DataNode.java:189)
> org.apache.iotdb.commons.exception.StartupException: Reject DataNode restart. Because the nodeId of the current DataNode is -1. Possible solutions are as follows:
>  1. Delete "data" dir and retry.
>     at org.apache.iotdb.db.service.DataNode.sendRestartRequestToConfigNode(DataNode.java:452)
>     at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:171)
>     at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
>     at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
>     at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
>     at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
> 2023-09-20 21:46:04,808 | INFO | [main] | Deactivating IoTDB DataNode... | org.apache.iotdb.db.service.DataNode (DataNode.java:864)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
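The failure described in this issue suggests two safeguards: fail fast with an explicit error when the pulled configuration is null, and remove the local system.properties whenever registration does not complete, not only on the path that reaches 'SYSTEM_PROPERTIES.deleteOnExit();'. The Java code below is a minimal sketch of that idea under stated assumptions, not the patch that resolved IOTDB-6167. All names in it (FirstRegistrationSketch, registerFirstTime, isFirstStart, readNodeId and the data_node_id key) are hypothetical and are not IoTDB APIs. A Supplier<Properties> stands in for the RPC that pulls system configurations from the ConfigNode-leader, and Files.deleteIfExists in a finally block is swapped in for File.deleteOnExit() so the cleanup happens immediately and can be checked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.Supplier;

/**
 * Hypothetical model of the first-registration path described in IOTDB-6167.
 * None of these names are IoTDB APIs.
 */
final class FirstRegistrationSketch {

  /** Hypothetical key; -1 means "not registered yet", matching the value in the reported log. */
  static final String NODE_ID_KEY = "data_node_id";

  private FirstRegistrationSketch() {}

  /**
   * Registers a DataNode that starts for the first time.
   *
   * @param pullConfig stands in for pulling system configurations from the ConfigNode-leader;
   *     it may return null when the ConfigNode has an error or its leader is not ready
   * @param systemProperties local file whose existence makes the next start look like a restart
   * @param assignedNodeId the nodeId the ConfigNode would hand out on success
   * @return the nodeId persisted to systemProperties
   */
  static int registerFirstTime(
      Supplier<Properties> pullConfig, Path systemProperties, int assignedNodeId)
      throws IOException {
    Properties local = new Properties();
    local.setProperty(NODE_ID_KEY, "-1");
    store(local, systemProperties); // the file now exists with nodeId -1
    boolean registered = false;
    try {
      Properties global = pullConfig.get();
      if (global == null) {
        // Explicit check instead of letting a NullPointerException escape.
        throw new IllegalStateException(
            "System configuration pulled from the ConfigNode-leader is null, retry later");
      }
      local.putAll(global);
      local.setProperty(NODE_ID_KEY, Integer.toString(assignedNodeId));
      store(local, systemProperties);
      registered = true;
      return assignedNodeId;
    } finally {
      if (!registered) {
        // Runs whenever the try block exits without completing registration
        // (null config, I/O errors, unchecked exceptions), so the next start
        // is again treated as a first start.
        Files.deleteIfExists(systemProperties);
      }
    }
  }

  /** Mirrors the start-up check: an existing system.properties means "restart". */
  static boolean isFirstStart(Path systemProperties) {
    return !Files.exists(systemProperties);
  }

  /** Reads the persisted nodeId; a value of -1 is what led to "Reject DataNode restart". */
  static int readNodeId(Path systemProperties) throws IOException {
    Properties p = new Properties();
    try (InputStream in = Files.newInputStream(systemProperties)) {
      p.load(in);
    }
    return Integer.parseInt(p.getProperty(NODE_ID_KEY, "-1"));
  }

  private static void store(Properties p, Path path) throws IOException {
    try (OutputStream out = Files.newOutputStream(path)) {
      p.store(out, null);
    }
  }
}
```

With this shape, a null configuration from a ConfigNode whose leader is not ready surfaces as a clear IllegalStateException, and because the file is removed, the next start is treated as a first start again instead of being rejected with nodeId -1.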