[ 
https://issues.apache.org/jira/browse/IOTDB-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangShaowei reassigned IOTDB-6167:
-----------------------------------

    Assignee: QiangShaowei

> DataNode can't register to cluster when fetch system configuration throws NPE
> -----------------------------------------------------------------------------
>
>                 Key: IOTDB-6167
>                 URL: https://issues.apache.org/jira/browse/IOTDB-6167
>             Project: Apache IoTDB
>          Issue Type: Bug
>          Components: Core/Cluster
>            Reporter: QiangShaowei
>            Assignee: QiangShaowei
>            Priority: Major
>             Fix For: master branch
>
>
> In rare circumstances, DataNode (DN) registration fails.
> The cause: when a DN registers for the first time, it fetches the system 
> configuration from the ConfigNode (CN). If the CN has an error or its leader 
> is not ready, the fetched configuration is null. The DN uses it without a 
> null check, so an NPE aborts the registration process and the 
> 'SYSTEM_PROPERTIES.deleteOnExit();' call is skipped.
> When the DN is started again, system.properties already exists, so the 
> start is not treated as a first registration. The nodeId is still -1, so 
> the restart fails.
>  
> DN log info:
>  
> 2023-09-20 21:45:29,041 | INFO  | [main] | Successfully update ConfigNode: 
> [TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, 
> port:22259), TEndPoint(ip:120.12.0.167, port:22259)]. | 
> org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96) 
> 2023-09-20 21:45:29,042 | INFO  | [main] | Pulling system configurations from 
> the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode 
> (DataNode.java:238) 
> 2023-09-20 21:45:29,550 | ERROR | [main] | Failed to execute system command | 
> org.apache.iotdb.commons.ServerCommandLine (ServerCommandLine.java:69) 
> java.lang.NullPointerException: null
>     at org.apache.iotdb.db.conf.IoTDBDescriptor.loadGlobalConfig(IoTDBDescriptor.java:1930)
>     at org.apache.iotdb.db.service.DataNode.pullAndCheckSystemConfigurations(DataNode.java:275)
>     at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:164)
>     at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
>     at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
>     at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
>     at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
> 2023-09-20 21:46:02,198 | INFO  | [main] | Start to read config file 
> file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
>  | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:164) 
> 2023-09-20 21:46:02,221 | INFO  | [main] | Start to read config file 
> file:/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-datanode.properties
>  | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:181) 
> 2023-09-20 21:46:02,247 | INFO  | [main] | initial allocateMemoryForRead = 
> 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1583) 
> 2023-09-20 21:46:02,247 | INFO  | [main] | initial allocateMemoryForWrite = 
> 644245094 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1584) 
> 2023-09-20 21:46:02,248 | INFO  | [main] | initial allocateMemoryForSchema = 
> 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1585) 
> 2023-09-20 21:46:02,248 | INFO  | [main] | initial allocateMemoryForConsensus 
> = 214748364 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1586) 
> 2023-09-20 21:46:02,248 | INFO  | [main] | allocateMemoryForSchemaRegion = 
> 107374182 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1710) 
> 2023-09-20 21:46:02,250 | INFO  | [main] | allocateMemoryForSchemaCache = 
> 64424509 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1713) 
> 2023-09-20 21:46:02,250 | INFO  | [main] | allocateMemoryForPartitionCache = 
> 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1717) 
> 2023-09-20 21:46:02,250 | INFO  | [main] | allocateMemoryForLastCache = 
> 21474836 | org.apache.iotdb.db.conf.IoTDBDescriptor 
> (IoTDBDescriptor.java:1720) 
> 2023-09-20 21:46:02,257 | INFO  | [main] | try loading 
> iotdb-common.properties from 
> /opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc/iotdb-common.properties
>  | org.apache.iotdb.tsfile.common.conf.TSFileDescriptor 
> (TSFileDescriptor.java:135) 
> 2023-09-20 21:46:02,388 | INFO  | [main] | IoTDB enable memory control: true 
> | org.apache.iotdb.db.conf.IoTDBDescriptor (IoTDBDescriptor.java:383) 
> 2023-09-20 21:46:02,492 | INFO  | [main] | IoTDB-DataNode environment 
> variables: 
>     
> IOTDB_HOME=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/install/FusionInsight-IoTDB-1.1.0/iotdb;
>     IOTDB_CONF=/opt/Bigdata/FusionInsight_IoTDB_8.3.0/1_13_IoTDBServer/etc;
>     IOTDB_DATA_HOME=null; | org.apache.iotdb.db.service.DataNode 
> (DataNode.java:150) 
> 2023-09-20 21:46:02,777 | INFO  | [main] | new single scheduled thread pool: 
> Stateful-Trigger-Information-Updater | 
> org.apache.iotdb.commons.concurrent.IoTDBThreadPoolFactory 
> (IoTDBThreadPoolFactory.java:192) 
> 2023-09-20 21:46:02,781 | INFO  | [main] | Running mode -s | 
> org.apache.iotdb.db.service.DataNodeServerCommandLine 
> (DataNodeServerCommandLine.java:96) 
> 2023-09-20 21:46:02,790 | INFO  | [main] | Starting IoTDB 
> 1.1.0-h0.cbu.mrs.330.r3 (Build: 89ddf14-dev) | 
> org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:174) 
> 2023-09-20 21:46:02,815 | WARN  | [main] | Failed to copy file from 
> /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp to 
> /srv/BigData/data1/iotdb/iotdbserver/data/system.properties | 
> org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:421) 
> 2023-09-20 21:46:02,822 | INFO  | [main] | Start JMX remotely: JMX is enabled 
> to receive remote connection on port 22258 | 
> org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:80) 
> 2023-09-20 21:46:02,823 | INFO  | [main] | JDK version is 8. | 
> org.apache.iotdb.commons.service.StartupChecks (StartupChecks.java:49) 
> 2023-09-20 21:46:02,832 | INFO  | [main] | Successfully update ConfigNode: 
> [TEndPoint(ip:120.12.0.206, port:22259), TEndPoint(ip:120.12.0.2, 
> port:22259), TEndPoint(ip:120.12.0.167, port:22259)]. | 
> org.apache.iotdb.db.client.ConfigNodeInfo (ConfigNodeInfo.java:96) 
> 2023-09-20 21:46:02,835 | INFO  | [main] | Pulling system configurations from 
> the ConfigNode-leader... | org.apache.iotdb.db.service.DataNode 
> (DataNode.java:238) 
> 2023-09-20 21:46:03,514 | WARN  | [main] | Failed to connect to ConfigNode 
> TEndPoint(ip:120.12.0.167, port:22259) from DataNode 
> TEndPoint(ip:120.12.0.167, port:22260), because the current node is not 
> leader, try next node | org.apache.iotdb.db.client.ConfigNodeClient 
> (ConfigNodeClient.java:308) 
> 2023-09-20 21:46:04,760 | INFO  | [main] | Create system.properties.tmp 
> /srv/BigData/data1/iotdb/iotdbserver/system/schema/system.properties.tmp. | 
> org.apache.iotdb.db.conf.IoTDBStartCheck (IoTDBStartCheck.java:537) 
> 2023-09-20 21:46:04,764 | INFO  | [main] | Successfully pull system 
> configurations from ConfigNode-leader. | org.apache.iotdb.db.service.DataNode 
> (DataNode.java:306) 
> 2023-09-20 21:46:04,764 | INFO  | [main] | Sending restart request to 
> ConfigNode-leader... | org.apache.iotdb.db.service.DataNode 
> (DataNode.java:405) 
> 2023-09-20 21:46:04,807 | ERROR | [main] | Fail to start server | 
> org.apache.iotdb.db.service.DataNode (DataNode.java:189) 
> org.apache.iotdb.commons.exception.StartupException: Reject DataNode restart. Because the nodeId of the current DataNode is -1. Possible solutions are as follows:
>     1. Delete "data" dir and retry.
>     at org.apache.iotdb.db.service.DataNode.sendRestartRequestToConfigNode(DataNode.java:452)
>     at org.apache.iotdb.db.service.DataNode.doAddNode(DataNode.java:171)
>     at org.apache.iotdb.db.service.DataNodeServerCommandLine.run(DataNodeServerCommandLine.java:100)
>     at org.apache.iotdb.commons.ServerCommandLine.doMain(ServerCommandLine.java:64)
>     at org.apache.iotdb.db.service.DataNode.main(DataNode.java:151)
>     at com.huawei.iotdb.IoTDBServer.main(IoTDBServer.java:17)
> 2023-09-20 21:46:04,808 | INFO  | [main] | Deactivating IoTDB DataNode... | 
> org.apache.iotdb.db.service.DataNode (DataNode.java:864) 
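
The failure mode described in the issue (a null configuration aborts first registration and leaves a system.properties file with nodeId -1 behind) can be illustrated with a small standalone Java sketch. This is not the IoTDB code and not the actual fix. The class name, the `node_id` key, and the `fetchConfig` and `register` parameters are hypothetical stand-ins chosen for illustration. The sketch only shows one possible guard: check the fetched configuration for null, and delete the placeholder file on every failed first registration so the next start is again treated as a first start.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical sketch, not IoTDB code: all names here are assumptions.
class FirstRegistrationSketch {

  // Placeholder id written before the ConfigNode has assigned a real one.
  static final int UNASSIGNED_NODE_ID = -1;

  /**
   * Simulates a first registration. fetchConfig stands in for pulling the
   * system configuration from the ConfigNode leader, register for the
   * registration request that returns the assigned node id.
   */
  static int registerFirstTime(
      Path systemProperties, Supplier<String> fetchConfig, IntSupplier register)
      throws IOException {
    writeNodeId(systemProperties, UNASSIGNED_NODE_ID);
    boolean registered = false;
    try {
      String config = fetchConfig.get();
      if (config == null) {
        // Explicit check instead of letting a NullPointerException escape later.
        throw new IllegalStateException(
            "No system configuration received; the ConfigNode leader may not be ready");
      }
      int nodeId = register.getAsInt();
      writeNodeId(systemProperties, nodeId);
      registered = true;
      return nodeId;
    } finally {
      if (!registered) {
        // Runs on any failure, so no file with node id -1 survives the attempt.
        Files.deleteIfExists(systemProperties);
      }
    }
  }

  private static void writeNodeId(Path file, int nodeId) throws IOException {
    Files.write(file, ("node_id=" + nodeId + "\n").getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    Path props = Files.createTempDirectory("dn-sketch").resolve("system.properties");
    try {
      // Simulate a ConfigNode that returns no configuration.
      registerFirstTime(props, () -> null, () -> 1);
    } catch (IllegalStateException e) {
      System.out.println("registration failed, file left behind: " + Files.exists(props));
    }
  }
}
```

Doing the cleanup in a `finally` block rather than relying on a single call placed after the fetch means that any exception thrown during registration, not only the null case, removes the placeholder file.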



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
