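Why have you set this again at the end of hdfs-site.xml?

<property>
  <name>dfs.namenode.rpc-address</name>
  <value>nn1:8020</value>
</property>

The per-NameNode RPC addresses are already defined by dfs.namenode.rpc-address.mycluster.nn1, nn2 and nn3.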
Remove this and start the NameNode again.

Regards
Susheel Kumar

On Tue, 3 Oct 2023, 10:09 pm Harry Jamison, <harryjamiso...@yahoo.com.invalid> wrote:

> OK, here is where I am at now.
>
> When I start the namenodes, they work, but they are all in standby mode.
> When I start my first datanode, it seems to kill one of the namenodes (the
> active one, I assume).
>
> I am getting 2 different warnings in the namenode:
>
> [2023-10-03 09:03:52,162] WARN Unable to initialize
> FileSignerSecretProvider, falling back to use random secrets. Reason: Could
> not read signature secret file: /root/hadoop-http-auth-signature-secret
> (org.apache.hadoop.security.authentication.server.AuthenticationFilter)
>
> [2023-10-03 09:03:52,350] WARN Only one image storage directory
> (dfs.namenode.name.dir) configured. Beware of data loss due to lack of
> redundant storage directories!
> (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
>
> I am using a journal node, so I am not clear if I am supposed to have
> multiple dfs.namenode.name.dir directories.
> I thought each namenode has 1 directory.
>
>
> Susheel Kumar Gadalay said that my shared.edits.dir is wrong, but I am
> not clear how it is wrong.
> Going by this page, mine looks right:
>
> https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>
> This is what is in the logs right before the namenode dies:
> [2023-10-03 09:01:22,054] INFO Listener at vmnode3:8020
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,054] INFO Starting Socket Reader #1 for port 8020
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,097] INFO Registered FSNamesystemState,
> ReplicatedBlocksState and ECBlockGroupsState MBeans.
> (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
> [2023-10-03 09:01:22,119] INFO Number of blocks under construction: 0
> (org.apache.hadoop.hdfs.server.namenode.LeaseManager)
> [2023-10-03 09:01:22,122] INFO Initialized the Default Decommission and
> Maintenance monitor
> (org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminDefaultMonitor)
> [2023-10-03 09:01:22,131] INFO STATE* Leaving safe mode after 0 secs
> (org.apache.hadoop.hdfs.StateChange)
> [2023-10-03 09:01:22,131] INFO STATE* Network topology has 0 racks and 0
> datanodes (org.apache.hadoop.hdfs.StateChange)
> [2023-10-03 09:01:22,131] INFO STATE* UnderReplicatedBlocks has 0 blocks
> (org.apache.hadoop.hdfs.StateChange)
> [2023-10-03 09:01:22,130] INFO Start MarkedDeleteBlockScrubber thread
> (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
> [2023-10-03 09:01:22,158] INFO IPC Server Responder: starting
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,159] INFO IPC Server listener on 8020: starting
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,165] INFO NameNode RPC up at: vmnode3/192.168.1.103:8020
> (org.apache.hadoop.hdfs.server.namenode.NameNode)
> [2023-10-03 09:01:22,166] INFO Starting services required for standby
> state (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
> [2023-10-03 09:01:22,168] INFO Will roll logs on active node every 120
> seconds. (org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer)
> [2023-10-03 09:01:22,171] INFO Starting standby checkpoint thread...
> Checkpointing active NN to possible NNs: [http://vmnode1:9870,
> http://vmnode2:9870]
> Serving checkpoints at http://vmnode3:9870
> (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
> real-time non-blocking time (microseconds, -R) unlimited
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 15187
> max locked memory (kbytes, -l) 8192
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 15187
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
>
> On Tuesday, October 3, 2023 at 03:54:23 AM PDT, Liming Cui
> <anyone.cui...@gmail.com> wrote:
>
>
> Harry,
>
> Great question.
> I would say the same configurations in core-site.xml and hdfs-site.xml
> will be overwriting each other in some way.
>
> Glad you found the root cause.
>
> Keep going.
>
> On Tue, Oct 3, 2023 at 10:27 AM Harry Jamison <harryjamiso...@yahoo.com>
> wrote:
>
> Liming
>
> After looking at my config, I think that maybe my problem is because my
> fs.defaultFS is inconsistent between hdfs-site.xml and core-site.xml.
> What do hdfs-site.xml and core-site.xml each do, and why is the same
> setting in 2 different places?
> Or do I just have it there mistakenly?
>
> This is what I have in hdfs-site.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <property>
>     <name>ha.zookeeper.quorum</name>
>     <value>nn1:2181,nn2:2181,nn3:2181</value>
>   </property>
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2,nn3</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>nn1:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>nn2:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn3</name>
>     <value>nn3:8020</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>nn1:9870</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>nn2:9870</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn3</name>
>     <value>nn3:9870</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://nn1:8485;nn2:8485;nn3:8485/mycluster</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/harry/.ssh/id_rsa</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/hadoop/data/hdfs/namenode</value>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/hadoop/data/hdfs/datanode</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/hadoop/data/hdfs/journalnode</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address</name>
>     <value>nn1:8020</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.nn.not-become-active-in-safemode</name>
>     <value>true</value>
>   </property>
>
> </configuration>
>
>
> In core-site.xml I have this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!--
>   Licensed under the Apache License, Version 2.0 (the "License");
>   you may not use this file except in compliance with the License.
>   You may obtain a copy of the License at
>
>     http://www.apache.org/licenses/LICENSE-2.0
>
>   Unless required by applicable law or agreed to in writing, software
>   distributed under the License is distributed on an "AS IS" BASIS,
>   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>   See the License for the specific language governing permissions and
>   limitations under the License. See accompanying LICENSE file.
> -->
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://nn1:8020</value>
>   </property>
> </configuration>
>
>
> On Tuesday, October 3, 2023 at 12:54:26 AM PDT, Liming Cui
> <anyone.cui...@gmail.com> wrote:
>
>
> Can you show us the configuration files?
> Maybe I can help you with some suggestions.
>
>
> On Tue, Oct 3, 2023 at 9:05 AM Harry Jamison
> <harryjamiso...@yahoo.com.invalid> wrote:
>
> I am trying to set up an HA HDFS cluster, and I am running into a problem.
>
> I am not sure what I am doing wrong. I thought I followed the HA namenode
> guide, but it is not working.
>
>
> Apache Hadoop 3.3.6 – HDFS High Availability
> <https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html>
>
>
> I have 2 namenodes and 3 journal nodes, and 3 zookeeper nodes.
>
> After some period of time I see the following and my namenode and journal
> node die.
> I am not sure where the problem is, or how to diagnose what I am doing
> wrong here. And the logging here does not make sense to me.
>
> Namenode
> Serving checkpoints at http://nn1:9870
> (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
> real-time non-blocking time (microseconds, -R) unlimited
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 15187
> max locked memory (kbytes, -l) 8192
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 15187
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> [2023-10-02 23:53:46,693] ERROR RECEIVED SIGNAL 15: SIGTERM
> (org.apache.hadoop.hdfs.server.namenode.NameNode)
> [2023-10-02 23:53:46,701] INFO SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.1.159
> ************************************************************/
> (org.apache.hadoop.hdfs.server.namenode.NameNode)
>
> JournalNode
> [2023-10-02 23:54:19,162] WARN Journal at nn1/192.168.1.159:8485 has no
> edit logs (org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
> real-time non-blocking time (microseconds, -R) unlimited
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 15187
> max locked memory (kbytes, -l) 8192
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 15187
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
>
> --
> *Best*
>
> Liming
>
>
> --
> *Best*
>
> Liming
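On the fs.defaultFS mismatch between the two files: here is a minimal core-site.xml sketch that follows the layout of the QJM HA guide linked in the thread, where clients address the logical nameservice instead of a single NameNode host. It assumes the nameservice ID `mycluster` and the ZooKeeper hosts nn1, nn2 and nn3 taken from the quoted hdfs-site.xml. With this in place, fs.defaultFS and ha.zookeeper.quorum would be dropped from hdfs-site.xml so each value is defined in exactly one file. As for "overwriting each other": as far as I understand Hadoop's Configuration loading order, HDFS processes read hdfs-site.xml after core-site.xml, and a later resource overrides an earlier one unless the property is marked final. The two values therefore do not merge. The last one loaded wins, and any tool that never loads hdfs-site.xml sees hdfs://nn1:8020.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Clients resolve the logical HA nameservice (assumed ID: mycluster)
       rather than a single NameNode host such as nn1:8020. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- ZooKeeper ensemble used by the ZKFailoverController for automatic
       failover (hosts assumed from the quoted hdfs-site.xml). -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>nn1:2181,nn2:2181,nn3:2181</value>
  </property>
</configuration>
```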
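For the suggestion at the top of the thread, here is a sketch of the nameservice and RPC-address part of hdfs-site.xml after the un-suffixed dfs.namenode.rpc-address entry is removed. In an HA nameservice, each NameNode's RPC address comes from the dfs.namenode.rpc-address.[nameservice ID].[namenode ID] keys, which makes the extra nn1:8020 entry look redundant. The IDs and host names are copied from the quoted config, and the other properties (HTTP addresses, shared edits, fencing, directories) are left out of this fragment.

```xml
<!-- Fragment of hdfs-site.xml: nameservice and per-NameNode RPC addresses only. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <!-- One RPC address per NameNode ID; no plain dfs.namenode.rpc-address entry. -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>nn3:8020</value>
  </property>
</configuration>
```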
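On the "Only one image storage directory" warning: dfs.namenode.name.dir accepts a comma-separated list of directories, and the NameNode replicates its fsimage into every directory listed. A second entry on a separate disk therefore adds local redundancy. The JournalNodes do not cover this because they hold the shared edit log (dfs.namenode.shared.edits.dir), not the fsimage. With a single directory the message remains a warning, not an error. In the sketch below, the second path is hypothetical.

```xml
<!-- Fragment of hdfs-site.xml. The second directory is a hypothetical path;
     ideally it sits on a different physical disk from the first. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/hadoop/data/hdfs/namenode,file:/mnt/disk2/hadoop/hdfs/namenode</value>
</property>
```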