I hope this clears up the confusion. First, this is what the hosts file looks like:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com namenode
192.168.200.107 datanode01.localdomain.com
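
A quick way to see which names that file actually defines is to parse it and look a name up. This is only a sketch under stated assumptions: `HOSTS_TEXT` and `parse_hosts` are names made up for illustration, the text is copied from the listing above rather than read from /etc/hosts on a live machine, and nothing here queries the system resolver. The two names checked are the one in the file and the one the agent log further down tries to connect to.

```python
# Hosts-file text copied from the listing above (illustrative constant).
HOSTS_TEXT = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com namenode
192.168.200.107 datanode01.localdomain.com
"""


def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *hostnames = line.split()
        for name in hostnames:
            # Keep the first address listed for a name, so the 127.0.0.1
            # entry wins over ::1 for "localhost" in this sketch.
            names.setdefault(name.lower(), address)
    return names


if __name__ == "__main__":
    table = parse_hosts(HOSTS_TEXT)
    for name in ("namenode.localdomain.com", "namenode.localdomain"):
        print(name, "->", table.get(name, "not in hosts file"))
```

A name that is missing from the file is left to whatever other lookup the machine is configured to use, which is one possible explanation for an unexpected address in a log.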

I have restarted all the nodes in the cluster, several times.
I can reach all nodes from all nodes using ping and their fully qualified domain names.
I can reach the data nodes from the name node using password-less ssh.
I have changed the names of the machines to match their names in the hosts files on each machine.
I have checked the /etc/ambari-agent/conf/ambari-agent.ini file.

David Novogrodsky
[email protected]
http://www.linkedin.com/in/davidnovogrodsky

On Tue, Dec 16, 2014 at 10:20 PM, Devopam Mittra <[email protected]> wrote:
>
> forgive me if i sound rude , but please re-read the installation
> instructions properly - it should help you in your case positively.
>
> 1. have a sound naming convention for all your boxes. e.g.:
> namenode01.localdomain , datanode01.localdomain , datanode0N.localdomain ,
> this will help you much in your future expansion and maintenance of your
> cluster
> 2. do not , by any means , tamper with /etc/hosts for 127.0.0.1 and ::1 ,
> let it be localhost keyword only as you don't want to change that in the
> first place ... so don't play around with that one . it will help you to
> otherwise maintain normal operations on your box as well , otherwise for
> every internal lookup of OS functions it will only create issues
> 3. if you have a DHCP + very good DNS server in place, then okay , else ,
> assign static IPs to your machines and create one entry for each box with
> the FQDN and static IP address , replicated on ALL the boxes
> 4. set up keyless ssh login for root or any other uniform localuser that
> you want to use and manage ambari + hadoop
> 5. confirm that namenode and the ambari server machines (in case they are
> different for you) can talk to ALL the machines using a keyless login for
> that universal user you have created in above steps.
>
> hope the above will help you to sort out the issue in a single go.
>
> regards
> Dev
>
> On Tue, Dec 16, 2014 at 11:45 PM, David Novogrodsky <[email protected]> wrote:
>>
>> There is nothing simply done in Ambari. :)
>>
>> By changing the name of this computer and restarting the namenode Ambari
>> does not recognize any node. The main error I am wondering about is this:
>> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at https://namenode.localdomain:8440 (98.124.198.1)
>> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>> ', None)
>> Why is Ambari using namenode.localdomain to connect?
>>
>> I am running Ambari on this node; I am running Ambari on the namenode of
>> this cluster. The host file for this computer is this:
>> GNU nano 2.0.9 File: /etc/hosts
>>
>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>> 192.168.200.144 localhost.datanode10
>> 192.168.200.107 localhost.datanode01
>> 192.168.200.143 namenode.localdomain.com namenode
>>
>> The Ambari wizard said I needed to use fully qualified domain names, so
>>
>> What follows is the detailed registration log. I get this error
>> in the registration log for namenode.localdomain.com:
>> --
>> ==========================
>> Creating target directory...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:18
>>
>> ==========================
>> Copying common functions script...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>>
>> scp /usr/lib/python2.6/site-packages/ambari_commons
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:18
>>
>> ==========================
>> Copying OS type check script...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>>
>> scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:18
>>
>> ==========================
>> Running OS type check...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:18
>> Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:19
>>
>> ==========================
>> Checking 'sudo' package on remote host...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:19
>> sudo-1.8.6p3-15.el6.x86_64
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:20
>>
>> ==========================
>> Copying repo file to 'tmp' folder...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:20
>>
>> scp /etc/yum.repos.d/ambari.repo
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:20
>>
>> ==========================
>> Moving file to repo dir...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:20
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:21
>>
>> ==========================
>> Copying setup script file...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:21
>>
>> scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:21
>>
>> ==========================
>> Running setup agent script...
>> ==========================
>>
>> Command start time 2014-12-16 12:02:21
>> Verifying Python version compatibility...
>> Using python /usr/bin/python2.6
>> Found ambari-agent PID: 5036
>> Stopping ambari-agent
>> Removing PID file at /var/run/ambari-agent/ambari-agent.pid
>> ambari-agent successfully stopped
>> Restarting ambari-agent
>> Verifying Python version compatibility...
>> Using python /usr/bin/python2.6
>> ambari-agent is not running. No PID found at /var/run/ambari-agent/ambari-agent.pid
>> Verifying Python version compatibility...
>> Using python /usr/bin/python2.6
>> Checking for previously running Ambari Agent...
>> Starting ambari-agent
>> Verifying ambari-agent process status...
>> Ambari Agent successfully started
>> Agent PID at: /var/run/ambari-agent/ambari-agent.pid
>> Agent out at: /var/log/ambari-agent/ambari-agent.out
>> Agent log at: /var/log/ambari-agent/ambari-agent.log
>> ('WARNING 2014-12-16 12:01:59,642 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>> INFO 2014-12-16 12:02:09,653 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:09,701 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>> WARNING 2014-12-16 12:02:09,701 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>> INFO 2014-12-16 12:02:19,711 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:19,770 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>> WARNING 2014-12-16 12:02:19,770 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>> INFO 2014-12-16 12:02:22,680 main.py:83 - loglevel=logging.INFO
>> INFO 2014-12-16 12:02:22,681 main.py:55 - signal received, exiting.
>> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:39 - Removing pid file
>> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:46 - Removing temp files
>> INFO 2014-12-16 12:02:29,532 main.py:83 - loglevel=logging.INFO
>> INFO 2014-12-16 12:02:29,533 DataCleaner.py:36 - Data cleanup thread started
>> INFO 2014-12-16 12:02:29,534 DataCleaner.py:117 - Data cleanup started
>> INFO 2014-12-16 12:02:29,542 DataCleaner.py:119 - Data cleanup finished
>> INFO 2014-12-16 12:02:29,667 PingPortListener.py:51 - Ping port listener started on port: 8670
>> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at https://namenode.localdomain:8440 (98.124.198.1)
>> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>> ', None)
>>
>> Connection to namenode.localdomain.com closed.
>> SSH command execution finished
>> host=namenode.localdomain.com, exitcode=0
>> Command end time 2014-12-16 12:02:32
>>
>> Registering with the server...
>> Registration with the server failed.
>> ----
>>
>> David Novogrodsky
>> [email protected]
>> http://www.linkedin.com/in/davidnovogrodsky
>>
>> On Mon, Dec 15, 2014 at 10:02 PM, Devopam Mittra <[email protected]> wrote:
>>>
>>> May I suggest you simply do a ssh -l <keylessusername> using the
>>> previous and the new FQDNs that you have defined to verify which one is in
>>> effect, and accessible ?
>>> Also, since you changed the FQDN, you may wish to simply reboot the
>>> cluster once, just to make sure that new ones are in-place.
>>> It might happen that after the reboot you will need to redo the ssh
>>> keyless pairing once again (most probably)
>>>
>>> regards
>>> Devopam
>>>
>>> On Tue, Dec 16, 2014 at 4:32 AM, David Novogrodsky <[email protected]> wrote:
>>>>
>>>> The changes I am making in the hosts file are not being picked up by
>>>> the installation scripts of Ambari. I was told I could make changes to the
>>>> hosts file and that Ambari would see them. I have
>>>> checked the /etc/ambari-agent/conf/ambari-agent.ini file and the changes
>>>> I made to the hosts file are not showing up in that file. Where is Ambari
>>>> getting the names for the other nodes in the cluster?
>>>>
>>>> Here are the changes I made to the hosts file on the host for the name
>>>> node:
>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>> 192.168.200.144 datanode10.localdomain
>>>> 192.168.200.107 datanode01.localdomain
>>>> 192.168.200.143 namenode.localdomain namenode
>>>>
>>>> Since I made these changes Ambari can not discover any of the nodes in
>>>> the network. None of them.
>>>>
>>>> I have not made these changes to the other nodes because I do not want
>>>> to make changes to the other nodes until I can see Ambari discover the host
>>>> it is sitting upon.
>>>>
>>>> Regarding the commands you mentioned, here are the results:
>>>> [root@localhost conf]# hostname -f
>>>> hostname: Unknown host
>>>> [root@localhost conf]# hostname
>>>> localhost.namenode
>>>> [root@localhost conf]# python -c 'import socket; print socket.getfqdn()'
>>>> localhost.namenode
>>>>
>>>> localhost.namenode was the name I used for this host during the
>>>> installation of CentOS. I thought you said I could make changes to the
>>>> hosts file and the installation scripts would recognize them?
>>>>
>>>> From the Confirm Hosts page I am getting the following errors:
>>>> for connecting to the name node
>>>>
>>>> STDOUT: {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost.namenode:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}
>>>>
>>>> for connecting to the datanode10
>>>>
>>>> INFO 2014-12-15 16:42:33,348 DataCleaner.py:36 - Data cleanup thread started
>>>> ERROR 2014-12-15 16:42:33,349 main.py:137 - Ambari agent machine hostname (localhost.datanode10) does not match expected ambari server hostname (datanode10.localdomain). Aborting registration. Please check hostname, hostname -f and /etc/hosts file to confirm your hostname is setup correctly
>>>> ', None)
>>>>
>>>> I am getting a similar error when trying to get to the datanode01.
>>>> Please note I used the following domain names for the following datanodes
>>>> when I installed the CentOS
>>>> datanode10 --> localhost.datanode10
>>>> datanode01 --> localhost.datanode01
>>>>
>>>> David Novogrodsky
>>>> [email protected]
>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>
>>>> On Mon, Dec 15, 2014 at 11:50 AM, Yusaku Sako <[email protected]> wrote:
>>>>>
>>>>> Did you change the FQDNs like I proposed, like namenode.localdomain,
>>>>> rather than localhost.namenode?
>>>>> Did you ensure that the 3 commands returned the results as shown?
>>>>> Can each host resolve all the other hosts by name?
>>>>>
>>>>> If you want to get a cluster up and running on VMs, the best bet is to
>>>>> use:
>>>>> https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
>>>>>
>>>>> This sets up all /etc/hosts and other settings in the way you want.
>>>>> Then you can see how these VMs are being set up and mimic on your VMs
>>>>> if you'd rather set them up from scratch.
>>>>>
>>>>> I hope this helps.
>>>>> Yusaku
>>>>>
>>>>> On Mon, Dec 15, 2014 at 8:18 AM, David Novogrodsky <[email protected]> wrote:
>>>>>>
>>>>>> Ok, I removed the multiple instances of localhost.namenode. It now
>>>>>> only appears on one line in the hosts file.
>>>>>>
>>>>>> The main ambari server still cannot see the data nodes nor the node
>>>>>> Ambari is on. Ambari is on the namenode. When I run the install, the
>>>>>> install program can not connect to any node in the network.
>>>>>>
>>>>>> Also I tried running /etc/init.d/network restart on one of the nodes;
>>>>>> datanode10 (a virtual machine). Now that node cannot connect to the
>>>>>> internet....I would like to send you the information but I am having
>>>>>> problems setting the document from the virtual machine.
>>>>>>
>>>>>> I do not have a DNS. These machines have hardwired IP addresses and
>>>>>> names in the host file. Did running /etc/init.d/network restart break the
>>>>>> connection?
>>>>>>
>>>>>> David Novogrodsky
>>>>>> [email protected]
>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>
>>>>>> On Sat, Dec 13, 2014 at 12:46 AM, Yusaku Sako <[email protected]> wrote:
>>>>>>>
>>>>>>> You can just make the changes in /etc/hosts. You might also
>>>>>>> change /etc/sysconfig/network and run /etc/init.d/network restart.
>>>>>>>
>>>>>>> Then make sure that running the 3 commands return expected results.
>>>>>>>
>>>>>>> Yusaku
>>>>>>>
>>>>>>> On Fri, Dec 12, 2014 at 9:06 PM, David Novogrodsky <[email protected]> wrote:
>>>>>>>>
>>>>>>>> When I installed the CentOS on the machines, I chose those names,
>>>>>>>> localhost.datanode01...and so on. You mean I have to reinstall CentOS on
>>>>>>>> the machines again?
>>>>>>>>
>>>>>>>> Can I just make the changes in the host files?
>>>>>>>>
>>>>>>>> Will I need to recreate the SSH keys?
>>>>>>>>
>>>>>>>> David Novogrodsky
>>>>>>>> [email protected]
>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>
>>>>>>>> On Fri, Dec 12, 2014 at 6:21 PM, Yusaku Sako <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I would set it up like this:
>>>>>>>>>
>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 <- do not list the hostname here.
>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>> xxx.xxx.200.144 datanode10.localdomain
>>>>>>>>> xxx.xxx.200.107 datanode01.localdomain
>>>>>>>>> xxx.xxx.200.143 namenode.localdomain namenode
>>>>>>>>>
>>>>>>>>> With this change:
>>>>>>>>> * hostname -f should display namenode.localdomain
>>>>>>>>> * hostname should display namenode
>>>>>>>>> * python -c 'import socket; print socket.getfqdn()' should
>>>>>>>>> display namenode.localdomain
>>>>>>>>>
>>>>>>>>> I hope this helps.
>>>>>>>>> Yusaku
>>>>>>>>>
>>>>>>>>> On Fri, Dec 12, 2014 at 3:52 PM, David Novogrodsky <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>>
>>>>>>>>>> I am having a problem with Ambari.
>>>>>>>>>> I am trying to use Ambari to install Hadoop to a three node
>>>>>>>>>> cluster. The name node is where the Ambari server is located.
>>>>>>>>>> I am getting this error:
>>>>>>>>>> ERROR 2014-12-12 17:39:56,963 main.py:137 - Ambari agent machine hostname (localhost.localdomain) does not match expected ambari server hostname (namenode). Aborting registration. Please check hostname, hostname -f and /etc/hosts file to confirm your hostname is setup correctly
>>>>>>>>>> ', None)
>>>>>>>>>>
>>>>>>>>>> Here are the contents of my hosts file:
>>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 localhost.namenode namenode
>>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>>> xxx.xxx.200.144 localhost.datanode10
>>>>>>>>>> xxx.xxx.200.107 localhost.datanode01
>>>>>>>>>> xxx.xxx.200.143 localhost.namenode namenode
>>>>>>>>>>
>>>>>>>>>> I am not sure what the problem is. Since there are only four
>>>>>>>>>> steps to run ambari there is not a lot of background to determine the cause
>>>>>>>>>> of this problem.
>>>>>>>>>>
>>>>>>>>>> David Novogrodsky
>>>>>>>>>> [email protected]
>>>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable law.
>>>>>>>>> If the reader of this message is not the intended recipient, you are hereby
>>>>>>>>> notified that any printing, copying, dissemination, distribution,
>>>>>>>>> disclosure or forwarding of this communication is strictly prohibited. If
>>>>>>>>> you have received this communication in error, please contact the sender
>>>>>>>>> immediately and delete it from your system. Thank You.
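
On the question of where the agent gets the server name: a minimal sketch for checking it, assuming the agent takes the name from the `hostname` key in the `[server]` section of /etc/ambari-agent/conf/ambari-agent.ini. `SAMPLE_INI` and `server_hostname` are hypothetical names, and the sample values are assumptions picked to match the agent log in this thread, not a copy of the real file.

```python
import configparser

# Sample text only. The section and key names follow the usual layout of
# ambari-agent.ini as assumed here; the values are chosen to match the
# agent log quoted in this thread.
SAMPLE_INI = """\
[server]
hostname=namenode.localdomain
url_port=8440
secured_url_port=8441
"""


def server_hostname(ini_text):
    """Return the Ambari server name found under [server] hostname."""
    # interpolation=None so a literal '%' in any value is not treated
    # as a substitution marker.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("server", "hostname")


if __name__ == "__main__":
    configured = server_hostname(SAMPLE_INI)
    print("agent is configured to contact:", configured)
    print("same as hosts file entry:", configured == "namenode.localdomain.com")
```

To check a real machine, read the file from disk and pass its text to `server_hostname`, then compare the result with the name listed in /etc/hosts on that machine.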
