I am following the startup procedure listed here: https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.7.0+from+Public+Repositories. I am trying to install HDP 2.1. Please note that these install instructions mention nothing about iptables or selinux. I am using CentOS 6 machines.
I tried changing the hosts file on all the machines to this:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
xxx.xxx.200.144 datanode10.localdomain.com
xxx.xxx.200.107 datanode01.localdomain.com
xxx.xxx.200.143 namenode.localdomain.com
----
(just trying to hide my IP addresses)

The error is the same.

David Novogrodsky
david.novogrod...@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

On Wed, Dec 17, 2014 at 9:05 AM, David Novogrodsky <david.novogrod...@gmail.com> wrote:
>
> I hope this clears up the confusion:
>
> First, this is what the hosts file looks like:
> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
> 192.168.200.144 datanode10.localdomain.com
> 192.168.200.143 namenode.localdomain.com namenode
> 192.168.200.107 datanode01.localdomain.com
>
> I have restarted all the nodes in the cluster, several times.
> I can reach all nodes from all nodes using ping and their fully qualified
> domain names.
> I can reach the data nodes from the name node using password-less ssh.
> I have changed the names of the machines to match their names in the hosts
> files on each machine.
> I have checked the /etc/ambari-agent/conf/ambari-agent.ini file.
>
> David Novogrodsky
> david.novogrod...@gmail.com
> http://www.linkedin.com/in/davidnovogrodsky
>
> On Tue, Dec 16, 2014 at 10:20 PM, Devopam Mittra <devo...@gmail.com> wrote:
>>
>> Forgive me if I sound rude, but please re-read the installation
>> instructions properly; it should help in your case.
>>
>> 1. Have a sound naming convention for all your boxes, e.g.
>> namenode01.localdomain, datanode01.localdomain, datanode0N.localdomain.
>> This will help you a lot in future expansion and maintenance of your
>> cluster.
>> 2. Do not, by any means, tamper with the /etc/hosts entries for 127.0.0.1
>> and ::1. Leave them with the localhost names only; you don't want to
>> change those in the first place. That keeps normal operations on your box
>> working; otherwise every internal lookup by OS functions will only
>> create issues.
>> 3. If you have DHCP plus a very good DNS server in place, then okay;
>> otherwise assign static IPs to your machines and create one entry for
>> each box with the FQDN and static IP address, replicated on ALL the boxes.
>> 4. Set up keyless ssh login for root or any other uniform local user that
>> you want to use to manage Ambari + Hadoop.
>> 5. Confirm that the namenode and the Ambari server machines (in case they
>> are different for you) can talk to ALL the machines using a keyless login
>> for the universal user you created in the steps above.
>>
>> Hope the above will help you sort out the issue in a single go.
>>
>> regards
>> Dev
>>
>> On Tue, Dec 16, 2014 at 11:45 PM, David Novogrodsky <david.novogrod...@gmail.com> wrote:
>>>
>>> There is nothing simply done in Ambari. :)
>>>
>>> After changing the name of this computer and restarting the namenode,
>>> Ambari does not recognize any node. The main error I am wondering about
>>> is this:
>>> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at https://namenode.localdomain:8440 (98.124.198.1)
>>> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>>> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>>> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>>> ', None)
>>> Why is Ambari using namenode.localdomain to connect?
>>>
>>> I am running Ambari on this node; I am running Ambari on the namenode of
>>> this cluster. The host file for this computer is this:
>>> GNU nano 2.0.9 File: /etc/hosts
>>>
>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>> 192.168.200.144 localhost.datanode10
>>> 192.168.200.107 localhost.datanode01
>>> 192.168.200.143 namenode.localdomain.com namenode
>>>
>>> The Ambari wizard said I needed to use fully qualified domain names, so
>>>
>>> What follows is a detailed log of the registration log. I get this
>>> error in the registration log for namenode.localdomain.com:
>>> --
>>> ==========================
>>> Creating target directory...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:18
>>>
>>> Connection to namenode.localdomain.com closed.
>>> SSH command execution finished
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:18
>>>
>>> ==========================
>>> Copying common functions script...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:18
>>>
>>> scp /usr/lib/python2.6/site-packages/ambari_commons
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:18
>>>
>>> ==========================
>>> Copying OS type check script...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:18
>>>
>>> scp /usr/lib/python2.6/site-packages/ambari_server/os_check_type.py
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:18
>>>
>>> ==========================
>>> Running OS type check...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:18
>>> Cluster primary/cluster OS type is redhat6 and local/current OS type is redhat6
>>>
>>> Connection to namenode.localdomain.com closed.
>>> SSH command execution finished
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:19
>>>
>>> ==========================
>>> Checking 'sudo' package on remote host...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:19
>>> sudo-1.8.6p3-15.el6.x86_64
>>>
>>> Connection to namenode.localdomain.com closed.
>>> SSH command execution finished
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:20
>>>
>>> ==========================
>>> Copying repo file to 'tmp' folder...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:20
>>>
>>> scp /etc/yum.repos.d/ambari.repo
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:20
>>>
>>> ==========================
>>> Moving file to repo dir...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:20
>>>
>>> Connection to namenode.localdomain.com closed.
>>> SSH command execution finished
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:21
>>>
>>> ==========================
>>> Copying setup script file...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:21
>>>
>>> scp /usr/lib/python2.6/site-packages/ambari_server/setupAgent.py
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:21
>>>
>>> ==========================
>>> Running setup agent script...
>>> ==========================
>>>
>>> Command start time 2014-12-16 12:02:21
>>> Verifying Python version compatibility...
>>> Using python /usr/bin/python2.6
>>> Found ambari-agent PID: 5036
>>> Stopping ambari-agent
>>> Removing PID file at /var/run/ambari-agent/ambari-agent.pid
>>> ambari-agent successfully stopped
>>> Restarting ambari-agent
>>> Verifying Python version compatibility...
>>> Using python /usr/bin/python2.6
>>> ambari-agent is not running. No PID found at /var/run/ambari-agent/ambari-agent.pid
>>> Verifying Python version compatibility...
>>> Using python /usr/bin/python2.6
>>> Checking for previously running Ambari Agent...
>>> Starting ambari-agent
>>> Verifying ambari-agent process status...
>>> Ambari Agent successfully started
>>> Agent PID at: /var/run/ambari-agent/ambari-agent.pid
>>> Agent out at: /var/log/ambari-agent/ambari-agent.out
>>> Agent log at: /var/log/ambari-agent/ambari-agent.log
>>> ('WARNING 2014-12-16 12:01:59,642 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>>> INFO 2014-12-16 12:02:09,653 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>>> WARNING 2014-12-16 12:02:09,701 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>>> WARNING 2014-12-16 12:02:09,701 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>>> INFO 2014-12-16 12:02:19,711 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>>> WARNING 2014-12-16 12:02:19,770 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>>> WARNING 2014-12-16 12:02:19,770 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>>> INFO 2014-12-16 12:02:22,680 main.py:83 - loglevel=logging.INFO
>>> INFO 2014-12-16 12:02:22,681 main.py:55 - signal received, exiting.
>>> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:39 - Removing pid file
>>> INFO 2014-12-16 12:02:22,681 ProcessHelper.py:46 - Removing temp files
>>> INFO 2014-12-16 12:02:29,532 main.py:83 - loglevel=logging.INFO
>>> INFO 2014-12-16 12:02:29,533 DataCleaner.py:36 - Data cleanup thread started
>>> INFO 2014-12-16 12:02:29,534 DataCleaner.py:117 - Data cleanup started
>>> INFO 2014-12-16 12:02:29,542 DataCleaner.py:119 - Data cleanup finished
>>> INFO 2014-12-16 12:02:29,667 PingPortListener.py:51 - Ping port listener started on port: 8670
>>> INFO 2014-12-16 12:02:29,669 main.py:233 - Connecting to Ambari server at https://namenode.localdomain:8440 (98.124.198.1)
>>> INFO 2014-12-16 12:02:29,670 NetUtil.py:48 - Connecting to https://namenode.localdomain:8440/ca
>>> WARNING 2014-12-16 12:02:29,718 NetUtil.py:71 - Failed to connect to https://namenode.localdomain:8440/ca due to [Errno 111] Connection refused
>>> WARNING 2014-12-16 12:02:29,719 NetUtil.py:92 - Server at https://namenode.localdomain:8440 is not reachable, sleeping for 10 seconds...
>>> ', None)
>>>
>>> Connection to namenode.localdomain.com closed.
>>> SSH command execution finished
>>> host=namenode.localdomain.com, exitcode=0
>>> Command end time 2014-12-16 12:02:32
>>>
>>> Registering with the server...
>>> Registration with the server failed.
>>> ----
>>>
>>> David Novogrodsky
>>> david.novogrod...@gmail.com
>>> http://www.linkedin.com/in/davidnovogrodsky
>>>
>>> On Mon, Dec 15, 2014 at 10:02 PM, Devopam Mittra <devo...@gmail.com> wrote:
>>>>
>>>> May I suggest you simply do a ssh -l <keylessusername> using the
>>>> previous and the new FQDNs that you have defined to verify which one is
>>>> in effect, and accessible?
>>>> Also, since you changed the FQDN, you may wish to simply reboot the
>>>> cluster once, just to make sure that new ones are in-place.
>>>> It might happen that after the reboot you will need to redo the ssh
>>>> keyless pairing once again (most probably).
>>>>
>>>> regards
>>>> Devopam
>>>>
>>>> On Tue, Dec 16, 2014 at 4:32 AM, David Novogrodsky <david.novogrod...@gmail.com> wrote:
>>>>>
>>>>> The changes I am making in the hosts file are not being picked up by
>>>>> the installation scripts of Ambari. I was told I could make changes to
>>>>> the hosts file and that Ambari would see them. I have checked the
>>>>> /etc/ambari-agent/conf/ambari-agent.ini file, and the changes I made to
>>>>> the hosts file are not showing up in that file. Where is Ambari getting
>>>>> the names for the other nodes in the cluster?
>>>>>
>>>>> Here are the changes I made to the hosts file on the host for the name
>>>>> node:
>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>> 192.168.200.144 datanode10.localdomain
>>>>> 192.168.200.107 datanode01.localdomain
>>>>> 192.168.200.143 namenode.localdomain namenode
>>>>>
>>>>> Since I made these changes Ambari cannot discover any of the nodes in
>>>>> the network. None of them.
>>>>>
>>>>> I have not made these changes to the other nodes because I do not want
>>>>> to change them until I can see Ambari discover the host it is sitting
>>>>> upon.
>>>>>
>>>>> Regarding the commands you mentioned, here are the results:
>>>>> [root@localhost conf]# hostname -f
>>>>> hostname: Unknown host
>>>>> [root@localhost conf]# hostname
>>>>> localhost.namenode
>>>>> [root@localhost conf]# python -c 'import socket; print socket.getfqdn()'
>>>>> localhost.namenode
>>>>>
>>>>> localhost.namenode was the name I used for this host during the
>>>>> installation of CentOS. I thought you said I could make changes to the
>>>>> hosts file and the installation scripts would recognize them?
>>>>>
>>>>> From the Confirm Hosts page I am getting the following errors.
>>>>>
>>>>> For connecting to the name node:
>>>>>
>>>>> STDOUT: {'exitstatus': 1, 'log': "Host registration aborted. Ambari Agent host cannot reach Ambari Server 'localhost.namenode:8080'. Please check the network connectivity between the Ambari Agent host and the Ambari Server"}
>>>>>
>>>>> For connecting to datanode10:
>>>>>
>>>>> INFO 2014-12-15 16:42:33,348 DataCleaner.py:36 - Data cleanup thread started
>>>>> ERROR 2014-12-15 16:42:33,349 main.py:137 - Ambari agent machine hostname (localhost.datanode10) does not match expected ambari server hostname (datanode10.localdomain). Aborting registration. Please check hostname, hostname -f and /etc/hosts file to confirm your hostname is setup correctly
>>>>> ', None)
>>>>>
>>>>> I am getting a similar error when trying to get to datanode01.
>>>>> Please note I used the following names for the datanodes when I
>>>>> installed CentOS:
>>>>> datanode10 --> localhost.datanode10
>>>>> datanode01 --> localhost.datanode01
>>>>>
>>>>> David Novogrodsky
>>>>> david.novogrod...@gmail.com
>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>
>>>>> On Mon, Dec 15, 2014 at 11:50 AM, Yusaku Sako <yus...@hortonworks.com> wrote:
>>>>>>
>>>>>> Did you change the FQDNs like I proposed, like namenode.localdomain,
>>>>>> rather than localhost.namenode?
>>>>>> Did you ensure that the 3 commands returned the results as shown?
>>>>>> Can each host resolve all the other hosts by name?
>>>>>>
>>>>>> If you want to get a cluster up and running on VMs, the best bet is
>>>>>> to use:
>>>>>> https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
>>>>>>
>>>>>> This sets up all /etc/hosts and other settings in the way you want.
>>>>>> Then you can see how these VMs are being set up and mimic that on
>>>>>> your VMs if you'd rather set them up from scratch.
>>>>>>
>>>>>> I hope this helps.
>>>>>> Yusaku
>>>>>>
>>>>>> On Mon, Dec 15, 2014 at 8:18 AM, David Novogrodsky <david.novogrod...@gmail.com> wrote:
>>>>>>>
>>>>>>> Ok, I removed the multiple instances of localhost.namenode. It now
>>>>>>> only appears on one line in the hosts file.
>>>>>>>
>>>>>>> The main Ambari server still cannot see the data nodes nor the node
>>>>>>> Ambari is on. Ambari is on the namenode. When I run the install, the
>>>>>>> install program cannot connect to any node in the network.
>>>>>>>
>>>>>>> Also, I tried running /etc/init.d/network restart on one of the
>>>>>>> nodes, datanode10 (a virtual machine). Now that node cannot connect
>>>>>>> to the internet. I would like to send you the information but I am
>>>>>>> having problems setting the document from the virtual machine.
>>>>>>>
>>>>>>> I do not have a DNS. These machines have hardwired IP addresses and
>>>>>>> names in the hosts file. Did running /etc/init.d/network restart
>>>>>>> break the connection?
>>>>>>>
>>>>>>> David Novogrodsky
>>>>>>> david.novogrod...@gmail.com
>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>
>>>>>>> On Sat, Dec 13, 2014 at 12:46 AM, Yusaku Sako <yus...@hortonworks.com> wrote:
>>>>>>>>
>>>>>>>> You can just make the changes in /etc/hosts. You might also
>>>>>>>> change /etc/sysconfig/network and run /etc/init.d/network restart.
>>>>>>>>
>>>>>>>> Then make sure that running the 3 commands returns the expected
>>>>>>>> results.
>>>>>>>>
>>>>>>>> Yusaku
>>>>>>>>
>>>>>>>> On Fri, Dec 12, 2014 at 9:06 PM, David Novogrodsky <david.novogrod...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> When I installed CentOS on the machines, I chose those names,
>>>>>>>>> localhost.datanode01 and so on. You mean I have to reinstall
>>>>>>>>> CentOS on the machines again?
>>>>>>>>>
>>>>>>>>> Can I just make the changes in the hosts files?
>>>>>>>>>
>>>>>>>>> Will I need to recreate the SSH keys?
>>>>>>>>>
>>>>>>>>> David Novogrodsky
>>>>>>>>> david.novogrod...@gmail.com
>>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>>
>>>>>>>>> On Fri, Dec 12, 2014 at 6:21 PM, Yusaku Sako <yus...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>>> I would set it up like this:
>>>>>>>>>>
>>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4   <- do not list the hostname here.
>>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>>> xxx.xxx.200.144 datanode10.localdomain
>>>>>>>>>> xxx.xxx.200.107 datanode01.localdomain
>>>>>>>>>> xxx.xxx.200.143 namenode.localdomain namenode
>>>>>>>>>>
>>>>>>>>>> With this change:
>>>>>>>>>> * hostname -f should display namenode.localdomain
>>>>>>>>>> * hostname should display namenode
>>>>>>>>>> * python -c 'import socket; print socket.getfqdn()' should display namenode.localdomain
>>>>>>>>>>
>>>>>>>>>> I hope this helps.
>>>>>>>>>> Yusaku
>>>>>>>>>>
>>>>>>>>>> On Fri, Dec 12, 2014 at 3:52 PM, David Novogrodsky <david.novogrod...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> All,
>>>>>>>>>>>
>>>>>>>>>>> I am having a problem with Ambari.
>>>>>>>>>>> I am trying to use Ambari to install Hadoop to a three-node
>>>>>>>>>>> cluster. The name node is where the Ambari server is located. I
>>>>>>>>>>> am getting this error:
>>>>>>>>>>> ERROR 2014-12-12 17:39:56,963 main.py:137 - Ambari agent machine hostname (localhost.localdomain) does not match expected ambari server hostname (namenode). Aborting registration.
>>>>>>>>>>> Please check hostname, hostname -f and /etc/hosts file to confirm your hostname is setup correctly
>>>>>>>>>>> ', None)
>>>>>>>>>>>
>>>>>>>>>>> Here is the contents of my hosts file:
>>>>>>>>>>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 localhost.namenode namenode
>>>>>>>>>>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>>>>>>>>>>> xxx.xxx.200.144 localhost.datanode10
>>>>>>>>>>> xxx.xxx.200.107 localhost.datanode01
>>>>>>>>>>> xxx.xxx.200.143 localhost.namenode namenode
>>>>>>>>>>>
>>>>>>>>>>> I am not sure what the problem is. Since there are only four
>>>>>>>>>>> steps to run ambari there is not a lot of background to determine
>>>>>>>>>>> the cause of this problem.
>>>>>>>>>>>
>>>>>>>>>>> David Novogrodsky
>>>>>>>>>>> david.novogrod...@gmail.com
>>>>>>>>>>> http://www.linkedin.com/in/davidnovogrodsky
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>>> confidential, privileged and exempt from disclosure under applicable
>>>>>>>>>> law. If the reader of this message is not the intended recipient, you
>>>>>>>>>> are hereby notified that any printing, copying, dissemination,
>>>>>>>>>> distribution, disclosure or forwarding of this communication is
>>>>>>>>>> strictly prohibited. If you have received this communication in error,
>>>>>>>>>> please contact the sender immediately and delete it from your system.
>>>>>>>>>> Thank You.
>>>>>>>>>
>>>>
>>>> --
>>>> Devopam Mittra
>>>> Life and Relations are not binary
>>>>
>>>
>>
>> --
>> Devopam Mittra
>> Life and Relations are not binary
>>
>
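
For anyone who lands on this thread with the same registration failure: the advice above about keeping the machine's name off the 127.0.0.1 and ::1 lines, and giving each box one entry with its FQDN, can be checked mechanically. Below is a rough sketch of such a check, not anything that ships with Ambari. The function names parse_hosts and find_problems are made up for illustration, and it is written for Python 3 rather than the Python 2.6 found on CentOS 6. It relies on the usual hosts-file convention that the first name after the address is the canonical name and any further names are aliases.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into a list of (address, [names])."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:  # need an address plus at least one name
            entries.append((fields[0], fields[1:]))
    return entries


def find_problems(entries, expected_fqdn):
    """Return a list of readable problems for expected_fqdn (hypothetical helper)."""
    problems = []
    matches = [(addr, names) for addr, names in entries if expected_fqdn in names]
    if not matches:
        problems.append("%s has no entry" % expected_fqdn)
    for addr, names in matches:
        if addr.startswith("127.") or addr == "::1":
            # The thread's advice: never list the machine's own name on loopback.
            problems.append(
                "%s is listed on loopback address %s" % (expected_fqdn, addr)
            )
        elif names[0] != expected_fqdn:
            # The first name on the line is the canonical one.
            problems.append(
                "%s is an alias of %s, not the first name on its line"
                % (expected_fqdn, names[0])
            )
    return problems


# The hosts file from the first message in this thread, as an in-memory sample.
SAMPLE = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 localhost.namenode namenode
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
xxx.xxx.200.144 localhost.datanode10
xxx.xxx.200.107 localhost.datanode01
xxx.xxx.200.143 localhost.namenode namenode
"""

for name in ("localhost.namenode", "namenode.localdomain"):
    for problem in find_problems(parse_hosts(SAMPLE), name):
        print(problem)
```

To try it on a real machine, one could pass the contents of /etc/hosts to parse_hosts and use the value of socket.getfqdn() (or the name typed into the Ambari wizard) as expected_fqdn. It only inspects the hosts file, so it says nothing about DNS, firewalls, or whether the server port is actually open.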