I'm diagnosing an issue, and I think I found a bug with the ambari-agent code:
https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/Controller.py#L390

If 'cluster_name' has spaces in it, this request fails because the value is not URL-encoded. This causes all of the agents to go to the HEARTBEAT_LOST state and everything fails, but the error it spits out in the agent log is hugely misleading:

    ERROR 2015-04-08 18:30:20,312 Controller.py:140 - Unable to connect to: https://ambari.local:8441/agent/v1/register/ambari.local
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 128, in registerWithServer
        self.addToStatusQueue(ret['statusCommands'])
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 172, in addToStatusQueue
        self.updateComponents(commands[0]['clusterName'])
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 360, in updateComponents
        response = self.sendRequest(self.componentsUrl + cluster_name, None)
      File "/usr/lib/python2.6/site-packages/ambari_agent/Controller.py", line 353, in sendRequest
        + '; Response: ' + str(response))
    IOError: Response parsing failed! Request data: None; Response:

It connected fine, and parsed the registration response fine, but then died while processing that response. The "Unable to connect" message is wrong: the failure is in the follow-up components request, not in the connection to the server. We probably shouldn't be trapping every Exception here:

https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/Controller.py#L170

I assume that this is a bug and that we want to allow cluster names to be whatever the customer would like. I'll open a JIRA unless someone tells me this is expected behavior.

Greg
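P.S. For illustration, here is a minimal sketch of the kind of fix I have in mind: percent-encode the cluster name before appending it to the components URL. This is not the actual agent code. The helper name build_components_url and the base URL below are made up for the example, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2 (the agent runs on 2.6)


def build_components_url(components_url, cluster_name):
    """Append a percent-encoded cluster name to the base URL.

    Hypothetical helper, not part of ambari-agent.
    """
    # safe='' also encodes '/', so the name stays a single path segment.
    return components_url + quote(cluster_name, safe='')


# Hypothetical base URL, for illustration only.
base = 'https://ambari.local:8441/agent/v1/components/'
print(build_components_url(base, 'My Test Cluster'))
# prints https://ambari.local:8441/agent/v1/components/My%20Test%20Cluster
```

In updateComponents, the equivalent change would be to encode cluster_name before it is concatenated onto self.componentsUrl. Whether the server side decodes the path segment correctly is something I have not verified.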
