This is for Ambari 2.1.1, so apologies if this has since been fixed.  We saw a 
failure today in one of our custom actions, caused by a temporary network hiccup:

Caught an exception while executing custom service command: <class 
'ambari_agent.FileCache.CachingException'>: Can not download file from url 
https://ambari.local:443/resources//custom_actions/.hash : <urlopen error timed 
out>; Can not download file from url 
https://ambari.local:443/resources//custom_actions/.hash : <urlopen error timed 
out>

Is there a way to tell the agent not to fail here, and instead keep retrying 
until it can download the file from the server?  If it takes too long, we'll 
handle timing out the build and cleaning up ourselves.

The 'tolerate_download_failures' setting doesn't trigger a retry; it just 
falls back to the local cache. Since the file isn't in the local cache yet, 
the command fails with a file missing exception if we enable it.
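
To make the request concrete, here is a rough sketch of the retry behavior we 
have in mind. It is not the agent's FileCache code: download_with_retry and 
its parameters are hypothetical names, and it is written against Python 3's 
urllib purely for illustration. It retries with a capped exponential backoff 
and never gives up on its own unless max_attempts is set.

```python
import time
import urllib.error
import urllib.request


def download_with_retry(url, fetch=None, delay=1.0, max_delay=60.0,
                        max_attempts=None, sleep=time.sleep):
    """Call fetch(url) until it succeeds, backing off between attempts.

    Hypothetical helper, not part of Ambari. With max_attempts=None it
    retries forever, leaving any overall timeout to the caller.
    """
    if fetch is None:
        # Default fetcher: a plain HTTP(S) GET with a per-request timeout.
        def fetch(u):
            return urllib.request.urlopen(u, timeout=30).read()

    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch(url)
        except (urllib.error.URLError, OSError):
            # Network errors and socket timeouts land here.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, up to max_delay seconds.
            delay = min(delay * 2, max_delay)
```

With something like this around the .hash download, a brief outage would only 
delay the command, and our own build timeout would cover the case where the 
server never comes back.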

Greg
