This alert fires when there are alerts which haven’t reported in within 2x 
their interval value. The most common reason that this alert would misfire is 
that the scheduler on the agents isn’t able to run the scheduled alert jobs.

We’ve been seeing a problem where the scheduler may occasionally miss the 
window and the alert won’t run. This is being fixed for Ambari 2.1.3. In the 
meantime, you can changed these lines of 
code<https://github.com/apache/ambari/blob/branch-2.1.2/ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py#L46-L50>
 on the agents where the problem is occurring to:


APS_CONFIG = {
      'apscheduler.threadpool.core_threads': 3,
      'apscheduler.coalesce': True,
      'apscheduler.standalone': False,
      'apscheduler.misfire_grace_time': 5
    }

After restarting the agents, the misfire grace time should be higher and allow 
for the alert job to run, even if it misses its window.

On Oct 26, 2015, at 6:31 AM, Vijaya Narayana Reddy Bhoomi Reddy 
<[email protected]<mailto:[email protected]>> 
wrote:

Hi,

In my cluster, I often see “Ambari Server Alerts” with the message “There are 7 
stale alerts from 1 host(s). Resource ManagerRPC Latency, Zookeeper etc”

Can anyone please throw light on the root cause of this alert? I am not able to 
trace the correct cause for this. It occurs occasionally and then disappears 
after a while. However, it comes with a CRITICAL flag. Hence wanted to 
understand the root cause behind it and as well as the severity.

Thanks
Vijay


--
The contents of this e-mail are confidential and for the exclusive use of
the intended recipient. If you receive this e-mail in error please delete
it from your system immediately and notify us either by e-mail or
telephone. You should not copy, forward or otherwise disclose the content
of the e-mail. The views expressed in this communication may not
necessarily be the view held by WHISHWORKS.


Reply via email to