We use a statsd metric reporter to send metrics into a Graphite cluster, and
have built out extensive graphs in Grafana. On top of that we use Seyren for
alerting. Right now we have alerts on the following:

- Spout lag greater than our defined SLAs
- Null reported spout lag, i.e. the topology stops reporting metrics (or
just isn't deployed) for a period of time
- Failed tuple percentage exceeding a threshold
- Throughput (number of executes): our topologies should always be doing
something; they're never completely idle. If throughput drops below a
threshold, we're alerted.
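The four alerts above boil down to simple threshold checks on time series. A
minimal Python sketch, assuming each metric is fetched as a Graphite-style list
of [value, timestamp] pairs with None for intervals where nothing was reported.
The thresholds, function names, and inputs are hypothetical illustrations, not
Seyren's implementation or our actual configuration:

```python
from typing import List, Optional, Tuple

# A Graphite-style series: (value, timestamp) pairs, value None when no data
# was reported for that interval.
Series = List[Tuple[Optional[float], int]]

# Hypothetical thresholds; substitute your own SLAs.
MAX_SPOUT_LAG = 10_000   # messages the spout may fall behind
MAX_NULL_POINTS = 5      # consecutive empty intervals before alerting
MAX_FAILED_PCT = 1.0     # percent of completed tuples that failed
MIN_EXECUTES = 100       # executes per interval; topologies are never idle


def latest(series: Series) -> Optional[float]:
    """Return the most recent non-null value, or None if there is none."""
    for value, _ts in reversed(series):
        if value is not None:
            return value
    return None


def last_value(series: Series) -> Optional[float]:
    """Return the value of the final datapoint (may be None)."""
    return series[-1][0] if series else None


def trailing_nulls(series: Series) -> int:
    """Count consecutive null datapoints at the end of the series."""
    count = 0
    for value, _ts in reversed(series):
        if value is not None:
            break
        count += 1
    return count


def check_topology(lag: Series, acked: Series, failed: Series,
                   executes: Series) -> List[str]:
    """Evaluate the four alert conditions and return those that fire."""
    alerts = []

    # Spout lag not reported at all, or reported but over the SLA.
    if trailing_nulls(lag) >= MAX_NULL_POINTS:
        alerts.append("spout lag not reported")
    else:
        current_lag = latest(lag)
        if current_lag is not None and current_lag > MAX_SPOUT_LAG:
            alerts.append("spout lag over SLA")

    # Failed tuples as a percentage of the most recently reported totals.
    n_acked = latest(acked) or 0.0
    n_failed = latest(failed) or 0.0
    total = n_acked + n_failed
    if total > 0 and 100.0 * n_failed / total > MAX_FAILED_PCT:
        alerts.append("failed tuple percentage over threshold")

    # Throughput: an empty most recent interval counts as idle.
    current_executes = last_value(executes)
    if current_executes is None or current_executes < MIN_EXECUTES:
        alerts.append("throughput below threshold")

    return alerts
```

For example, a series where lag is 15,000 and 20 of 1,000 tuples failed would
trip both the lag and the failed-percentage alerts, while an executes value of
5,000 would not trip the throughput alert.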

Hope this helps! Curious what others monitor and alert on.

Stephen

On Thu, Oct 27, 2016 at 2:49 AM, Chen Junfeng <k-2f...@hotmail.com> wrote:

> What specifications will you use to measure it?
>
> Regards
>
> Junfeng Chen
>
