Re: [ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-12 Thread Stefan Kooman
Quoting Miroslav Kalina (miroslav.kal...@livesport.eu):
> Monitor down is also easy as pie, because it's just "num_mon -
> mon_quorum". But there is also metric mon_outside_quorum which I have
> always zero and don't really know how it works.

See this issue if you want to know where it is used fo

Re: [ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-12 Thread Miroslav Kalina
I just briefly peeked into the source of the module, and I suppose it's because the main design idea is just to forward existing metrics from ceph core and not to calculate anything. To me it seems most users probably use prometheus, which doesn't have this kind of issue. Monitor down is also easy as pie, beca

Re: [ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-11 Thread Mario Giammarco
Miroslav explained, better than I could, why it "is not so simple" to use math. And osd down was the easiest case. How can I calculate:
- monitor down
- osd near full ?
I do not understand why the ceph plugin cannot send to influx all the metrics it has, especially the ones most useful for creating alarms. On Wed 1

Re: [ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-11 Thread Miroslav Kalina
As I mentioned here yesterday, there is an issue with the current scheme of metrics. With the current scheme you cannot do simple math like
> SELECT num_osd - num_osd_up FROM "ceph_cluster_stats"
Instead you will need a query like
> SELECT (SELECT last("value") FROM "ceph_cluster_stats" WHERE "type_insta
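
The same calculation can also be done client-side, after fetching the raw points. The following is only a minimal sketch, assuming each point arrives as a row with a "type_instance" tag and a "value" field, as the query above suggests. The sample rows, the function names, and the row layout are hypothetical; the metric names (num_osd, num_osd_up, num_mon, mon_quorum) are taken as written in this thread and may differ from what a given plugin version actually emits.

```python
from typing import Dict, Iterable, Mapping


def latest_by_instance(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Pivot one-metric-per-row points into {type_instance: latest value}.

    Assumes the rows are ordered oldest to newest, so a later row for the
    same type_instance overwrites the earlier one.
    """
    latest: Dict[str, float] = {}
    for row in rows:
        latest[str(row["type_instance"])] = float(row["value"])
    return latest


def count_down(latest: Mapping[str, float], total_key: str, up_key: str) -> float:
    """Return total minus up, e.g. num_osd - num_osd_up."""
    return latest[total_key] - latest[up_key]


# Hypothetical sample data, not real cluster output.
rows = [
    {"type_instance": "num_osd", "value": 12},
    {"type_instance": "num_osd_up", "value": 12},
    {"type_instance": "num_osd_up", "value": 11},  # newer point: one OSD went down
    {"type_instance": "num_mon", "value": 3},
    {"type_instance": "mon_quorum", "value": 3},
]

latest = latest_by_instance(rows)
osd_down = count_down(latest, "num_osd", "num_osd_up")
mon_down = count_down(latest, "num_mon", "mon_quorum")
print(osd_down, mon_down)
```

An alert would then fire whenever either difference is greater than zero. Doing the pivot outside the database avoids the nested subqueries, at the cost of running an extra script next to influx.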

Re: [ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-10 Thread Konstantin Shalygin
> But it is very difficult/complicated to make simple queries because, for example I have osd up and osd total but not osd down metric.

To determine how many osds are down you don't need a special metric, because you already have the osd_up and osd_in metrics. Just use math.

k

[ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-10 Thread Mario Giammarco
Hi, I enabled the telegraf and influx plugins for my ceph cluster. I would like to use influx/chronograf to detect anomalies:
- osd down
- monitor down
- osd near full
But it is very difficult/complicated to make simple queries because, for example, I have osd up and osd total metrics but no osd down metric.