A few important things to monitor, off the top of my head:

Compaction queue size, compaction size (size of all files in the compaction)
GC pause time, number of GCs (highly correlated with compactions)
IPC read/write call size
Slow query logs
Number of failed regions from canary tests
Replication queue size

It's better to monitor these metrics at the level of each region server to detect issues. For example, the overall cluster GC may be around average while all of the GCs are happening on only one region server. That is very difficult to find unless you track these metrics for each region server.

-sudhir

On Fri, 6 Apr 2018 at 11:27 PM, Hubbert Smith <[email protected]> wrote:

> OK, guilty as charged. my imagination got away from me
> you just wanted to monitor your hbase, not your hardware ... ok then
>
> On Fri, Apr 6, 2018 at 4:13 AM, Mark Bonetti <[email protected]>
> wrote:
>
> > Hi,
> > I'm building a monitoring system for HBase and want to set up default
> > alerts (threshold or anomaly) on 2-3 key metrics everyone who uses HBase
> > typically wants to alert on, but I don't yet have production-grade
> > experience with HBase.
> >
> > Importantly, alert rules have to be generally useful, so can't be on
> > metrics whose values vary wildly based on the size of deployment.
> >
> > In other words, which metrics would be most significant indicators that
> > something went wrong with your HBase?
> >
> > I thought the best place to find experienced HBase users, who would find
> > answering this question trivial, would be here.
> >
> > Thanks very much,
> > Mark
> >
>
> --
> [email protected] | 385 321 0757 | LinkedIN
> <http://tinyurl.com/7v5eu2p>
> Linkedin Learning: Storage Foundations Cert Prep: SNCP Foundations S10-110
> <https://www.linkedin.com/learning/cert-prep-sncp-foundations-s10-110/storage-and-business-and-career-path>
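
P.S. To illustrate the per-region-server point, here is a rough Python sketch of an alert rule that compares each server against the cluster median instead of looking at a cluster-wide average. It is only a sketch under assumptions: the function name, the `factor` and `floor` defaults, and the sample numbers are all made up for illustration, and how you collect the per-server values (JMX, your metrics pipeline, etc.) is left out.

```python
from statistics import median


def find_outlier_servers(values, factor=3.0, floor=1.0):
    """Return region servers whose metric exceeds `factor` times the median.

    values: mapping of region server name -> metric value for one interval
            (for example, total GC pause time in milliseconds).
    factor: illustrative multiplier, not a recommended production threshold.
    floor:  minimum baseline so an all-zero cluster does not flag noise.
    """
    if not values:
        return []
    # The median barely moves when a single server misbehaves, unlike the mean.
    baseline = max(median(values.values()), floor)
    return sorted(name for name, v in values.items() if v > factor * baseline)


# Hypothetical GC pause totals (ms) per region server for one interval.
gc_pause_ms = {"rs1": 120, "rs2": 110, "rs3": 130, "rs4": 2400}

cluster_avg = sum(gc_pause_ms.values()) / len(gc_pause_ms)
print(f"cluster average: {cluster_avg:.0f} ms")
print("outliers:", find_outlier_servers(gc_pause_ms))
```

Because the rule is relative to the other servers, it does not depend much on the size of the deployment. The same check could be run on compaction queue size or replication queue size per server.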
