[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics
    [ https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824395#comment-17824395 ]

Robert Munteanu commented on SLING-12262:
-----------------------------------------

Right, I was just expecting the window of opportunity in which these metrics are available to be quite short, as the instance would be taken down. Not opposing the implementation, just asking :-)

> Repoinit: report failures via metrics
> -------------------------------------
>
>                 Key: SLING-12262
>                 URL: https://issues.apache.org/jira/browse/SLING-12262
>             Project: Sling
>          Issue Type: Task
>          Components: Repoinit
>    Affects Versions: Repoinit JCR 1.1.46
>            Reporter: Joerg Hoh
>            Priority: Major
>
> When a repoinit statement fails (and for that reason the SlingRepository
> service cannot be started), repoinit should expose this as a metric.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics
    [ https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824365#comment-17824365 ]

Joerg Hoh commented on SLING-12262:
-----------------------------------

We scrape metrics via Prometheus and have an Alertmanager instance that creates alerts from them. If an instance is not starting up, it is much easier to find out whether repoinit is the culprit by querying a metric than by searching the logs for repoinit's characteristic exception.

That allows us to refine the "instance-not-starting-up" alert into an "instance-not-starting-up-because-of-repoinit-issues" alert. The refined alert is much more meaningful and can be handled differently from the generic one, which always requires the general triage process.

[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics
    [ https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823664#comment-17823664 ]

Robert Munteanu commented on SLING-12262:
-----------------------------------------

With the repository being unavailable, is there any point in exposing or scraping metrics about repoinit failures? The instance is presumably unhealthy and should not be around for long.