[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics

2024-03-07 Thread Robert Munteanu (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824395#comment-17824395
 ] 

Robert Munteanu commented on SLING-12262:
-

Right, I was just expecting the window of opportunity in which these metrics 
are available to be quite short, as the instance would be taken down.

Not opposing the implementation, just asking :-)

> Repoinit: report failures via metrics
> -
>
> Key: SLING-12262
> URL: https://issues.apache.org/jira/browse/SLING-12262
> Project: Sling
>  Issue Type: Task
>  Components: Repoinit
>Affects Versions: Repoinit JCR 1.1.46
>Reporter: Joerg Hoh
>Priority: Major
>
> When a repoinit statement fails (and for that reason the SlingRepository 
> service cannot be started), repoinit should expose this as a metric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics

2024-03-07 Thread Joerg Hoh (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824365#comment-17824365
 ] 

Joerg Hoh commented on SLING-12262:
---

We scrape metrics via Prometheus and have an Alertmanager instance to create 
alerts from them. If an instance is not starting up, it's much easier to find 
out whether repoinit is the culprit by querying a metric than by searching the 
logs for the characteristic repoinit exception. That allows us to refine 
the "instance-not-starting-up" alert and convert it into an 
"instance-not-starting-up-because-of-repoinit-issues" alert, which is much more 
meaningful and can be handled differently from the generic alert, which 
always requires the general triage process.
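
A minimal sketch of the idea, using only the JDK: a counter is incremented when 
a repoinit statement fails, and the failure is rethrown so that startup still 
aborts. The class name, the metric name sling_repoinit_failures_total and the 
use of Runnable to stand in for a repoinit statement are all assumptions made 
for illustration; none of them is taken from the Sling codebase, and a real 
implementation would presumably register the counter with a metrics library 
rather than render the text itself.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RepoinitFailureMetric {

    // Hypothetical metric name, chosen for this sketch only.
    static final String METRIC_NAME = "sling_repoinit_failures_total";

    private final AtomicLong failures = new AtomicLong();

    /**
     * Runs the statements in order. On the first failure the counter is
     * incremented and the exception is rethrown, so the caller still sees
     * the failure and can refuse to start the repository service.
     */
    public void apply(List<Runnable> statements) {
        for (Runnable statement : statements) {
            try {
                statement.run();
            } catch (RuntimeException e) {
                failures.incrementAndGet();
                throw e;
            }
        }
    }

    public long failureCount() {
        return failures.get();
    }

    /** Renders the counter in the Prometheus text exposition format. */
    public String scrape() {
        return "# TYPE " + METRIC_NAME + " counter\n"
             + METRIC_NAME + " " + failures.get() + "\n";
    }

    public static void main(String[] args) {
        RepoinitFailureMetric metric = new RepoinitFailureMetric();
        // Stand-ins for repoinit statements: one succeeds, one fails.
        List<Runnable> statements = List.of(
            () -> { },
            () -> { throw new IllegalStateException("simulated repoinit failure"); });
        try {
            metric.apply(statements);
        } catch (IllegalStateException e) {
            // In a real instance the repository service would not start here.
        }
        System.out.print(metric.scrape());
    }
}
```

An alert rule could then match on a non-zero value of this counter instead of 
relying on a log search, which is the refinement described above.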



[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics

2024-03-05 Thread Robert Munteanu (Jira)


[ 
https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823664#comment-17823664
 ] 

Robert Munteanu commented on SLING-12262:
-

With the repository being unavailable, is there any point in exposing and 
scraping metrics about repoinit failures? The instance is presumably 
unhealthy and should not be around for long.
