José Correia created SLING-11181:
------------------------------------

             Summary: Emit metrics that distinguish transient and permanent distribution failures
                 Key: SLING-11181
                 URL: https://issues.apache.org/jira/browse/SLING-11181
             Project: Sling
          Issue Type: Improvement
          Components: Content Distribution
            Reporter: José Correia

h3. Context

Currently, our error metrics do not distinguish between distribution failures that are permanent (the package fails even when retried) and failures that are transient (the package succeeds after being retried). We want to be able to tell these two scenarios apart.

h3. Solution

The failure metric should be labeled with one of:
* {{Transient failure}}
* {{Permanent failure}}

h3. Proposed approach

We can distinguish the two scenarios using the following rationale:
* Transient failures happen whenever a package is distributed successfully but needed more than one attempt: {{retries > 0}}
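
The rule above could be sketched roughly as follows. This is only an illustration, not existing Sling Content Distribution code: {{FailureKind}}, {{FailureClassifier}}, and the {{distributed}} and {{retries}} parameters are hypothetical names. The sketch also assumes that a package which is never distributed successfully counts as a permanent failure, which follows from the Context section but is not spelled out in the proposed approach.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical labels for the failure metric; names are illustrative only. */
enum FailureKind { TRANSIENT, PERMANENT }

/** Hypothetical helper, not part of the Sling Content Distribution API. */
final class FailureClassifier {

    // In-memory counters standing in for a real labeled metric.
    private final Map<FailureKind, Long> counts = new EnumMap<>(FailureKind.class);

    /**
     * Classifies the final outcome of one package.
     *
     * @param distributed whether the package was eventually distributed
     * @param retries     number of attempts after the first one
     * @return the failure label, or empty if the first attempt succeeded
     */
    static Optional<FailureKind> classify(boolean distributed, int retries) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        if (distributed) {
            // Rule from the issue: success after more than one attempt.
            return retries > 0 ? Optional.of(FailureKind.TRANSIENT) : Optional.empty();
        }
        // Assumption: a package that never succeeds is a permanent failure.
        return Optional.of(FailureKind.PERMANENT);
    }

    /** Increments the counter for the matching label, if any. */
    void record(boolean distributed, int retries) {
        classify(distributed, retries)
            .ifPresent(kind -> counts.merge(kind, 1L, Long::sum));
    }

    /** Returns how many outcomes were recorded under the given label. */
    long count(FailureKind kind) {
        return counts.getOrDefault(kind, 0L);
    }
}
```

With this shape, a package that succeeds on its first attempt emits no failure label at all, so the two labels stay mutually exclusive for any single package.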