Thanks Benno, this has come up before, mainly in the context of reducing the cost of computing / serving / processing large numbers of metrics.
However, in that use case, a single prefix wasn't sufficient because the user would be interested in the subset of the metrics that they're using for graphs and alerts, and these probably will not share a non-empty prefix. So, the thinking was that multiple prefixes would be needed (which doesn't work in the case of path parameters). The thinking was also to avoid the alternative of wildcard patterns to start with (e.g. /master/frameworks/*/tasks_running).

To give some context on why "/snapshot" is there: originally when the metrics library was implemented, it was envisioned that there might be multiple endpoints to read the data (e.g. "/snapshot" is current values, "history" might expose historical timeseries, etc). In retrospect, I don't think there will be support for anything other than "give me the current values", so attempting to get rid of the "/snapshot" suffix sounds good.

But, this is orthogonal to whether a prefix path parameter or query parameter is added, no?

On Thu, Mar 14, 2019 at 10:03 PM Benno Evers <bev...@mesosphere.com> wrote:

> Hi all,
>
> while this proposal/idea is a very small change code-wise, it would be
> employing libprocess HTTP routing logic in an afaik unprecedented way, so I
> wanted to open this up for discussion.
>
> # Motivation
>
> Currently, the only way to access libprocess metrics is via the
> `/metrics/snapshot` endpoint, which returns the current values of all
> installed metrics.
>
> If the caller is only interested in a specific metric, or a subset of the
> metrics, this is wasteful in two ways: First, the process has to do extra
> work to collect these metrics, and second, the caller has to do extra work
> to filter out the unneeded metrics.
>
> # Proposal
> I'm proposing that the `/metrics/` endpoint can be followed by an
> arbitrary path. The returned JSON object will contain only
> those metrics whose key begins with the specified path:
>
> `/metrics` -> Return all metrics
> `/metrics/master/messages` -> Return all metrics beginning with
> `master/messages`, e.g. `master/messages_launch_tasks`, etc.
>
> A proof of concept implementation can be found here:
> https://reviews.apache.org/r/70211
>
> # Discussion
> The current naming conventions for metrics, i.e. `master/tasks_killed`,
> suggest to the casual observer that metrics are stored and accessible in a
> hierarchical manner. Using a prefix filter allows users to filter certain
> parts of the metrics as if they were indeed hierarchical, while still
> allowing libprocess to use a flat namespace for all metric names
> internally.
>
> The method of access, using the url path directly instead of a query
> parameter, is unusual but it has the advantage that, in my observations, it
> matches what people intuitively try to do anyways when they want to access
> a subset of metrics.
>
> One other drawback is that all other routes of the MetricsProcess will
> shadow the corresponding filter value, e.g. right now it would not be
> possible to return all metrics whose names begin with 'snapshot/'.
>
> # Alternatives
> 1) Add a `prefix` parameter to the `snapshot` endpoint, i.e.
>
> `/metrics/snapshot?prefix=/master/cpu`
>
> This is more in line with how we classically do libprocess endpoints, but
> from a UI perspective it's hard to discover: Many people, including some
> Mesos developers, already have trouble remembering to append `/snapshot` to
> get the metrics, so requiring them to memorize an additional parameter does
> not seem nice.
>
> 2) Move the dynamic prefix under some other endpoint `/values`, i.e.
>
> `/metrics/values/master/messages`
>
> This has the main disadvantage that /values (with empty filter) and
> /snapshot will return exactly the same data, begging the question why both
> are needed.
>
>
> What do you think? I'm looking forward to hearing your thoughts, ideas, etc.
>
> Best regards,
> --
> Benno Evers
> Software Engineer, Mesosphere
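
To make the multiple-prefix point concrete, here is a rough sketch (in Python for brevity, not libprocess code) of filtering a flat metrics map by several prefixes taken from a query string. The repeated `prefix` parameter, the function names, and the sample values are assumptions made up for illustration; none of them exist in libprocess today.

```python
from typing import Dict, Iterable, List
from urllib.parse import parse_qs, urlsplit


def filter_by_prefixes(metrics: Dict[str, float],
                       prefixes: Iterable[str]) -> Dict[str, float]:
    """Keep metrics whose key starts with any of the given prefixes.

    A leading '/' on a prefix is ignored, since metric keys such as
    'master/tasks_killed' do not start with one. With no prefixes at all,
    every metric is returned (the current snapshot behavior).
    """
    wanted = tuple(p.lstrip("/") for p in prefixes)
    if not wanted:
        return dict(metrics)
    # str.startswith accepts a tuple and matches if any entry matches.
    return {k: v for k, v in metrics.items() if k.startswith(wanted)}


def prefixes_from_url(url: str) -> List[str]:
    """Collect every value of a hypothetical repeated `prefix` parameter."""
    return parse_qs(urlsplit(url).query).get("prefix", [])


# Made-up sample values; the flat key namespace mirrors the metric names.
snapshot = {
    "master/messages_launch_tasks": 12.0,
    "master/tasks_killed": 3.0,
    "master/cpus_total": 8.0,
}

url = "/metrics/snapshot?prefix=/master/messages&prefix=/master/cpu"
selected = filter_by_prefixes(snapshot, prefixes_from_url(url))
```

A path-based filter such as `/metrics/master/messages` can carry only one prefix, whereas a query parameter can be repeated, which is the reason the graphs-and-alerts use case leaned toward query parameters.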