I assumed that. Thanks for confirming it. This is what I end up doing:
cumulative metrics via the Metrics API, plus custom metrics with discarding
panes computed alongside the data. The problem is that this is not natural,
and I have to "manually" sync the way I write both of these to the target
time-series database.

I understand that metrics are global, as you are saying, and not bound to
windows, but what I do in the end is still a global window + discarded fired
panes. It would be great to have the same option in the Metrics API. Maybe
this is something to consider in the new feature for runner-independent
metrics reporting.

P.S.: I tried to do all metrics custom via side outputs, but I saw quite
noticeable performance degradation even when using Combine, which I
expected to reduce the data considerably before the shuffle.
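To make the cumulative-vs-discrete distinction concrete, here is a plain-Java
sketch (illustration only, not the Beam API) contrasting what a cumulative
Distribution metric exports per report interval with the per-interval max a
windowed Combine with discarding panes would yield. The interval values
(10, then 5, then 11) mirror the example from Etienne's reply below.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxReporting {
    // Cumulative semantics: running max across all intervals seen so far,
    // one value emitted per report interval (what a Distribution exports).
    static List<Long> cumulativeMaxPerReport(long[][] intervals) {
        List<Long> reports = new ArrayList<>();
        long runningMax = Long.MIN_VALUE;
        for (long[] interval : intervals) {
            for (long v : interval) runningMax = Math.max(runningMax, v);
            reports.add(runningMax);
        }
        return reports;
    }

    // Discrete semantics: max within each interval only, reset between
    // reports (what a windowed Combine with discarding panes would yield).
    static List<Long> discreteMaxPerReport(long[][] intervals) {
        List<Long> reports = new ArrayList<>();
        for (long[] interval : intervals) {
            long intervalMax = Long.MIN_VALUE;
            for (long v : interval) intervalMax = Math.max(intervalMax, v);
            reports.add(intervalMax);
        }
        return reports;
    }

    public static void main(String[] args) {
        // Updates 10, 5, 11 arriving in three consecutive report intervals.
        long[][] intervals = { {10}, {5}, {11} };
        System.out.println(cumulativeMaxPerReport(intervals)); // [10, 10, 11]
        System.out.println(discreteMaxPerReport(intervals));   // [10, 5, 11]
    }
}
```

The middle interval is where the two diverge: the cumulative report still
shows 10, while the discrete report shows the 5 actually observed in that
interval, which is the value I want in the time-series database.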

On Tue, Jun 19, 2018 at 11:13 AM Etienne Chauchot <[email protected]>
wrote:

> Hi Scott and Jozef,
>
> Sorry for the late answer, I missed the email.
>
> Well, MetricsPusher will aggregate the metrics just as
> PipelineResult.metrics() does but it will do so at given configurable
> intervals and export the values. It means that if you configure the export
> to be every 5s, you will get the aggregated (between workers) value of the
> distribution every 5 sec. It will not be reset. For ex, at t = 0 + 5s if
> the max received until then is 10, then the value exported will be 10.
> Then, at t = 0 + 10s, if the distribution was updated with a 5, it will
> still report 10. Then at t = 0 + 15s, if the distribution was updated with
> an 11, it will export 11.
> As metrics are global and not bound to windows like PCollection elements,
> you will always have the cumulative value (essence of the distribution
> metric). So I agree with Scott: it is better for your use case to treat the
> metric as if it were an element and compute it downstream so that it can
> be bound to a window.
>
> Etienne
>
>
>
> Le samedi 02 juin 2018 à 08:01 +0300, Jozef Vilcek a écrit :
>
> Hi Scott,
>
> nothing special about the use-case. Just want to monitor upper and lower
> bound for some data floating in operator.
> The "report interval" is right now 30 seconds and it is independent of
> business logic. It is the one mentionedd here:
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#reporter
>
> and its value is set with respect to how granularly and quickly I want to
> see changes in what is going on in the pipeline, compared to how many
> resources in the time-series database I dedicate to it.
>
> Thanks for looking into it
>
> On Fri, Jun 1, 2018 at 7:49 PM, Scott Wegner <[email protected]> wrote:
>
> Hi Jozef,
>
> Can you elaborate a bit on your use-case; is the "report interval" a
> concept you depend on for your data processing logic?
>
> The Metrics API in general is designed to serve data to the executing
> runner or external service which can then manage the aggregation and
> reporting through PipelineResult or monitoring UI. Etienne, do you know if
> MetricsPusher [1] would help at all?
>
> I suspect you'd be better off calculating the Min/Max values in a
> downstream Combine transform and setting a Windowing/Trigger strategy that
> captures the report interval you're looking for.
>
> [1] https://s.apache.org/runner_independent_metrics_extraction
>
> On Fri, Jun 1, 2018 at 3:39 AM Jozef Vilcek <[email protected]> wrote:
>
> Hi,
>
> I am running a streaming job on flink and want to monitor MIN and MAX
> ranges of a metric floating through operator. I did it via
> org.apache.beam.sdk.metrics.Distribution
>
> The problem is that it seems to report only cumulative values. What I
> would want instead is a discrete report of the MIN / MAX seen in each
> particular report interval.
>
> Is there a way to get non-cumulative data from Beam distribution metrics?
> What are my options?
> The obvious workaround is to track it "manually" and submit 2 gauge
> metrics. I hope there is a better way ... Is there?
>
>
>
