Hi Scott and Jozef,
Sorry for the late answer, I missed the email.
Well, MetricsPusher aggregates the metrics just as PipelineResult.metrics() 
does, but it does so at a configurable interval and exports the values. It 
means that if you configure the export to run every 5s, you will get the 
aggregated (across workers) value of the distribution every 5 seconds. It will 
not be reset. For example, at t = 0 + 5s, if the max received until then is 
10, then the value exported will be 10. Then, at t = 0 + 10s, if the 
distribution was updated with a 5, it will still report 10. Then at t = 0 + 
15s, if the distribution was updated with an 11, it will export 11. As metrics 
are global and not bound to windows like PCollection elements, you will always 
have the cumulative value (that is the essence of the distribution metric).
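To make the cumulative behaviour concrete, here is a minimal sketch of 
declaring a Distribution and reading its aggregated result through 
PipelineResult (the namespace, metric name and DoFn are illustrative, not 
taken from your pipeline):

import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.DistributionResult;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.transforms.DoFn;

class TrackRangeFn extends DoFn<Long, Long> {
  // Cumulative for the whole job: min/max/sum/count are never reset.
  private final Distribution valueRange =
      Metrics.distribution("my-namespace", "value-range");

  @ProcessElement
  public void processElement(ProcessContext c) {
    valueRange.update(c.element());
    c.output(c.element());
  }
}

class RangeReport {
  // What PipelineResult.metrics() (and MetricsPusher) exposes: the aggregated,
  // cumulative result across all workers.
  static void printRange(PipelineResult result) {
    for (MetricResult<DistributionResult> dist :
        result.metrics()
            .queryMetrics(
                MetricsFilter.builder()
                    .addNameFilter(
                        MetricNameFilter.named("my-namespace", "value-range"))
                    .build())
            .getDistributions()) {
      System.out.println("min=" + dist.getAttempted().getMin()
          + " max=" + dist.getAttempted().getMax());
    }
  }
}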
So I agree with Scott: for your use case it is better to treat the value as if 
it were a regular element and compute the MIN/MAX downstream, so that the 
computation can be bound to a window (see the sketch below).
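A rough sketch of what that could look like, assuming fixed 30-second windows 
matching your report interval (transform names are illustrative, and you could 
refine the firing behaviour with a trigger if needed):

import org.apache.beam.sdk.transforms.Max;
import org.apache.beam.sdk.transforms.Min;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class RangePerWindow {
  // 'values' stands for the PCollection you currently feed into the Distribution.
  static void addRangeReporting(PCollection<Long> values) {
    PCollection<Long> windowed =
        values.apply("ReportWindows",
            Window.<Long>into(FixedWindows.of(Duration.standardSeconds(30))));

    // One discrete MIN and MAX per 30-second window, instead of a cumulative metric.
    PCollection<Long> minPerWindow =
        windowed.apply("MinPerWindow", Min.longsGlobally().withoutDefaults());
    PCollection<Long> maxPerWindow =
        windowed.apply("MaxPerWindow", Max.longsGlobally().withoutDefaults());

    // minPerWindow / maxPerWindow can then be written to your time-series store
    // or logged downstream.
  }
}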
Etienne


Le samedi 02 juin 2018 à 08:01 +0300, Jozef Vilcek a écrit :
> Hi Scott,
> nothing special about the use-case. I just want to monitor the upper and 
> lower bounds of some data flowing through an operator.
> The "report interval" is currently 30 seconds and is independent of the 
> business logic. It is the one mentioned here:
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#reporter
> 
> Its value is a trade-off between how quickly and granularly I want to see 
> what is going on in the pipeline and how many resources in the time-series 
> database I dedicate to it.
> 
> Thanks for looking into it   
> On Fri, Jun 1, 2018 at 7:49 PM, Scott Wegner <[email protected]> wrote:
> > Hi Jozef,
> > Can you elaborate a bit on your use-case; is the "report interval" a 
> > concept you depend on for your data processing
> > logic?
> > 
> > The Metrics API in general is designed to serve data to the executing 
> > runner or an external service, which can then manage aggregation and 
> > reporting through PipelineResult or a monitoring UI. Etienne, do you know 
> > if MetricsPusher [1] would help at all?
> > 
> > I suspect you'd be better off calculating the Min/Max values in a 
> > downstream Combine transform and setting a Windowing/Trigger strategy that 
> > captures the report interval you're looking for.
> > 
> > [1] https://s.apache.org/runner_independent_metrics_extraction 
> > 
> > On Fri, Jun 1, 2018 at 3:39 AM Jozef Vilcek <[email protected]> wrote:
> > > Hi,
> > > I am running a streaming job on Flink and want to monitor the MIN and MAX 
> > > range of a metric flowing through an operator. I did it via 
> > > org.apache.beam.sdk.metrics.Distribution.
> > > 
> > > The problem is that it seems to report only cumulative values. What I 
> > > would want instead is a discrete report of the MIN/MAX seen in each 
> > > particular report interval.
> > > 
> > > Is there a way to get non-cumulative data from Beam distribution 
> > > metrics? What are my options?
> > > The obvious workaround is to track it "manually" and submit two Gauge 
> > > metrics. I hope there is a better way ... Is there?
