[DISCUSSION] Runner agnostic metrics extractor?

2017-11-27 Thread Etienne Chauchot
Hi all, I came across this ticket https://issues.apache.org/jira/browse/BEAM-2456. I know that the subject of metrics has already been discussed a lot, but I would like to revive the discussion. The aim of this ticket is to avoid relying on the runner to provide the metrics because they don't have

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-11-27 Thread Jean-Baptiste Onofré
Hi all, Etienne forgot to mention that we started a PoC about that. What I started was to wrap the Pipeline creation to include a thread that periodically polls the metrics in the pipeline result (this is what I proposed when I compared with Karaf Decanter some time ago). Then, this thread marshal
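To make the polling idea concrete, here is a minimal Java sketch of such a poller thread. It assumes the metrics can be read as a `Map<String, Long>` snapshot (for example, from the pipeline result). `MetricsPoller`, the `source` supplier and the `exporter` consumer are hypothetical names used for illustration, not Beam APIs; the exporter stands in for whatever marshals and sends the values.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not a Beam API): a daemon thread started alongside the
 * pipeline that periodically polls a metrics snapshot and hands it to an exporter.
 */
final class MetricsPoller implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread thread = new Thread(runnable, "metrics-poller");
        thread.setDaemon(true); // the poller alone should not keep the JVM alive
        return thread;
      });

  MetricsPoller(Supplier<Map<String, Long>> source,
                Consumer<Map<String, Long>> exporter,
                long periodMillis) {
    scheduler.scheduleAtFixedRate(() -> {
      try {
        exporter.accept(source.get()); // one tick: take a snapshot, export it
      } catch (RuntimeException e) {
        // An uncaught exception would cancel all later runs of a fixed-rate task.
        System.err.println("metrics poll failed: " + e);
      }
    }, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow(); // stop polling once the pipeline is done
  }
}
```

Because the poller lives in the process that submitted the pipeline, it stops exporting as soon as that process stops, which is the limitation raised in the next reply.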

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-11-27 Thread Ben Chambers
I think discussing a runner agnostic way of configuring how metrics are extracted is a great idea -- thanks for bringing it up, Etienne! Using a thread that polls the pipeline result relies on the program that created and submitted the pipeline continuing to run (e.g., no machine faults, network pro

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-11-27 Thread Jean-Baptiste Onofré
Yeah, I think that something in the runner makes sense. The only drawback is that it would require some enforcement on the runners and changes to all runners. If it could be part of the Runner API, I think that would help. The thread poller was more of a PoC approach. Only the polling/send

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-11-29 Thread Etienne Chauchot
Thanks Ben for your comments! Indeed, there is a failover issue with the polling thread. For that reason, pushing metrics to a sink would be better. To make this push runner agnostic, implementing it in the runner-common part of Beam would be good. Maybe in the runner API like JB sug

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-11-29 Thread Jean-Baptiste Onofré
Hi Etienne, yeah, I think it makes sense to update the PoC. I like the package/class name you are proposing. Thanks! Regards JB

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-12-11 Thread Etienne Chauchot
Hi all, I sketched a little doc [1] about this subject. It tries to sum up the differences between the runners regarding metrics extraction and proposes some possible designs for a runner agnostic extraction of the metrics. It is a two-page doc; can you please comment on it, and correct it

Re: [DISCUSSION] Runner agnostic metrics extractor?

2017-12-11 Thread Jean-Baptiste Onofré
Hi, thanks for the doc. I left some comments. Regards JB

Re: [DISCUSSION] Runner agnostic metrics extractor?

2018-01-31 Thread Etienne Chauchot
Hi all, just to let you know that I have just submitted the PR [1]: this PR adds a MetricsPusher, discussed in scenario 3.b of this document [2]. It merges and pushes Beam metrics at a configurable (via pipelineOptions) frequency to a configurable sink. By default the sink is a DummySink also u
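As a rough illustration of the "merge, then push at a configurable period to a configurable sink" design, here is a self-contained Java sketch. It is not the code in the PR: `CounterPusher`, `CounterSink` and `InMemorySink` are hypothetical names, and the sketch assumes only counters, merged by summing the per-container values for each metric name.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sink abstraction: where merged metrics are pushed. */
interface CounterSink {
  void write(Map<String, Long> mergedCounters) throws Exception;
}

/** Hypothetical sink that only remembers the last pushed value. */
final class InMemorySink implements CounterSink {
  private volatile Map<String, Long> lastPushed = Collections.emptyMap();

  @Override
  public void write(Map<String, Long> mergedCounters) {
    lastPushed = mergedCounters;
  }

  Map<String, Long> lastPushed() {
    return lastPushed;
  }
}

/** Hypothetical pusher: merges per-container counters and pushes them periodically. */
final class CounterPusher implements AutoCloseable {
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final Supplier<Collection<Map<String, Long>>> containers;
  private final CounterSink sink;

  CounterPusher(Supplier<Collection<Map<String, Long>>> containers, CounterSink sink,
                long periodMillis) {
    this.containers = containers;
    this.sink = sink;
    scheduler.scheduleAtFixedRate(this::pushOnce, periodMillis, periodMillis,
        TimeUnit.MILLISECONDS);
  }

  /** Sums counter values per metric name across all containers. */
  static Map<String, Long> merge(Collection<Map<String, Long>> perContainer) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> container : perContainer) {
      container.forEach((name, value) -> merged.merge(name, value, Long::sum));
    }
    return merged;
  }

  void pushOnce() {
    try {
      sink.write(merge(containers.get()));
    } catch (Exception e) {
      System.err.println("metrics push failed: " + e); // keep the schedule alive
    }
  }

  @Override
  public void close() {
    scheduler.shutdown(); // cancels the periodic task
    try {
      scheduler.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    pushOnce(); // final push so the last values are not lost when the pipeline ends
  }
}
```

Summing only makes sense for counters; distributions and gauges would need their own merge rules. The final push in `close()` is one way to ensure the sink sees the end-of-pipeline values.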

Re: [DISCUSSION] Runner agnostic metrics extractor?

2018-03-26 Thread Etienne Chauchot
Hi guys, as part of the work below, I need the help of the Google Dataflow engine maintainers: AFAIK, Dataflow being a cloud-hosted engine, the related runner is very different from the others. It just submits a job to the cloud-hosted engine. So there is no access to the metrics container etc. from the runner

Re: [DISCUSSION] Runner agnostic metrics extractor?

2018-03-26 Thread Jean-Baptiste Onofré
Hi Etienne, as we might want to keep the runners consistent on such a feature, I think it makes sense to have this in the Dataflow runner, especially as there's no impact on the runner if end users don't use it. So, +1 to add MetricsPusher in the Dataflow runner. My $0.01 Regards JB

Re: [DISCUSSION] Runner agnostic metrics extractor?

2018-03-26 Thread Scott Wegner
Thanks for keeping this discussion going, Etienne. I can help investigate what it would take to add support for the Dataflow runner. I've filed BEAM-3926 to track it [1]. Is there a @ValidatesRunner integration test [2] that can be used to verify when the functionality has been correctly implemented for

Re: [DISCUSSION] Runner agnostic metrics extractor?

2018-03-26 Thread Etienne Chauchot
Hi JB, I guess you mean adding it to the engine, not to the runner, as the Dataflow runner is more of a client.

Re: [DISCUSSION] Runner agnostic metrics extractor?

2018-03-26 Thread Etienne Chauchot
Hi Scott, thanks for the help and for creating the ticket. I'll add the equivalent tickets for Flink and Spark, but the integration is already done in the PR you reviewed (https://github.com/apache/beam/pull/4548). Maybe I'll split the PR by runner coverage (I had already grouped