Hi, You can find information how to use metrics here [1]. I don’t think there is a straightforward way to access them from within a job. You could access them via JMX when using JMXReporter or you can implement some custom reporter, that could expose the metrics via localhost connections or some static (:S) variables.
Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html> > On 5 Dec 2019, at 19:40, Nguyen, Michael <michael.nguye...@t-mobile.com> > wrote: > > Hi Piotrek, > > For the second article, I understand I can monitor the backpressure status > via the Flink Web UI. Can I refer to the same metrics in my Flink jobs > itself? For example, can I put in an if statement to check for when > outPoolUsage reaches 100%? > > Thank you, > Michael > > From: Piotr Nowojski <pi...@ververica.com <mailto:pi...@ververica.com>> > Date: Thursday, December 5, 2019 at 10:27 AM > To: Michael Nguyen <michael.nguye...@t-mobile.com > <mailto:michael.nguye...@t-mobile.com>> > Cc: Khachatryan Roman <khachatryan.ro...@gmail.com > <mailto:khachatryan.ro...@gmail.com>>, "user@flink.apache.org > <mailto:user@flink.apache.org>" <user@flink.apache.org > <mailto:user@flink.apache.org>> > Subject: Re: How does Flink handle backpressure in EMR > > [External] > > Hi, > > If you are using event time and watermarks, you can monitor the delays using > `currentInputWatermark` metric [1]. If not (or alternatively), this blog post > [2] describes how to check back pressure status [2] for Flink up to 1.9. In > Flink 1.10 there will be an additional new metric for that [3]. > > Piotrek > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/debugging_event_time.html > > <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.9%2Fmonitoring%2Fdebugging_event_time.html&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570374404&sdata=GS8PCzeohW95e%2BT5phGljgHdMArImMjqBxSkR79dIzw%3D&reserved=0> > [2] https://flink.apache.org/2019/07/23/flink-network-stack-2.html > <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fflink.apache.org%2F2019%2F07%2F23%2Fflink-network-stack-2.html&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570384404&sdata=fA%2FBiBnL%2BdqLN4FJ9t2%2B71b7M7Ii7rjfrsmRUlqgIiA%3D&reserved=0> > [3] https://issues.apache.org/jira/browse/FLINK-14813 > <https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FFLINK-14813&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570384404&sdata=NSe%2BN9ur9u3YLEGqqq%2F%2FBH8XXgd2jtlwV67LUyXmA8A%3D&reserved=0> > > >> On 5 Dec 2019, at 19:11, Nguyen, Michael <michael.nguye...@t-mobile.com >> <mailto:michael.nguye...@t-mobile.com>> wrote: >> >> Hi Roman, >> >> So right now we have a couple Flink jobs that consumes data from one Kinesis >> data stream. These jobs vary from a simple dump into a PostgreSQL table to >> calculating anomalies in a 30 minute window. >> >> One large scenario we were worried about was what if one of our jobs was >> taking a long time to process the Kinesis stream data? How would we detect >> this scenario from within our Flink job? >> >> We do not want our Flink jobs to lag too far from the latest point in our >> Kinesis stream as we are trying to deliver information in (near) real-time. >> >> From: Khachatryan Roman <khachatryan.ro...@gmail.com >> <mailto:khachatryan.ro...@gmail.com>> >> Date: Thursday, December 5, 2019 at 9:47 AM >> To: Michael Nguyen <michael.nguye...@t-mobile.com >> <mailto:michael.nguye...@t-mobile.com>> >> Cc: Piotr Nowojski <pi...@ververica.com <mailto:pi...@ververica.com>>, >> "user@flink.apache.org <mailto:user@flink.apache.org>" >> <user@flink.apache.org <mailto:user@flink.apache.org>> >> Subject: Re: How does Flink handle backpressure in EMR >> >> [External] >> >> @Michael, >> Could you please describe your topology with which operators being slow, >> back-pressured and probably skews in sources? >> >> Regards, >> Roman >> >> >> On Thu, Dec 5, 2019 at 6:20 PM Nguyen, Michael >> <michael.nguye...@t-mobile.com <mailto:michael.nguye...@t-mobile.com>> wrote: >>> Thank you for the response Roman and Piotrek! >>> >>> @Roman - can you clarify on what you mean when you mentioned Flink >>> propagating it back to the sources? >>> >>> Also, if one of my Flink operators is processing records too slowly and is >>> getting further away from the latest record of my source data stream, is >>> there a way to detect this slow processing in Flink? Would this be detected >>> by Flink's backpressure mechanism? >>> >>> Thanks, >>> Michael >>> >>> On 12/5/19, 7:57 AM, "Piotr Nowojski" <pi...@data-artisans.com >>> <mailto:pi...@data-artisans.com> on behalf of pi...@ververica.com >>> <mailto:pi...@ververica.com>> wrote: >>> >>> [External] >>> >>> >>> Hi Michael, >>> >>> As Roman pointed out Flink currently doesn’t support the auto-scaling. >>> It’s on our roadmap but it requires quite a bit of preliminary work to >>> happen before. >>> >>> Piotrek >>> >>> > On 5 Dec 2019, at 15:32, r_khachatryan <khachatryan.ro...@gmail.com >>> <mailto:khachatryan.ro...@gmail.com>> wrote: >>> > >>> > Hi Michael >>> > >>> > Flink *does* detect backpressure but currently, it only propagates it >>> back >>> > to sources. >>> > And so it doesn't support auto-scaling. >>> > >>> > Regards, >>> > Roman >>> > >>> > >>> > Nguyen, Michael wrote >>> >> How does Flink handle backpressure (caused by an increase in >>> traffic) in a >>> >> Flink job when it’s being hosted in an EMR cluster? Does Flink >>> detect the >>> >> backpressure and auto-scales the EMR cluster to handle the workload >>> to >>> >> relieve the backpressure? Once the backpressure is gone, then the EMR >>> >> cluster would scale back down? >>> > >>> > >>> > >>> > >>> > >>> > -- >>> > Sent from: >>> https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cfda964dacbf04cc2d2cb08d7799bc347%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111582216065152&sdata=FIHbAWbkH7Tq8AjG%2BVSZoxNrOl0Bn6SukN%2BY1PyhSHg%3D&reserved=0 >>> >>> <https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-flink-user-mailing-list-archive.2336050.n4.nabble.com%2F&data=02%7C01%7CMichael.Nguyen79%40t-mobile.com%7Cbbe7399df2e24da718b708d779b0cc2e%7Cbe0f980bdd994b19bd7bbc71a09b026c%7C0%7C0%7C637111672570394400&sdata=j0YcpGEm3Lour%2FbVi2WX7hSH1vtxcBRUXNgeNnrCgBU%3D&reserved=0>