Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-05 Thread Mason Chen
scale to > requests to redeploy the job. > > Sorry, I didn't understand what type of benchmarking > we should do, could you elaborate on it? Thanks a lot. > > Best, > Rui > > On Sat, Nov 18, 2023 at 3:32 AM Mason Chen wrote: > >> Hi Rui, >> >&g

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-11-17 Thread Mason Chen
Hi Rui, I suppose we could do some benchmarking on what works well for the resource providers that Flink relies on e.g. Kubernetes. Based on conferences and blogs, it seems most people are relying on Kubernetes to deploy Flink and the restart strategy has a large dependency on how well Kubernetes

Re: Metric to capture decoding failure in flink sources

2023-10-10 Thread Mason Chen
Hi Prateek, I agree, the reader should ideally expose the context to record metrics about deserialization. One option is to defer deserialization to another operator, say a RichMapFunction that has access to the RuntimeContext. Best, Mason On Mon, Oct 9, 2023 at 12:42 PM Prateek Kohli wrote: >

Re: Custom Prometheus metrics disappeared in 1.16.2 => 1.17.1 upgrade

2023-10-03 Thread Mason Chen
Hi Javier, Is there a particular reason why you aren't leveraging Flink metric API? It seems that functionality was internal to the PrometheusReporter implementation and your usecase should've continued working if it had depended on Flink's metric API. Best, Mason On Thu, Sep 28, 2023 at 2:51 A

Re: Task manager creation in Flink native Kubernetes (application mode)

2023-07-27 Thread Mason Chen
I'm also curious about this and how to make it better in the current native Kubernetes integration model. Is there some way for Flink to discover and surface the oom kill signal from Kubernetes? Best, Mason On Tue, Jul 25, 2023 at 6:11 AM Alexis Sarda-Espinosa < sarda.espin...@gmail.com> wrote:

Re: Kafka coordinator not available

2023-07-20 Thread Mason Chen
Hi Lars, You are likely seeing this Kafka client bug: https://issues.apache.org/jira/browse/KAFKA-13840. The latest versions of Flink have updated Kafka clients dependency to include this fix. Best, Mason On Thu, Jul 20, 2023 at 9:21 AM Lars Skjærven wrote: > Hello, > I experienced > Coordinat

Re: Multiple Kafka Source for a Data Pipeline

2023-07-06 Thread Mason Chen
Hi all, This should be fixed. Are you setting a different `client.id.prefix` for each KafkaSource? See this: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/kafka/#additional-properties and https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connec

Re: Difference between different values for starting offset

2023-07-03 Thread Mason Chen
Hi Oscar, You are correct about the OffsetInitializer being only effective when there is no Flink state--in addition, if you have partition discovery on, this initializer will be reused for the new partitions (i.e. splits) discovered. Assuming the job is continuing from the offset in Flink state,

Re: Kafka Quotas & Consumer Group Client ID (Flink 1.15)

2023-05-27 Thread Mason Chen
Thu, May 25, 2023 at 12:55 AM Mason Chen > wrote: > >> Hi Hatem, >> >> The reason for setting different client ids is to due to Kafka client >> metrics conflicts and the issue is documented here: >> https://nightlies.apache.org/flink/flink-docs-stable/docs/connec

Re: Kafka Quotas & Consumer Group Client ID (Flink 1.15)

2023-05-24 Thread Mason Chen
Hi Hatem, The reason for setting different client ids is to due to Kafka client metrics conflicts and the issue is documented here: https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kafka/#kafka-consumer-metrics. I think that the warning log is benign if you are using

Re: Question about Flink metrics

2023-05-04 Thread Mason Chen
Hi Neha, For the jobs you care about, you can attach additional labels using `scope-variables-additional` [1]. The example located in the same page showcases how you can configure KV pairs in its map configuration. Be sure to replace the reporter name with the name of your prometheus reporter! [1

Re: Waiting for a signal on one stream to start processing on another

2023-03-08 Thread Mason Chen
Hi Yuval, It seems you are trying to perform bootstrapping on a Flink job by doing the bounded read first. A good pattern to follow is to use HybridSource [1] and the docs have some examples with File and Kafka sources. The point of switching can be coordinated by the source so that you can dynami

Re: Various Flink Deployment States

2023-03-02 Thread Mason Chen
a little clearer)? On Thu, Mar 2, 2023 at 3:10 PM Mason Chen wrote: > Hi all, > > There are quite a few states or statuses for a Flink deployment e.g. > deployment status, job status, job manager status etc. I understand these > are useful to debug an error with deployment

Various Flink Deployment States

2023-03-02 Thread Mason Chen
Hi all, There are quite a few states or statuses for a Flink deployment e.g. deployment status, job status, job manager status etc. I understand these are useful to debug an error with deployment since there are multiple points of failure. However, I want to understand how a user can verify that a

Re: Fast and slow stream sources for Interval Join

2023-02-27 Thread Mason Chen
Hi all, It's true that the problem can be handled by caching records in state. However, there is an alternative using `watermark alignment` with Flink 1.15+ [1] which does the desired synchronization that you described while reducing the size of state from the former approach. To use this with tw

Flink K8s operator pod section of CRD

2023-02-23 Thread Mason Chen
Hi all, Why does the FlinkDeployment CRD refer to the Pod class instead of the PodTemplate class from the fabric8 library? As far as I can tell, the only difference is that the Pod class exposes the PodStatus, which doesn't seem mutable. Thanks in advance! Best, Mason

Re: Kafka Sink Kafka Producer metrics?

2023-02-06 Thread Mason Chen
Hi Andrew, I misread the docs: `register.producer.metrics` is mentioned here, but it is not on by default. https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#kafka-connector-metrics Best, Mason On Mon, Feb 6, 2023 at 6:19 PM Mason Chen wrote: > Hi And

Re: Kafka Sink Kafka Producer metrics?

2023-02-06 Thread Mason Chen
Hi Andrew, Unfortunately, the functionality is undocumented, but you can set the property `register.producer.metrics` to true in your Kafka client properties map. This is a JIRA to document the feature: https://issues.apache.org/jira/browse/FLINK-30932 Best, Mason On Mon, Feb 6, 2023 at 11:49 AM

Re: Flink Kubernetes Operator podTemplate and 'app' pod label bug?

2023-01-19 Thread Mason Chen
@Andrew I was also confused by this earlier and FYI this line where it is referenced https://github.com/apache/flink-kubernetes-operator/blame/7d5bf9536bdfbf86de5803766b28e503cd32ee04/flink-kubernetes-standalone/src/main/java/org/apache/flink/kubernetes/operator/utils/StandaloneKubernetesUtils.java

Re: [Security] - Critical OpenSSL Vulnerability

2022-11-02 Thread Mason Chen
me would be >>> high in effort and longer in duration as well . >>> >>> Thanks, >>> Prasanna >>> >>> On Tue, Nov 1, 2022 at 11:30 AM Prasanna kumar < >>> prasannakumarram...@gmail.com> wrote: >>> >>>> If flink version 1.1

Re: [Security] - Critical OpenSSL Vulnerability

2022-10-31 Thread Mason Chen
Hi Tamir and Martjin, We have also noticed this internally. So far, we have found that the *latest* Flink Java 11/Scala 2.12 docker images *1.14, 1.15, and 1.16* are affected, which all have the *openssl 3.0.2 *dependency. It would be good to discuss an emergency release when this patch comes out

Re: Job uptime metric in Flink Operator managed cluster

2022-10-13 Thread Mason Chen
Hi all, I think what Meghajit is trying to understand is how to measure the uptime of a submitted Flink job. Prior to the K8s operator, perhaps the job manager was torn down with the job shutdown so the uptime value would stop; therefore, the uptime value also measures how long the job was running

Re: KafkaSource, consumer assignment, and Kafka ‘stretch’ clusters for multi-DC streaming apps

2022-10-12 Thread Mason Chen
; Hi Andrew, > > While definitely no expert on this topic, my first thought was if this > idea could be solved with the idea that was proposed in FLIP-246 > https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source > > I'm also looping in Mason Ch

Re: Switching kafka brokers

2022-10-06 Thread Mason Chen
Hi Lars, That sounds like a painful process. Since the offsets are inconsistent, I would suggest to reset the Kafka source state by changing the `uid`, set the source to start from earliest if you haven't already, make the bootstrap server change, and restart your job with allowNonRestoredState en

Re: Flink KafkaSource still referencing deleted topic

2022-10-04 Thread Mason Chen
Hi Martjin, I notice that this question comes up quite often. Would this be a good addition to the KafkaSource documentation? I'd be happy to contribute to the documentation. Best, Mason On Tue, Oct 4, 2022 at 11:23 AM Martijn Visser wrote: > Hi Robert, > > Based on > https://stackoverflow.com

Re: Hybrid Source stop processing files after processing 128 SourceFactories

2022-07-27 Thread Mason Chen
ira/browse/FLINK-27479>?* > > > > And thanks for the original fix. > > > > *From: *Mason Chen > *Date: *Tuesday, July 26, 2022 at 9:57 PM > *To: *Benenson, Michael > *Cc: *user@flink.apache.org , Deshpande, Omkar < > omkar_deshpa...@intuit.com>, Rosenswe

Re: Hybrid Source stop processing files after processing 128 SourceFactories

2022-07-26 Thread Mason Chen
Hi Michael, I'm glad the CPU fix works for you! Regarding the behavior, HybridSource should only consume from Kafka after it finishes the bounded read of the files. At that time, files will not be read anymore. In addition, there is no limitation where there can only be 128 source factories (the

Flink application mode, multiple jobs

2022-07-14 Thread Mason Chen
Hi all, Is there any limitation on the number of jobs you can deploy together within the same Flink application? We are noticing some exceptions related to task slots at job startup. It typically recovers after 10-20 minutes. What are some of the recommended configurations that we can tune to all

Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-07-14 Thread Mason Chen
; > > I'm sure there's a PMC (*hint*) out there who can grant you access to > > create a FLIP. Looking forward to it, this sounds like an improvement > that > > users are looking forward to. > > > > Best regards, > > > > Martijn > > > &

Re: Configure a kafka source dynamically (???)

2022-07-08 Thread Mason Chen
Hi Salva, I was the contributor on the ticket and have updated the PR. Sorry for the delay! Meanwhile, you can use reflection to set the KafkaSubscriber if you need to have an immediate solution. With respect to your control message idea, what is the motivation to use a push based mechanism vs po

Re: influxdb metrics reporter - 4k series per job restart

2022-07-01 Thread Mason Chen
Hi all, If you can wait for Flink 1.16, there is a new feature to filter metrics (includes/excludes filter). Additionally, you can already take advantage of dropping unnecessary labels with `scope.variables.excludes` in the current release. Link to 1.16 metric features: https://nightlies.apache.or

Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-28 Thread Mason Chen
ure. > > Thanks for the effort on this and looking forward to your FLIP! > > Best, > Qingsheng > > > On Jun 24, 2022, at 09:43, Mason Chen wrote: > > > > Hi community, > > > > We have been working on a Multi Cluster Kafka Source and are looking to &g

[DISCUSS] Contribution of Multi Cluster Kafka Source

2022-06-23 Thread Mason Chen
Hi community, We have been working on a Multi Cluster Kafka Source and are looking to contribute it upstream. I've given a talk about the features and design at a Flink meetup: https://youtu.be/H1SYOuLcUTI. The main features that it provides is: 1. Reading multiple Kafka clusters within a single

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-04 Thread Mason Chen
Nice work Peter! Looking forward to the fix. @ChangZhou Kafka metrics are emitted from the source and the process function would be a different operator. For the datastream API, you can set `KafkaSourceOptions.REGISTER_KAFKA_CONSUMER_METRICS.key()` as `false` in your consumer properties. Best, Ma

Re: Prometheus metrics does not work in 1.15.0 taskmanager

2022-05-03 Thread Mason Chen
Hi ChangZhou, The warning log indicates that the metric was previously defined and so the runtime is handling the "duplicate" metric by ignoring it. This is typically a benign message unless you rely on this metric. Is it possible that you are using the same task name for different tasks? It would

Re: Discuss making KafkaSubscriber Public

2022-04-20 Thread Mason Chen
interface may be subject to change if requirements change in the future. Anyone have any opinions? Thanks! Best, Mason On Wed, Apr 13, 2022 at 10:07 AM Mason Chen wrote: > Hi Chesnay, > > Typically, users want to plug in a KafkaSubscriber that depends on an > external system [1][2]. W

Re: Discuss making KafkaSubscriber Public

2022-04-13 Thread Mason Chen
ion > logics. > > > > +1 (non-binding) > > > > Cheers, > > > > Qingsheng > > > >> On Apr 12, 2022, at 11:46, Mason Chen wrote: > >> > >> Hi Flink Devs, > >> > >> I was looking to contribute to > htt

Discuss making KafkaSubscriber Public

2022-04-11 Thread Mason Chen
Hi Flink Devs, I was looking to contribute to https://issues.apache.org/jira/browse/FLINK-24660, which is a ticket to track changing the KafkaSubscriber from Internal to PublicEvolving. In the PR, it seems a few of us have agreement on making the subscriber pluggable in the KafkaSource, but I'd l

Re: KafkaPartitionSplitReader handleSplitsChanges

2022-03-02 Thread Mason Chen
Or is the motivation that resolving the committed/latest offsets is an infrequent event (and only for bounded read) so the optimization is not worth it? On Wed, Mar 2, 2022 at 2:16 PM Mason Chen wrote: > Hi all, > > I noticed in the javadocs that SplitReaders should not have a

KafkaPartitionSplitReader handleSplitsChanges

2022-03-02 Thread Mason Chen
Hi all, I noticed in the javadocs that SplitReaders should not have a blocking handleSplitsChanges implementation: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/splitreader/SplitReader.java#L55 However

Re: MetricRegistryTestUtils java class (flink-runtime/metrics) not found in source code version 1.14.3

2022-02-25 Thread Mason Chen
Hi Prasanna, Why do you need histograms vs summaries? I'm curious about the change and want to see if it applies to my usage of the PrometheusReporter. Best, Mason On Mon, Jan 31, 2022 at 11:51 PM Martijn Visser wrote: > Hi Prasanna, > > Just a quick note that the Github links are all pointing

Prom Pushgateway Reporter HTTPS support

2022-01-18 Thread Mason Chen
Hi all, There is some interest from our users to use prometheus push gateway reporter with a https endpoint. So, I've filed https://issues.apache.org/jira/browse/FLINK-25697 and I figured that it would be acceptable since influxdb reporter supports something similar. Could someone assign me th

Re: unaligned checkpoint for job with large start delay

2022-01-11 Thread Mason Chen
> the critical path, so I don't see this happening soon :( > > Best, > Piotrek > > wt., 4 sty 2022 o 18:02 Mason Chen <mailto:mason.c...@apple.com>> napisał(a): > Hi Piotrek, > >> In other words, something (presumably a watermark) has fired more than 151

Re: unaligned checkpoint for job with large start delay

2022-01-04 Thread Mason Chen
gt; checkpoint can not make any progress. Is this number of triggered windows > plausible in your scenario? > > Best, > Piotrek > > > czw., 23 gru 2021 o 12:12 Mason Chen <mailto:mason.c...@apple.com>> napisał(a): > Hi Piotr, > > Thanks for the thorough r

Re: unaligned checkpoint for job with large start delay

2021-12-17 Thread Mason Chen
ckpoint barriers propagation time by > quite a lot. > > Best, > Piotrek > > [1] https://issues.apache.org/jira/browse/FLINK-23041 > <https://issues.apache.org/jira/browse/FLINK-23041> > wt., 14 gru 2021 o 22:04 Mason Chen <mailto:mas.chen6...@gmail.com>>

unaligned checkpoint for job with large start delay

2021-12-14 Thread Mason Chen
Hi all, I'm using Flink 1.13 and my job is experiencing high start delay, more so than high alignment time. (our flip 27 kafka source is heavily backpressured). Since our alignment timeout is set to 1s, the unaligned checkpoint never triggers since alignment delay is always below the threshold. I

Log level for insufficient task slots message

2021-12-03 Thread Mason Chen
Hi all, java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources. Is an exception/message that is thrown when the users misconfigures the job with insufficient task slots. Currently, t

Re: How do I configure commit offsets on checkpoint in new KafkaSource api?

2021-11-19 Thread Mason Chen
Hi Marco, > https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kafka/#additional-properties In the new KafkaSource, you can configure it in your properties. You can take a look at `KafkaSourceOptions#COMMIT_OFFSETS_ON_CHECKPOINT` for the specific config, which is defau

Re: Kafka Source Recovery Behavior

2021-11-10 Thread Mason Chen
Hi all, Any update on this? Best, Mason On Sat, Oct 30, 2021 at 5:56 AM Arvid Heise wrote: > This seems to be a valid concern but I'm not deep enough to clearly say > that this is indeed a bug. @renqschn could you > please double-check? > > On Thu, Oct 28, 2021 at 8:39

Re: How to refresh topics to ingest with KafkaSource?

2021-11-02 Thread Mason Chen
to take contribuitons (I flagged it as a starter task). > > On Wed, Oct 27, 2021 at 2:36 AM Mason Chen <mailto:mason.c...@apple.com>> wrote: > Hi all, > > I have a similar requirement to Preston. I created > https://issues.apache.org/jira/browse/FLINK-24660 > <

Kafka Source Recovery Behavior

2021-10-28 Thread Mason Chen
Hi all, I noticed that the KafkaSourceReader did not have a pointer to the KafkaSubscriber, so I was wondering if this could be a bug: 1. User has a flink job with topic set A and takes savepoint 2. User modifies flink job to read from topic set B; however, splits are still read from topic set A

Re: How to refresh topics to ingest with KafkaSource?

2021-10-26 Thread Mason Chen
Hi all, I have a similar requirement to Preston. I created https://issues.apache.org/jira/browse/FLINK-24660 to track this effort. Best, Mason > On Oct 18, 2021, at 1:59 AM, Arvid Heise wrote: > > Hi Preston, > > if you still need to set K

FlinkKafkaConsumer -> KafkaSource State Migration

2021-10-26 Thread Mason Chen
Hi all, I read these instructions for migrating to the KafkaSource: https://nightlies.apache.org/flink/flink-docs-release-1.14/release-notes/flink-1.14/#deprecate-flinkkafkaconsumer . Do we need to employ any uid/allowNonRestoredState tricks if our Flink job is also stateful outside of the source

Re: SplitEnumeratorContext callAsync() cleanup

2021-10-26 Thread Mason Chen
Hi Fabian, Unfortunately, I don't have the log since I was just testing it out on my local setup. I can try to reproduce it later in the week. Best, Mason On Mon, Oct 25, 2021 at 8:09 AM Fabian Paul wrote: > Hi Mason, > > Thanks for opening the ticket. Can you also share the log with us when t

Re: SplitEnumeratorContext callAsync() cleanup

2021-10-22 Thread Mason Chen
Hi Fabian, Here we are: https://issues.apache.org/jira/browse/FLINK-24622 Feel free to modify the description as I lazily copied and pasted our discussion here. Best, Mason > On Oct 22, 2021, at 3:31 AM, Fabian Paul wrote: > > Hi Mason, >

SplitEnumeratorContext callAsync() cleanup

2021-10-21 Thread Mason Chen
Hi all, I was wondering how to cancel a task that is enqueued by the callAsync() method, the one that takes in a time interval. For example, the KafkaSource uses this for topic partition discovery. It would be straightforward if the API returned the underlying future so that a process can cancel i

SplitFetcherManager custom error handler

2021-10-18 Thread Mason Chen
Hi all, I am implementing a Kafka connector with some custom error handling that is aligned with our internal infrastructure. `SplitFetcherManager` has a hardcoded error handler in the constructor and I was wondering if it could be exposed by the classes that extend it. Happy to contribute if peop

Removing metrics

2021-10-14 Thread Mason Chen
Hi all, Suppose I have a short lived process within a UDF that defines metrics. After the process has completed, the underlying resources should be cleaned up. Is there an API to remove/unregister metrics? Best, Mason

Kafka Partition Discovery

2021-09-22 Thread Mason Chen
Hi all, We are sometimes facing a connection issue with Kafka when a broker restarts ``` java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.runWithPartitio

Re: Tracking Total Metrics Reported

2021-09-15 Thread Mason Chen
For it to be most useful, the user should be able to obtain the total number of counters, gauges, meters, and histograms, separately. On Wed, Sep 15, 2021 at 6:23 PM Mason Chen wrote: > Hi all, > > Does Flink have any sort of feature to track the total number of metrics > reported

Tracking Total Metrics Reported

2021-09-15 Thread Mason Chen
Hi all, Does Flink have any sort of feature to track the total number of metrics reported by the Flink job? Ideally, the total would be reported by the job manager. Even if there is a log that exposes this information, that would be helpful! Best, Mason

Obtaining Flink Conf in User Code

2021-09-03 Thread Mason Chen
Hi all, Is it possible to obtain the Flink configuration in the user code? I've tried the Configuration parameter in the open method of rich functions and StreamExecutionEnvironment.getConfig().getGlobalJobParameters()--both do not give the configs in the flink configuration. Best, Mason

Re: Kafka Metrics

2021-08-23 Thread Mason Chen
> You could also implement a custom MetricReporter that delegates to your > actual reporter and filters the respective metrics. > > Best, > > Arvid > > On Fri, Aug 20, 2021 at 8:16 AM Mason Chen wrote: > >> FYI, I'm referring to the legacy offsets metric gaug

Re: Kafka Metrics

2021-08-19 Thread Mason Chen
FYI, I'm referring to the legacy offsets metric gauges. On Thu, Aug 19, 2021 at 4:53 PM Mason Chen wrote: > Hi all, > > We have found that the per partition Kafka metrics contributes to a lot of > metrics being indexed by our metrics system. > > We would still like to

Kafka Metrics

2021-08-19 Thread Mason Chen
Hi all, We have found that the per partition Kafka metrics contributes to a lot of metrics being indexed by our metrics system. We would still like to have the proxied kafka metrics from the kafka clients library. Is there a flag to only exclude Flink's additional Kafka metrics? Best, Mason

1.13 Flamegraphs

2021-08-06 Thread Mason Chen
Hi all, Does the sample processing also sample threads that do not belong to the Flink framework? For example, a background thread that is created by and managed by the user? Best, Mason

Re: as-variable configuration for state ac

2021-07-27 Thread Mason Chen
cket so > that we could resolve it later. > > > Best, > Yun Tang > ------ > *From:* Mason Chen > *Sent:* Tuesday, July 27, 2021 15:15 > *To:* Yun Tang > *Cc:* Mason Chen ; user@flink.apache.org < > user@flink.apache.org> > *Subj

Re: as-variable configuration for state ac

2021-07-27 Thread Mason Chen
f my understanding > is right, are you hoping that all state latency metrics for a particular > state could be aggregated per state level? > > > Best > Yun Tang > From: Mason Chen > Sent: Tuesday, July 27, 2021 4:24 > To: user@flink.apache.org > Subject: as-variab

as-variable configuration for state ac

2021-07-26 Thread Mason Chen
We have been using the state backend latency tracking metrics from Flink 1.13. To make metrics aggregation easier, could there be a config to expose something like `state.backend.rocksdb.metrics.column-family-as-variable` that rocksdb provides to do aggregation across column families. In this case

Re: User Classpath from Plugin

2021-07-13 Thread Mason Chen
I've read this page ( https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/filesystems/plugins/), but would like to know more about modifying the whitelist so I can read the class. On Tue, Jul 13, 2021 at 2:54 PM Mason Chen wrote: > Hi all, > > How can I

User Classpath from Plugin

2021-07-13 Thread Mason Chen
Hi all, How can I read the user classpath from a Flink plugin (e.g. one of the metric reporters)? Best, Mason

Flink Metric Reporting from Job Manager

2021-07-07 Thread Mason Chen
Hi all, Does Flink support reporting metrics from the main method that is ran on the Job Manager? In this case, we want to report a failure to add an operator to the Job Graph. Best, Mason

Re: Flink exported metrics scope configuration

2021-06-03 Thread Mason Chen
Hi Kai, You can use the excluded variables config for the reporter. metrics.reporter..scope.variables.excludes: (optional) A semi-colon (;) separate list of variables that should be ignored by tag-based reporters (e.g., Prometheus, InfluxDB). https://ci.apache.org/projects/flink/flink-docs-rel

Re: Prometheus Reporter Enhancement

2021-06-02 Thread Mason Chen
ternal re-structuring that we'd > like to do before extending the metric system further, because we've been > tacking on more and more things since it was released in 1.3.0 (!!!) but > barely refactored things to properly fit together. > > On 5/20/2021 12:58 AM, Mason Chen wro

Re: Flink Metrics Naming

2021-06-01 Thread Mason Chen
Upon further inspection, it seems like the user scope is not universal (i.e. comes through the connectors and not UDFs (like rich map function)), but the question still stands if the process makes sense. > On Jun 1, 2021, at 10:38 AM, Mason Chen wrote: > > Makes sense. We are

Re: Flink Metrics Naming

2021-06-01 Thread Mason Chen
structure and that's why the >> path is exposed via labels if I am not mistaken. So long story short, what >> you are seeing is a combination of how Flink organizes metrics and what can >> be reported to Prometheus. >> >> I am also pulling in Chesnay who is mor

Flink Metrics Naming

2021-05-28 Thread Mason Chen
Can anyone give insight as to why Flink allows 2 metrics with the same “name”? For example, getRuntimeContext.addGroup(“group”, “group1”).counter(“myMetricName”); And getRuntimeContext.addGroup(“other_group”, “other_group1”).counter(“myMetricName”); Are totally valid. It seems that it has l

Re: Prometheus Reporter Enhancement

2021-05-19 Thread Mason Chen
/jira/browse/FLINK-17495 > <https://issues.apache.org/jira/browse/FLINK-17495> > > On 5/18/2021 8:16 PM, Andrew Otto wrote: >> Sounds useful! >> >> On Tue, May 18, 2021 at 2:02 PM Mason Chen > <mailto:mason.c...@apple.com>> wrote: >> Hi all, >

Prometheus Reporter Enhancement

2021-05-18 Thread Mason Chen
Hi all, Would people appreciate enhancements to the prometheus reporter to include extra labels via a configuration, as a contribution to Flink? I can see it being useful for adding labels that are not job specific, but infra specific. The change would be nicely integrated with the Flink’s Conf