Hi Andrew!

I think you are completely right; this is a bug. The per-namespace metrics
do not seem to filter by namespace, and instead show the aggregated global
count for each namespace.

I opened a ticket:
https://issues.apache.org/jira/browse/FLINK-32164
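
In the meantime, if it helps with the dashboards, here is a minimal Python
sketch (not part of the operator) for cross-checking the scraped values
against what kubectl shows. The function names and the SAMPLE text are made
up for illustration; the sample lines are modeled on the output you pasted,
assuming both namespaces reported 2.0 as you described.

```python
import re

METRIC = "flink_k8soperator_namespace_Lifecycle_State_STABLE_Count"

# Illustrative scrape output (assumption: modeled on the lines in this thread).
SAMPLE = """\
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 2.0
flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="rdf_streaming_updater",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 2.0
"""

# One labelled sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# One label pair inside the braces: key="value"
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def counts_by_resource_namespace(text, metric):
    """Return {resourcens label: value} for every sample of `metric`."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels")))
        counts[labels.get("resourcens", "")] = float(m.group("value"))
    return counts


def mismatches(reported, expected):
    """Return {namespace: (reported, expected)} where the two disagree."""
    return {
        ns: (reported.get(ns), want)
        for ns, want in expected.items()
        if reported.get(ns) != float(want)
    }


if __name__ == "__main__":
    reported = counts_by_resource_namespace(SAMPLE, METRIC)
    # Expected counts would come from `kubectl -n <ns> get flinkdeployments`.
    expected = {"stream_enrichment_poc": 1, "rdf_streaming_updater": 1}
    print(mismatches(reported, expected))
```

If every namespace shows up as a mismatch with the same reported value, that
points at a global count rather than a per-namespace one.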

Thanks for reporting this!
Gyula

On Mon, May 22, 2023 at 10:49 PM Andrew Otto <o...@wikimedia.org> wrote:

> Also!  I do have 2 FlinkDeployments deployed with this operator, but they
> are in different namespaces, and each of the per namespace metrics reports
> that it has 2 Deployments in them, even though there is only one according
> to kubectl.
>
> Actually...we just tried to deploy a change (enabling some checkpointing)
> that caused one of our FlinkDeployments to fail.  Now, both namespace
> STABLE_Counts each report 1.
>
> # curl -s <pod_ip>:<prom_port> | grep flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
> flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
> flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="rdf_streaming_updater",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
>
> It looks like maybe this metric is not reporting per namespace, but a
> global count.
>
>
>
> On Mon, May 22, 2023 at 2:56 PM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Oh, FWIW, I do have operator HA enabled with 2 replicas running, but in
>> my examples there, I am curl-ing the leader flink operator pod.
>>
>>
>>
>> On Mon, May 22, 2023 at 2:47 PM Andrew Otto <o...@wikimedia.org> wrote:
>>
>>> Hello!
>>>
>>> I'm doing some grafana+prometheus dashboarding for
>>> flink-kubernetes-operator.  Reading metrics docs
>>> <https://stackoverflow.com/a/61795256>, I see that I have nice per k8s
>>> namespace lifecycle current count gauge metrics in Prometheus.
>>>
>>> Using kubectl, I can see that I have one FlinkDeployment in my namespace:
>>>
>>> # kubectl -n stream-enrichment-poc get flinkdeployments
>>> NAME             JOB STATUS   LIFECYCLE STATE
>>> flink-app-main   RUNNING      STABLE
>>>
>>> But, prometheus is reporting that I have 2 FlinkDeployments in the
>>> STABLE state.
>>>
>>> # curl -s <pod_ip>:<prom_port>  | grep flink_k8soperator_namespace_Lifecycle_State_STABLE_Count
>>> flink_k8soperator_namespace_Lifecycle_State_STABLE_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 2.0
>>>
>>> I'm not sure why I see 2.0 reported.
>>> flink_k8soperator_namespace_JmDeploymentStatus_READY_Count reports only
>>> one FlinkDeployment.
>>>
>>> # curl <pod_ip>:<prom_port>/metrics | grep flink_k8soperator_namespace_JmDeploymentStatus_READY_Count
>>> flink_k8soperator_namespace_JmDeploymentStatus_READY_Count{resourcetype="FlinkDeployment",resourcens="stream_enrichment_poc",name="flink_kubernetes_operator",host="flink_kubernetes_operator_86b888d6b6_gbrt4",namespace="flink_operator",} 1.0
>>>
>>> Is it possible that
>>> flink_k8soperator_namespace_Lifecycle_State_STABLE_Count is being
>>> reported as an incrementing counter instead of a gauge?
>>>
>>> Thanks
>>> -Andrew Otto
>>>  Wikimedia Foundation
>>>
>>>
