If anyone is interested, the StateFun-specific metrics show up in Prometheus under names of the form flink_taskmanager_job_task_operator_functions_<functionNamespace>_<functionName>_<specific_metric>.
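For anyone reproducing this, here is a minimal sketch of the reporter settings that would go into the flink-conf.yaml entry of the ConfigMap. It assumes that the Prometheus reporter plugin is available in the apache/flink-statefun:3.3.0 image and that port 9249 is free on the pods; neither is confirmed in this thread.

```
# Assumed additions to flink-conf.yaml (not part of the original manifest).
# Enables Flink's Prometheus metric reporter on every JobManager/TaskManager.
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
# Port on which each pod serves its metrics; 9249 is an assumed choice.
metrics.reporter.prom.port: 9249
```

Prometheus would then need to scrape that port on the master and worker pods. For a hypothetical function with namespace `example` and name `greeter`, the metrics should appear as flink_taskmanager_job_task_operator_functions_example_greeter_<specific_metric>.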
Thanks again.
Best regards,
Oliver
From: "Biao Geng" <biaoge...@gmail.com>
To: "Oliver Schmied" <uncharted...@gmx.at>
Cc: user@flink.apache.org
Subject: Re: Help with monitoring metrics of StateFun runtime with prometheus
Dear Apache Flink community,
I am setting up an Apache Flink StateFun runtime on Kubernetes, following the flink-statefun-playground example: https://github.com/apache/flink-statefun-playground/tree/main/deployments/k8s.
This is the manifest I used for creating the StateFun environment:
```
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: statefun
  name: flink-config
  labels:
    app: statefun
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: statefun-master
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    classloader.parent-first-patterns.additional: org.apache.flink.statefun;org.apache.kafka;com.google.protobuf
    state.backend: rocksdb
    state.backend.rocksdb.timer-service.factory: ROCKSDB
    state.backend.incremental: true
    parallelism.default: 1
    s3.access-key: minioadmin
    s3.secret-key: minioadmin
    state.checkpoints.dir: s3://checkpoints/subscriptions
    s3.endpoint: http://minio.statefun.svc.cluster.local:9000
    s3.path-style-access: true
    jobmanager.memory.process.size: 1g
    taskmanager.memory.process.size: 1g
  log4j-console.properties: |+
    monitorInterval=30
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name= org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF
---
apiVersion: v1
kind: Service
metadata:
  name: statefun-master-rest
  namespace: statefun
spec:
  type: NodePort
  ports:
    - name: rest
      port: 8081
      targetPort: 8081
  selector:
    app: statefun
    component: master
---
apiVersion: v1
kind: Service
metadata:
  name: statefun-master
  namespace: statefun
spec:
  type: ClusterIP
  ports:
    - name: rpc
      port: 6123
    - name: blob
      port: 6124
    - name: ui
      port: 8081
  selector:
    app: statefun
    component: master
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: statefun-master
  namespace: statefun
spec:
  replicas: 1
  selector:
    matchLabels:
      app: statefun
      component: master
  template:
    metadata:
      labels:
        app: statefun
        component: master
    spec:
      containers:
        - name: master
          image: apache/flink-statefun:3.3.0
          imagePullPolicy: IfNotPresent
          env:
            - name: ROLE
              value: master
            - name: MASTER_HOST
              value: statefun-master
          ports:
            - containerPort: 6123
              name: rpc
            - containerPort: 6124
              name: blob
            - containerPort: 8081
              name: ui
          livenessProbe:
            tcpSocket:
              port: 6123
            initialDelaySeconds: 30
            periodSeconds: 60
          volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf
            - name: module-config-volume
              mountPath: /opt/statefun/modules/example
      volumes:
        - name: flink-config-volume
          configMap:
            name: flink-config
            items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
        - name: module-config-volume
          configMap:
            name: module-config
            items:
              - key: module.yaml
                path: module.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: statefun-worker
  namespace: statefun
spec:
  replicas: 1
  selector:
    matchLabels:
      app: statefun
      component: worker
  template:
    metadata:
      labels:
        app: statefun
        component: worker
    spec:
      containers:
        - name: worker
          image: apache/flink-statefun:3.3.0
          imagePullPolicy: IfNotPresent
          env:
            - name: ROLE
              value: worker
            - name: MASTER_HOST
              value: statefun-master
          resources:
            requests:
              memory: "1Gi"
          ports:
            - containerPort: 6122
              name: rpc
            - containerPort: 6124
              name: blob
            - containerPort: 8081
              name: ui
          livenessProbe:
            tcpSocket:
              port: 6122
            initialDelaySeconds: 30
            periodSeconds: 60
          volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf
            - name: module-config-volume
              mountPath: /opt/statefun/modules/example
      volumes:
        - name: flink-config-volume
          configMap:
            name: flink-config
            items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
        - name: module-config-volume
          configMap:
            name: module-config
            items:
              - key: module.yaml
                path: module.yaml
```
Problem:
I could not find any sources that describe how to monitor the Flink metrics of the StateFun runtime with Prometheus on Kubernetes. I am particularly interested in the StateFun-specific metrics (https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/deployment/metrics/).
Could someone please guide me on how to set this up, or share any resources that cover this topic?
Any help or suggestions would be greatly appreciated.
Thanks for your time and help.
Best regards,
Oliver