waldoppper opened a new issue, #29268:
URL: https://github.com/apache/airflow/issues/29268

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   [2.2.5]
   I recently enabled statsd metric emission on my team's deploy and began 
writing views on the result. 
   
   I've found that:
   1. some dag authors had included a period in their task_id name - like 
`task_id = "load.all"`
   2. this string is not sanitized of statsd-sensitive characters (like '.' and 
'|') prior to the [building of the Metric 
name](https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py#L1504)
   
   Result: metric names with too many .-delimiters and grafana views which 
don't render these task metrics. 
   
   ### What you think should happen instead
   
   Any instance of `Statsd.foo(f'{dag_id}...')` or 
`Statsd.foo(f'{task_id}...')` should probably first sanitize those strings of 
statsd-sensitive characters. 
   
   So instead of
   
   Current: 
`Statsd.timer("dag.{self.task.dag_id}.{self.task.task_id}.duration")` 
   
   Recommended: 
`Stats.timer(f"dag.{statsd_sanitize(self.task.dag_id)}.{statsd_sanitize(self.task.task_id)}.duration")`
   
   ### How to reproduce
   
   1. create a dag
   2. write a task in that dag with task_id which contains a period. Like: 
"foo.bar"
   3. enable statsd metrics collection
   4. run your dag
   5. struggle to deal with poorly-formatted metric names like 
`"dag.DAG.WITH.PERIODS.IN.NAME.TASK.WITH.PERIODS.IN.NAME.duration"`, the client 
emits `"dag.DAG_WITH_PERIODS_IN_NAME.TASK_WITH_PERIODS_IN_NAME.duration"` 
   
   ### Operating System
   
   debian
   
   ### Versions of Apache Airflow Providers
   
   ```apache-airflow-providers-amazon==3.2.0
   apache-airflow-providers-apache-hive==2.3.1
   apache-airflow-providers-apache-spark==2.1.2
   apache-airflow-providers-celery==2.1.3
   apache-airflow-providers-cncf-kubernetes==3.0.0
   apache-airflow-providers-docker==2.5.2
   apache-airflow-providers-elasticsearch==2.2.0
   apache-airflow-providers-ftp==2.1.2
   apache-airflow-providers-google==6.7.0
   apache-airflow-providers-grpc==2.0.4
   apache-airflow-providers-hashicorp==2.1.4
   apache-airflow-providers-http==2.1.2
   apache-airflow-providers-imap==2.2.3
   apache-airflow-providers-microsoft-azure==3.7.2
   apache-airflow-providers-mysql==2.2.3
   apache-airflow-providers-odbc==2.0.4
   apache-airflow-providers-postgres==4.1.0
   apache-airflow-providers-redis==2.0.4
   apache-airflow-providers-sendgrid==2.0.4
   apache-airflow-providers-sftp==2.5.2
   apache-airflow-providers-slack==4.2.3
   apache-airflow-providers-sqlite==2.1.3
   apache-airflow-providers-ssh==2.4.3
   ```
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Deployed with v8.6.1 of 
https://github.com/airflow-helm/charts/tree/main/charts/airflow 
   
   We wired up a pod to convert statsd->prom using 
https://github.com/prometheus/statsd_exporter, and use Matcher rules by 
glob/regex to convert statsd-style metrics into prom-style. 
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to