This is an automated email from the ASF dual-hosted git repository.

ferruzzi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 667b842632 metrics tagging documentation (#36627)
667b842632 is described below

commit 667b842632fbee984b940a3b8b5a1f0bb3749a0f
Author: Gopal Dirisala <39794726+dir...@users.noreply.github.com>
AuthorDate: Fri Jan 12 07:14:05 2024 +0530

    metrics tagging documentation (#36627)
    
    * metrics tagging documentation
---
 .../logging-monitoring/metrics.rst                 | 62 +++++++++++++++++-----
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst b/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst
index aff3ec3e9a..4c91f37b71 100644
--- a/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst
+++ b/docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst
@@ -147,41 +147,59 @@ Name                                                                   Descripti
                                                                        ``LocalTaskJob``
 ``local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code>`` Number of ``LocalTaskJob`` terminations with a ``<return_code>``
                                                                        while running a task ``<task_id>`` of a DAG  ``<dag_id>``.
+``local_task_job.task_exit``                                           Number of ``LocalTaskJob`` terminations with a ``<return_code>``
+                                                                       while running a task ``<task_id>`` of a DAG  ``<dag_id>``.
+                                                                       Metric with job_id, dag_id, task_id and return_code tagging.
 ``operator_failures_<operator_name>``                                  Operator ``<operator_name>`` failures
+``operator_failures``                                                  Operator ``<operator_name>`` failures. Metric with operator_name tagging.
 ``operator_successes_<operator_name>``                                 Operator ``<operator_name>`` successes
-``ti_failures``                                                        Overall task instances failures
-``ti_successes``                                                       Overall task instances successes
-``previously_succeeded``                                               Number of previously succeeded task instances
-``zombies_killed``                                                     Zombie tasks killed
+``operator_successes``                                                 Operator ``<operator_name>`` successes. Metric with operator_name tagging.
+``ti_failures``                                                        Overall task instances failures. Metric with dag_id and task_id tagging.
+``ti_successes``                                                       Overall task instances successes. Metric with dag_id and task_id tagging.
+``previously_succeeded``                                               Number of previously succeeded task instances. Metric with dag_id and task_id tagging.
+``zombies_killed``                                                     Zombie tasks killed. Metric with dag_id and task_id tagging.
 ``scheduler_heartbeat``                                                Scheduler heartbeats
 ``dag_processing.processes``                                           Relative number of currently running DAG parsing processes (ie this delta
-                                                                       is negative when, since the last metric was sent, processes have completed)
-``dag_processing.processor_timeouts``                                  Number of file processors that have been killed due to taking too long
+                                                                       is negative when, since the last metric was sent, processes have completed).
+                                                                       Metric with file_path and action tagging.
+``dag_processing.processor_timeouts``                                  Number of file processors that have been killed due to taking too long.
+                                                                       Metric with file_path tagging.
 ``dag_processing.sla_callback_count``                                  Number of SLA callbacks received
 ``dag_processing.other_callback_count``                                Number of non-SLA callbacks received
 ``dag_processing.file_path_queue_update_count``                        Number of times we've scanned the filesystem and queued all existing dags
 ``dag_file_processor_timeouts``                                        (DEPRECATED) same behavior as ``dag_processing.processor_timeouts``
 ``dag_processing.manager_stalls``                                      Number of stalled ``DagFileProcessorManager``
 ``dag_file_refresh_error``                                             Number of failures loading any DAG files
-``scheduler.tasks.killed_externally``                                  Number of tasks killed externally
+``scheduler.tasks.killed_externally``                                  Number of tasks killed externally. Metric with dag_id and task_id tagging.
 ``scheduler.orphaned_tasks.cleared``                                   Number of Orphaned tasks cleared by the Scheduler
 ``scheduler.orphaned_tasks.adopted``                                   Number of Orphaned tasks adopted by the Scheduler
 ``scheduler.critical_section_busy``                                    Count of times a scheduler process tried to get a lock on the critical
                                                                        section (needed to send tasks to the executor) and found it locked by
                                                                        another process.
-``sla_missed``                                                         Number of SLA misses
-``sla_callback_notification_failure``                                  Number of failed SLA miss callback notification attempts
-``sla_email_notification_failure``                                     Number of failed SLA miss email notification attempts
+``sla_missed``                                                         Number of SLA misses. Metric with dag_id and task_id tagging.
+``sla_callback_notification_failure``                                  Number of failed SLA miss callback notification attempts. Metric with dag_id and func_name tagging.
+``sla_email_notification_failure``                                     Number of failed SLA miss email notification attempts. Metric with dag_id tagging.
 ``ti.start.<dag_id>.<task_id>``                                        Number of started task in a given dag. Similar to <job_name>_start but for task
+``ti.start``                                                           Number of started task in a given dag. Similar to <job_name>_start but for task.
+                                                                       Metric with dag_id and task_id tagging.
 ``ti.finish.<dag_id>.<task_id>.<state>``                               Number of completed task in a given dag. Similar to <job_name>_end but for task
+``ti.finish``                                                          Number of completed task in a given dag. Similar to <job_name>_end but for task
+                                                                       Metric with dag_id and task_id tagging.
 ``dag.callback_exceptions``                                            Number of exceptions raised from DAG callbacks. When this happens, it
-                                                                       means DAG callback is not working.
+                                                                       means DAG callback is not working. Metric with dag_id tagging
 ``celery.task_timeout_error``                                          Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
 ``celery.execute_command.failure``                                     Number of non-zero exit code from Celery task.
-``task_removed_from_dag.<dag_id>``                                     Number of tasks removed for a given dag (i.e. task no longer exists in DAG)
+``task_removed_from_dag.<dag_id>``                                     Number of tasks removed for a given dag (i.e. task no longer exists in DAG).
+``task_removed_from_dag``                                              Number of tasks removed for a given dag (i.e. task no longer exists in DAG).
+                                                                       Metric with dag_id and run_type tagging.
 ``task_restored_to_dag.<dag_id>``                                      Number of tasks restored for a given dag (i.e. task instance which was
                                                                        previously in REMOVED state in the DB is added to DAG file)
+``task_restored_to_dag.<dag_id>``                                      Number of tasks restored for a given dag (i.e. task instance which was
+                                                                       previously in REMOVED state in the DB is added to DAG file).
+                                                                       Metric with dag_id and run_type tagging.
 ``task_instance_created_<operator_name>``                              Number of tasks instances created for a given Operator
+``task_instance_created``                                              Number of tasks instances created for a given Operator.
+                                                                       Metric with dag_id and run_type tagging.
 ``triggerer_heartbeat``                                                Triggerer heartbeats
 ``triggers.blocked_main_thread``                                       Number of triggers that blocked the main thread (likely due to not being
                                                                        fully asynchronous)
@@ -213,11 +231,18 @@ Name                                                Description
 ``executor.queued_tasks``                           Number of queued tasks on executor
 ``executor.running_tasks``                          Number of running tasks on executor
 ``pool.open_slots.<pool_name>``                     Number of open slots in the pool
+``pool.open_slots``                                 Number of open slots in the pool. Metric with pool_name tagging.
 ``pool.queued_slots.<pool_name>``                   Number of queued slots in the pool
+``pool.queued_slots``                               Number of queued slots in the pool. Metric with pool_name tagging.
 ``pool.running_slots.<pool_name>``                  Number of running slots in the pool
+``pool.running_slots``                              Number of running slots in the pool. Metric with pool_name tagging.
 ``pool.deferred_slots.<pool_name>``                 Number of deferred slots in the pool
+``pool.deferred_slots``                             Number of deferred slots in the pool. Metric with pool_name tagging.
 ``pool.starving_tasks.<pool_name>``                 Number of starving tasks in the pool
+``pool.starving_tasks``                             Number of starving tasks in the pool. Metric with pool_name tagging.
 ``triggers.running.<hostname>``                     Number of triggers currently running for a triggerer (described by hostname)
+``triggers.running``                                Number of triggers currently running for a triggerer (described by hostname).
+                                                    Metric with hostname tagging.
 =================================================== ========================================================================
 
 Timers
@@ -231,17 +256,30 @@ Name                                                             Description
 ``dag.<dag_id>.<task_id>.duration``                              Seconds taken to run a task
 ``task.duration``                                                Seconds taken to run a task. Metric with dag_id and task-id tagging.
 ``dag.<dag_id>.<task_id>.scheduled_duration``                    Seconds a task spends in the Scheduled state, before being Queued
+``task.scheduled_duration``                                      Seconds a task spends in the Scheduled state, before being Queued.
+                                                                 Metric with dag_id and task_id tagging.
 ``dag.<dag_id>.<task_id>.queued_duration``                       Seconds a task spends in the Queued state, before being Running
+``task.queued_duration``                                         Seconds a task spends in the Queued state, before being Running.
+                                                                 Metric with dag_id and task_id tagging.
 ``dag_processing.last_duration.<dag_file>``                      Seconds taken to load the given DAG file
+``dag_processing.last_duration``                                 Seconds taken to load the given DAG file. Metric with file_name tagging.
 ``dagrun.duration.success.<dag_id>``                             Seconds taken for a DagRun to reach success state
+``dagrun.duration.success``                                      Seconds taken for a DagRun to reach success state.
+                                                                 Metric with dag_id and run_type tagging.
 ``dagrun.duration.failed.<dag_id>``                              Seconds taken for a DagRun to reach failed state
+``dagrun.duration.failed``                                       Seconds taken for a DagRun to reach failed state.
+                                                                 Metric with dag_id and run_type tagging.
 ``dagrun.schedule_delay.<dag_id>``                               Milliseconds of delay between the scheduled DagRun
                                                                  start date and the actual DagRun start date
+``dagrun.schedule_delay``                                        Milliseconds of delay between the scheduled DagRun
+                                                                 start date and the actual DagRun start date. Metric with dag_id tagging.
 ``scheduler.critical_section_duration``                          Milliseconds spent in the critical section of scheduler loop --
                                                                  only a single scheduler can enter this loop at a time
 ``scheduler.critical_section_query_duration``                    Milliseconds spent running the critical section task instance query
 ``scheduler.scheduler_loop_duration``                            Milliseconds spent running one scheduler loop
 ``dagrun.<dag_id>.first_task_scheduling_delay``                  Seconds elapsed between first task start_date and dagrun expected start
+``dagrun.first_task_scheduling_delay``                           Seconds elapsed between first task start_date and dagrun expected start.
+                                                                 Metric with dag_id and run_type tagging.
 ``collect_db_dags``                                              Milliseconds taken for fetching all Serialized Dags from DB
 ``kubernetes_executor.clear_not_launched_queued_tasks.duration`` Milliseconds taken for clearing not launched queued tasks in Kubernetes Executor
 ``kubernetes_executor.adopt_task_instances.duration``            Milliseconds taken to adopt the task instances in Kubernetes Executor
