Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2025-02-11 Thread via GitHub


potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2652256341

   I'd actually say the content of that table should eventually be kept in a 
yaml/toml file, and it should be read programmatically so that:
   
   a) a separate documentation page is generated for each error type
   b) an index to those pages is generated as well
   c) all error codes referred to in the `AirflowEnumeratedException` are 
checked against this metadata (failing pre-commit if you refer to a 
non-existing error)
   d) the CLI displays the information
   
   What matters is having one single source of truth that the data is read from.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2025-02-10 Thread via GitHub


omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-265968

   Yes, I'm looking into adding a new exception class 
`AirflowEnumeratedException` as described in the excerpt below from [this 
comment](https://github.com/apache/airflow/pull/44616#issuecomment-2639787911):
   >I think the next step could be to add a framework, where we could have a 
way to add the ERRORID to (say) AirflowException - or maybe better create 
AirflowEnumeratedException with obligatory error ID following the convention 
described here).
   
   Also, the [markdown 
table](https://github.com/apache/airflow/blob/main/dev/AIRFLOW_ERROR_GUIDE.md) 
as it stands right now is intimidatingly huge. So, just a random thought: 
maybe we can eventually replace the markdown table with an Airflow 
CLI command - something like `airflow help `. Let's say a user gets an 
exception showing ERRORID `AERR052`; an example command would be as follows:
   
   ```
   $ airflow help AERR052
   The error you're facing would be with this message: "Failed to resolve 
template variable".
   
   As per our observations, a possible cause could be as follows:
   Triggered when a task's templated field contains errors or undefined 
variables.
   
   To resolve this, as first step, you can try the following:
   Review the task templated fields for any errors or undefined variables.
   Ensure all variables are defined and passed correctly in the DAG.
   
   If this doesn't resolve your problem, you can check out the docs for more 
info:
   
https://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#templating
   
   You may also ask your questions on the Airflow Slack #user-troubleshooting 
channel:
   https://apache-airflow.slack.com/messages/user-troubleshooting
   
   Happy Debugging! 🐞 
   ```
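
A rough sketch of how the lookup behind such a command might work. `ErrorInfo`, `REGISTRY` and `render_help` are hypothetical names, not existing Airflow APIs, and the in-memory registry is only a stand-in for wherever the error guide data ends up living.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    """Hypothetical record for one error ID; field names are assumptions."""

    message: str
    cause: str
    steps: tuple[str, ...]
    docs_url: str


# Stand-in registry holding the AERR052 example from the comment above.
REGISTRY = {
    "AERR052": ErrorInfo(
        message="Failed to resolve template variable",
        cause="Triggered when a task's templated field contains errors or undefined variables.",
        steps=(
            "Review the task templated fields for any errors or undefined variables.",
            "Ensure all variables are defined and passed correctly in the DAG.",
        ),
        docs_url="https://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#templating",
    ),
}


def render_help(error_id: str) -> str:
    """Build the help text for an error ID, or a short notice if it is unknown."""
    key = error_id.upper()
    info = REGISTRY.get(key)
    if info is None:
        return f"Unknown error ID: {error_id}"
    lines = [
        f'{key}: "{info.message}"',
        "",
        f"Possible cause: {info.cause}",
        "",
        "Things to try first:",
        *[f"  - {step}" for step in info.steps],
        "",
        f"Docs: {info.docs_url}",
    ]
    return "\n".join(lines)


help_text = render_help("aerr052")
```

Keeping the rendering in a plain function like this means the same text could feed both a CLI command and a generated documentation page.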





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2025-02-09 Thread via GitHub


potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2646311283

   See: https://github.com/apache/airflow/pull/44616#issuecomment-2639787911. 
-> I think @omkar-foss will make an initial implementation and create an issue 
where others (you are of course welcome) will help to add and review the error 
messages for all the cases - similarly to what we did with the 
provider's move, @Dev-iL. So stay tuned.





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2025-02-09 Thread via GitHub


Dev-iL commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2646209698

   What's the next step here? Add custom fields (such as `possible_causes`) and 
formatting functions (for log, for docs, etc.) to `AirflowException`, so that 
subclasses can provide this information?
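
To make the question concrete, here is one possible shape for such fields and formatters. `DemoAirflowException` is a stand-in defined only for this sketch, not the real `AirflowException`, and the attribute and method names (`description`, `possible_causes`, `format_for_log`, `format_for_docs`) are assumptions.

```python
class DemoAirflowException(Exception):
    """Stand-in for AirflowException; not the real class."""

    # Subclasses override these; the attribute names are assumptions.
    description: str = ""
    possible_causes: tuple[str, ...] = ()

    def format_for_log(self) -> str:
        """One-line form suitable for a log record."""
        text = str(self) or self.description
        if not self.possible_causes:
            return text
        return f"{text} (possible causes: {'; '.join(self.possible_causes)})"

    @classmethod
    def format_for_docs(cls) -> str:
        """Markdown fragment for a generated documentation page."""
        lines = [f"## {cls.__name__}", "", cls.description, "", "Possible causes:"]
        lines += [f"- {cause}" for cause in cls.possible_causes]
        return "\n".join(lines)


class DemoTemplateError(DemoAirflowException):
    description = "Failed to resolve template variable"
    possible_causes = (
        "a templated field contains errors",
        "a variable is undefined",
    )


log_line = DemoTemplateError("Failed to resolve template variable").format_for_log()
docs_page = DemoTemplateError.format_for_docs()
```

Class-level attributes keep the documentation formatter usable without instantiating the exception, which would matter if docs are generated at build time.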





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-12-03 Thread via GitHub


omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2514907930

   Hey all, apologies for the delay on this. I've created a very basic sheet 
with the Airflow error mapping, which we all can start adding to and improving 
further. For further details, kindly refer to this Airflow community slack 
thread 
[here](https://apache-airflow.slack.com/archives/C07J87PK1BK/p1733239968796499).





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-11-27 Thread via GitHub


omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2505372476

   >I'm working on a doc to describe a list of all Airflow-related exceptions - 
starting with the AirflowException (as @potiuk mentioned 
https://github.com/apache/airflow/issues/43171#issuecomment-2445213423) as 
AERR001, and subsequent error codes assigned incrementally in a breadth-first 
order. Will share the doc in the next few days.
   
   Hi, I'm still working on this, got caught up with other things. Will share 
the list in the next couple of days or so.





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-11-11 Thread via GitHub


omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2468648123

   Nice to hear from you @kunaljubce.
   
   I'm working on a doc to describe a list of all Airflow-related exceptions - 
starting with the `AirflowException` (as @potiuk mentioned 
[above](https://github.com/apache/airflow/issues/43171#issuecomment-2445213423))
 as `AERR001`, and subsequent error codes assigned incrementally in a 
breadth-first order. Will share the doc in the next few days.
   
   We can then update that list as required based on further discussion.





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-11-10 Thread via GitHub


kunaljubce commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2466733440

   @potiuk @omkar-foss I really like how this discussion is shaping up. Have we 
established any guidelines or SOPs around how to designate the error codes? Or 
if there's a thread where this discussion is ongoing, would be happy to 
contribute (both via discussions and PR). 





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-10-29 Thread via GitHub


potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2445213423

   I really like it.
   
   We could **finally** find a use for AirflowException - so far it was mainly 
about being a base class for a number of exceptions, but if we add a mandatory 
"error id" to AirflowException, make AirflowException abstract, and add 
handling so that the error ID is displayed in the logs, and maybe also emitted 
as a metric (counting the errors) and as an event in the OTEL trace when 
errors happen, it might be a really great mechanism to have and to "force" 
classification of all the errors that we have in Airflow. 
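
A minimal sketch of the "mandatory error id plus abstract base" idea, under stated assumptions: `EnumeratedError` and `JobKilledError` are hypothetical classes rather than Airflow code, a plain `collections.Counter` stands in for a real metrics backend, and the OTEL event is left out entirely.

```python
from collections import Counter

# Stand-in for a real metrics backend: a plain counter keyed by error ID.
ERROR_COUNTS: Counter[str] = Counter()


class EnumeratedError(Exception):
    """Hypothetical base class: every subclass must declare an error_id."""

    error_id: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Reject subclasses that forget to set an ID, at class-definition time.
        if not cls.error_id:
            raise TypeError(f"{cls.__name__} must define a non-empty error_id")

    def __init__(self, message: str):
        if type(self) is EnumeratedError:
            raise TypeError("EnumeratedError is abstract; raise a subclass instead")
        # Prefix the ID so it always shows up in logs and tracebacks.
        super().__init__(f"{self.error_id}: {message}")
        ERROR_COUNTS[self.error_id] += 1  # count each occurrence per error ID


class JobKilledError(EnumeratedError):
    error_id = "AERR055"  # ID borrowed from an example elsewhere in this thread


try:
    raise JobKilledError("Job 10 was killed before it finished")
except EnumeratedError as exc:
    logged = str(exc)
```

Enforcing the ID in `__init_subclass__` surfaces a missing classification when the module is imported, instead of when the error is first raised in production.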





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-10-29 Thread via GitHub


potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2445207696

   :heart:  this. This is what many other tools are doing already. And being 
able to classify and list all the different types of errors that the software 
can generate, together with explaining their causes and remediations - even 
just listing those is a sign of high maturity of the software. 





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-10-29 Thread via GitHub


omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2444113421

   Have a suggestion for multi-possible-root-cause issues - we can print 
Airflow error code with the error message e.g. `AERR055: Job 10 was killed 
before it finished` and can have an error code mapping with possible root 
causes like (just examples, not real causes):
   
   | Error Code | Possible Commonly Observed Causes                   |
   |------------|-----------------------------------------------------|
   | AERR055    | 1) Ran out of memory                                |
   |            | 2) Job was stuck and killed after timeout           |
   |            | 3) Job being run on Spot Instance Node (K8S on EKS) |
   
   Since error codes are shareable and easily searchable, it would be useful 
for team collaboration as well (e.g. instead of me saying "I'm looking into the 
error `Job 10 was killed before it finished`", can probably just say "I'm 
looking into AERR055"). Much like how we use JIRA ticket numbers or GitHub 
issue/PR numbers.
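
As a small illustration of why a searchable code helps, the sketch below pulls the code out of a log line and looks up its candidate causes. `POSSIBLE_CAUSES` and `causes_for_log_line` are hypothetical, and the listed causes are the placeholder examples from the table, not verified root causes.

```python
import re

# Placeholder causes copied from the example table; not verified root causes.
POSSIBLE_CAUSES = {
    "AERR055": [
        "Ran out of memory",
        "Job was stuck and killed after timeout",
        "Job being run on Spot Instance Node (K8S on EKS)",
    ],
}

# Matches an error code followed by a colon, e.g. "AERR055:".
ERROR_CODE = re.compile(r"\b(AERR\d{3}):")


def causes_for_log_line(line: str) -> list[str]:
    """Extract the error code from a log line and look up its possible causes."""
    match = ERROR_CODE.search(line)
    if match is None:
        return []
    return POSSIBLE_CAUSES.get(match.group(1), [])


causes = causes_for_log_line("AERR055: Job 10 was killed before it finished")
```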





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-10-25 Thread via GitHub


potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2435764433

   > but has to be done very carefully, because if it gives misleading advice 
it will lead users down chasing the wrong rabbit hole. For example this log in 
standard_task_runner.py is most of the time not due to memory running out: "Job 
%s was killed before it finished (likely due to running out of memory)",. I've 
seen our engineers chasing memory issues in vain countless of times because of 
that message. 
   
   I am a big fan of "always tell the user what action from their side the 
error implies". Agreed, things can be misleading, and re the case you 
mentioned - I cannot find it now, but I think for such complicated, 
multi-possible-root-cause errors we should explain what's going on and link to 
a FAQ page on Airflow explaining possible reasons. This way, when we find other 
reasons and more detailed explanations of what could be wrong and how to 
remediate it, we can always update the docs and add more information that will 
be useful for the many past versions of Airflow that people will have.
   
   > (yes we should have filed a PR 😄)
   
   Absolutely :)
   





Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]

2024-10-20 Thread via GitHub


hterik commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2425691353

   I can recommend this guide from Google about writing good error messages: 
https://developers.google.com/tech-writing/error-messages. The rest of the 
courses in that book are also really good btw.
   
   > an error like `Celery command failed on host` can be transformed or 
displayed with something like "`Please check your DAG processor timeout 
variable for this`". 
   
   Actionable errors are good, but they have to be done very carefully, because 
misleading advice will send users down the wrong rabbit hole. 
For example, this log in `standard_task_runner.py` is most of the time not due 
to memory running out:  `"Job %s was killed before it finished (likely due to 
running out of memory)",`. I've seen our engineers chasing memory issues in 
vain countless times because of that message. (yes we should have filed a PR 
:smile:)

