Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2652256341

I'd actually say the content of that table should eventually be kept in a YAML/TOML file that is read programmatically, so that:

a) a separate page in the documentation is generated for each error type
b) an index to those pages is generated as well
c) all error codes referred to in `AirflowEnumeratedException` are checked against this metadata (and pre-commit fails if you refer to a non-existing error)
d) the CLI displays the information

The important thing is to have one single source of truth where the data is read from.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-265968

Yes, I'm looking into adding a new exception class `AirflowEnumeratedException` as described in the excerpt below from [this comment](https://github.com/apache/airflow/pull/44616#issuecomment-2639787911):

>I think the next step could be to add a framework, where we could have a way to add the ERRORID to (say) AirflowException - or maybe better create AirflowEnumeratedException with obligatory error ID following the convention described here).

Also, the [markdown table](https://github.com/apache/airflow/blob/main/dev/AIRFLOW_ERROR_GUIDE.md) as it stands right now is intimidatingly huge. So, just a random thought: maybe we can eventually replace the markdown table with an Airflow CLI command, something like `airflow help `.

Let's say a user gets an exception showing ERRORID `AERR052`. An example command would be as follows:

```
$ airflow help AERR052
The error you're facing would be with this message: "Failed to resolve template variable".

As per our observations, a possible cause could be as follows:
Triggered when a task's templated field contains errors or undefined variables.

To resolve this, as first step, you can try the following:
Review the task templated fields for any errors or undefined variables. Ensure all variables are defined and passed correctly in the DAG.

If this doesn't resolve your problem, you can check out the docs for more info:
https://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#templating

You may also ask your questions on the Airflow Slack #user-troubleshooting channel:
https://apache-airflow.slack.com/messages/user-troubleshooting

Happy Debugging! 🐞
```
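To make the CLI idea above a little more concrete, here is a minimal sketch of the lookup-and-render step such a command might perform. The `airflow help` command is only a proposal in this thread, and the registry layout, field names, and `render_help` function below are hypothetical; the sample entry reuses the `AERR052` text from the example output.

```python
# Hypothetical in-memory registry; in the proposal this data would come from a shared metadata file.
ERROR_REGISTRY = {
    "AERR052": {
        "message": "Failed to resolve template variable",
        "cause": "Triggered when a task's templated field contains errors or undefined variables.",
        "resolution": "Review the task templated fields for any errors or undefined variables.",
        "docs": "https://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#templating",
    },
}


def render_help(error_id: str, registry: dict[str, dict[str, str]] = ERROR_REGISTRY) -> str:
    """Build the help text for one error ID, or a short notice if it is unknown."""
    entry = registry.get(error_id.upper())
    if entry is None:
        return f"Unknown error ID: {error_id}"
    return "\n".join(
        [
            f"{error_id.upper()}: {entry['message']}",
            "",
            "Possible cause:",
            entry["cause"],
            "",
            "Suggested first step:",
            entry["resolution"],
            "",
            "More info:",
            entry["docs"],
        ]
    )


print(render_help("AERR052"))
```

Returning a string rather than printing inside the function keeps the rendering easy to test and reusable for log output or generated docs.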
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2646311283

See: https://github.com/apache/airflow/pull/44616#issuecomment-2639787911.

I think @omkar-foss will make an initial implementation and create an issue where others (you are of course welcome) will help to add and review the error messages for all the cases, similarly to what we did with the providers' move, @Dev-iL. So stay tuned.
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
Dev-iL commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2646209698

What's the next step here? Add custom fields (such as `possible_causes`) and formatting functions (for log, for docs, etc.) to `AirflowException`, so that subclasses can provide this information?
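One way to picture the question above is the sketch below, where a base exception carries descriptive fields and formatting helpers that subclasses fill in. This is only an illustration: `DescribedError` is a stand-in for `AirflowException`, and the `remediation` field and both formatting methods are hypothetical names, not an existing Airflow API.

```python
class DescribedError(Exception):
    """Stand-in for AirflowException; attribute and method names are hypothetical."""

    possible_causes: tuple[str, ...] = ()
    remediation: str = ""

    def format_for_log(self) -> str:
        """One-line rendering suitable for a log record."""
        causes = "; ".join(self.possible_causes) or "unknown"
        return f"{type(self).__name__}: {self} (possible causes: {causes})"

    @classmethod
    def format_for_docs(cls) -> str:
        """Markdown rendering suitable for a generated documentation page."""
        lines = [f"### {cls.__name__}", "", "Possible causes:"]
        lines += [f"- {cause}" for cause in cls.possible_causes]
        if cls.remediation:
            lines += ["", f"Remediation: {cls.remediation}"]
        return "\n".join(lines)


class TemplateResolutionError(DescribedError):
    # Subclasses only supply data; the formatting lives in the base class.
    possible_causes = ("A templated field references an undefined variable",)
    remediation = "Check that every variable used in templated fields is defined."


err = TemplateResolutionError("Failed to resolve template variable")
print(err.format_for_log())
print(TemplateResolutionError.format_for_docs())
```

The docs formatter is a classmethod so that documentation can be generated without constructing an exception instance.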
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2514907930

Hey all, apologies for the delay on this. I've created a very basic sheet with the Airflow error mapping, which we all can start adding to and improving further. For further details, kindly refer to this Airflow community slack thread [here](https://apache-airflow.slack.com/archives/C07J87PK1BK/p1733239968796499).
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2505372476

>I'm working on a doc to describe a list of all Airflow-related exceptions - starting with the AirflowException (as @potiuk mentioned https://github.com/apache/airflow/issues/43171#issuecomment-2445213423) as AERR001, and subsequent error codes assigned incrementally in a breadth-first order. Will share the doc in the next few days.

Hi, I'm still working on this; I got caught up with other things. I will share the list in the next couple of days or so.
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2468648123

Nice to hear from you @kunaljubce. I'm working on a doc to describe a list of all Airflow-related exceptions, starting with the `AirflowException` (as @potiuk mentioned [above](https://github.com/apache/airflow/issues/43171#issuecomment-2445213423)) as `AERR001`, with subsequent error codes assigned incrementally in breadth-first order. I will share the doc in the next few days. We can then update that list as required based on further discussion.
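For readers unfamiliar with the numbering scheme described above, the following sketch shows what a breadth-first assignment over an exception hierarchy could look like. The toy classes and the `assign_error_codes` helper are hypothetical; only the `AERR` prefix and the idea of giving the base class the first code come from the comment.

```python
from collections import deque


# Toy hierarchy standing in for Airflow's exception classes.
class BaseError(Exception): ...
class ConfigError(BaseError): ...
class TaskError(BaseError): ...
class TaskTimeoutError(TaskError): ...


def assign_error_codes(root: type, prefix: str = "AERR") -> dict[str, str]:
    """Walk the class hierarchy breadth-first and number classes in visiting order."""
    codes: dict[str, str] = {}
    seen: set[type] = set()
    queue = deque([root])
    while queue:
        cls = queue.popleft()
        if cls in seen:  # guards against visiting a class twice under multiple inheritance
            continue
        seen.add(cls)
        codes[cls.__name__] = f"{prefix}{len(codes) + 1:03d}"
        # __subclasses__() lists direct subclasses in definition order.
        queue.extend(cls.__subclasses__())
    return codes


for name, code in assign_error_codes(BaseError).items():
    print(code, name)
```

Because the order depends on which classes exist at the time of the walk, a scheme like this would presumably be run once to seed the list, with the resulting codes then stored and kept stable.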
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
kunaljubce commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2466733440

@potiuk @omkar-foss I really like how this discussion is shaping up. Have we established any guidelines or SOPs around how to designate the error codes? Or if there's a thread where this discussion is ongoing, would be happy to contribute (both via discussions and PR).
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2445213423

I really like it. We could **finally** find a use for `AirflowException`. So far it has mainly served as a base class for a number of exceptions. But if we add a mandatory "error id" to `AirflowException`, make `AirflowException` abstract, and add handling so that the error ID is displayed in the logs, maybe also produced as a metric (counting the errors), and emitted as an event in the OTEL trace when errors happen, it might be a really great mechanism to have, and a way to "force" classification of all the errors that we have in Airflow.
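The mechanism described above might be sketched roughly as follows. This is an assumption-laden illustration rather than Airflow code: `EnumeratedError`, `report`, and the in-process `Counter` used in place of a real metrics backend are all hypothetical, and OTEL tracing is left out entirely.

```python
import logging
from collections import Counter

log = logging.getLogger(__name__)
ERROR_COUNTS: Counter[str] = Counter()  # stand-in for a real metrics backend


class EnumeratedError(Exception):
    """Stand-in for the proposed base class; not Airflow's actual API."""

    error_id: str  # every subclass must define its own value

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Reject subclasses that do not declare an error ID in their own body.
        if not isinstance(cls.__dict__.get("error_id"), str):
            raise TypeError(f"{cls.__name__} must define a string error_id")

    def __init__(self, *args):
        if type(self) is EnumeratedError:
            raise TypeError("EnumeratedError is abstract; raise a subclass instead")
        super().__init__(*args)

    def __str__(self) -> str:
        return f"{self.error_id}: {super().__str__()}"


class JobKilledError(EnumeratedError):
    error_id = "AERR055"


def report(exc: EnumeratedError) -> None:
    """Count the error by ID and log it with the ID prefixed."""
    ERROR_COUNTS[exc.error_id] += 1
    log.error("%s", exc)


exc = JobKilledError("Job 10 was killed before it finished")
report(exc)
```

Enforcing the ID in `__init_subclass__` means a missing ID fails at class definition time, which is what would "force" classification rather than leaving it to review.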
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2445207696

:heart: this. This is what many other tools are doing already. Being able to classify and list all the different types of errors that the software can generate, together with explaining their causes and remediations (or even just listing them), is a sign of high maturity of the software.
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
omkar-foss commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2444113421

Have a suggestion for multi-possible-root-cause issues - we can print Airflow error code with the error message e.g. `AERR055: Job 10 was killed before it finished` and can have an error code mapping with possible root causes like (just examples, not real causes):

| Error Code | Possible Commonly Observed Causes |
|-|-|
| AERR055 | 1) Ran out of memory |
| | 2) Job was stuck and killed after timeout |
| | 3) Job being run on Spot Instance Node (K8S on EKS) |

Since error codes are shareable and easily searchable, it would be useful for team collaboration as well (e.g. instead of me saying "I'm looking into the error `Job 10 was killed before it finished`", can probably just say "I'm looking into AERR055"). Much like how we use JIRA ticket numbers or GitHub issue/PR numbers.
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
potiuk commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2435764433

> but they have to be done very carefully, because if they give misleading advice they will lead users down the wrong rabbit hole. For example, this log in standard_task_runner.py is most of the time not due to memory running out: "Job %s was killed before it finished (likely due to running out of memory)",. I've seen our engineers chasing memory issues in vain countless times because of that message.

I am a big fan of "always tell the user what action from their side the error implies."

I agree that things can be misleading. Regarding the case you mentioned (I cannot find it now): I think that for such complicated errors with multiple possible root causes, we should explain what's going on and link to a FAQ page in the Airflow docs explaining the possible reasons. This way, when we find other reasons or more detailed explanations of what could be wrong and how to remediate it, we can always update the docs, and the added information will be useful even for the many past versions of Airflow that people run.

> (yes we should have filed a PR 😄)

Absolutely :)
Re: [I] Make Airflow error messages more specific, clear and actionable [airflow]
hterik commented on issue #43171:
URL: https://github.com/apache/airflow/issues/43171#issuecomment-2425691353

I can recommend this guide from Google about writing good error messages: https://developers.google.com/tech-writing/error-messages. The rest of the courses in that book are also really good, by the way.

> an error like `Celery command failed on host` can be transformed or displayed with something like "`Please check your DAG processor timeout variable for this`".

Actionable errors are good, but they have to be done very carefully, because if they give misleading advice they will lead users down the wrong rabbit hole. For example, this log in `standard_task_runner.py` is most of the time not due to memory running out: `"Job %s was killed before it finished (likely due to running out of memory)",`. I've seen our engineers chasing memory issues in vain countless times because of that message. (Yes, we should have filed a PR :smile:)
