Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
bentorb commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2694108699 > [@bentorb](https://github.com/bentorb) maybe try and run your task with Airflow 3? We have beta just released Thank you @amoghrajesh. MWAA supports up to Airflow 2.10.3, so at the moment this is not an option for my team. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
potiuk commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2690517588 > Signal handling is now done by a) (supervisor) which watches the b) (task runner subprocess) and this should probably be fixed there :) I have not checked - but is timeout handling also done in supervisor? I was previously done in the equivalent of the task runner subprocess and that was the root cause of the problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
amoghrajesh commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2689901099 @potiuk @dabla IIUC, are you checking with me if every "task" will now run a new subprocess? Signal handling is now done by a) (supervisor) which watches the b) (task runner subprocess) and this should probably be fixed there :) @bentorb maybe try and run your task with Airflow 3? We have beta just released -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
dabla commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2685159279 > This is a question of signal handling most likely. Airlfow's timeout is done using SIGALRM. If the driver runs a low-level c code that does not handle signals properly this behaviour might happen. Proper implementation of a c-library code is to periodically check in a loop if signal has been received and pass control back to python if it happened - but some c-level code (especially database drivers) might not handle it properly. > > The solution - discussed few times is to make another fork of the process and handle timeout in the separate fork, and we decided to defer it to after new Task SDK is completed - and maybe even in new Task SDK it's already handled ? [@ashb](https://github.com/ashb) [@amoghrajesh](https://github.com/amoghrajesh) ? > > Generally we have already two processes (parent and fork): > > a) supervisor that performs heartbeat b) task that executes the job there was also c) openlineage handling > > In Airlfow 2 timeout was implemented as siglarm handling inside b) which means that if b) executed bad-behaving C-code timeout was not happening. In Airflow 3 we have a bit different way of heartbeating and task execution and it's being still iterated on - and the options we could implement is to move the timout handling to a) or introduce another fork with timeout between a) and b). The former likely better from resource usage point of view. Thank you Jarek, this seems indeed to be a probable explanation. This proves that I have still a lot to learn from Airflow inner workings :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
potiuk commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2679680186 This is a question of signal handling most likely. Airlfow's timeout is done using SIGALRM. If the driver runs a low-level c code that does not handle signals properly this behaviour might happen. Proper implementation of a c-library code is to periodically check in a loop if signal has been received and pass control back to python if it happened - but some c-level code (especially database drivers) might not handle it properly. The solution - discussed few times is to make another fork of the process and handle timeout in the separate fork, and we decided to defer it to after new Task SDK is completed - and maybe even in new Task SDK it's already handled ? @ashb @amoghrajesh ? Generally we have already two processes (parent and fork): a) supervisor that performs heartbeat b) task that executes the job there was also c) openlineage handling In Airlfow 2 timeout was implemented as siglarm handling inside b) which means that if b) executed bad-behaving C-code timeout was not happening. In Airflow 3 we have a bit different way of heartbeating and task execution and it's being still iterated on - and the options we could implement is to move the timout handling to a) or introduce another fork with timeout between a) and b). The former likely better from resource usage point of view. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
dabla commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607756533 > > but now we can assume this is a generic issue related to database connections. So yes it's weird that worker doesn't get killed. > > I am not sure it explains it. Airflow runs a process (that happens to be database connection) Airflow should be able to kill the process. The fact that the issue happens for Sql but not for Python is really odd. From the description it sounds like something is blocking the schedule from killing the task (because it tried to kill it after its finished). I think that the issue is with the scheduler itself. Question, at least for me, is how do you debug something like this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
eladkal commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607703764 > but now we can assume this is a generic issue related to database connections. So yes it's weird that worker doesn't get killed. I am not sure it explains it. Airflow runs a process (that happens to be database connection) Airflow should be able to kill the process. The fact that the issue happens for Sql but not for Python is really odd. From the description it sounds like something is blocking the schedule from killing the task (because it tried to kill it after its finished). I think that the issue is with the scheduler itself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
dabla commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607539721 > Should we apply this logic for the underlying get hook function that we are using? While not generic to all if we can provide workaround for connections that support it I think we should. Yet the underlying issue is not clear to me. The kill signal from timeout comes from the scheduler. Why would the task not respond to it? The `exectuion_timeout` is for Airflow, not for the underlying SQL engine. We could, but that possibly would not work for all cases and would be a workaround of the real issue. It's indeed a weird issue and I was also surprised by that behaviour. Back then I thought it was related to the Jdbc connection and maybe there was some glitch between the Python and Java process, but now we can assume this is a generic issue related to database connections. So yes it's weird that worker doesn't get killed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
eladkal commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-260748 Should we apply this logic for the underlying get hook function that we are using? While not generic to all if we can provide workaround for connections that support it I think we should. Yet the underlying issue is not clear to me. The kill signal from timeout comes from the scheduler. Why would the task not respond to it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
dabla commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607352971 I've also encountered the same behaviour, as the connection is blocking the thread or something and thus the SQLExecuteQueryOperator task only stops once the connection returns back, at least that's what I experienced. So to avoid that issue, what I've done when using the JdbcHook, was by specifying a connection timeout on the JDBC connection, that way when the task has to timeout, the connection will also (even though the query will continue to run on the database) and thus not block the thread of the Airflow worker, but this was a specific solution with JDBC connections. Bellow the code: ``` hook = DbApiHook.get_hook(conn_id="conn_id") with hook.get_conn() as conn: with closing(conn.cursor()) as cur: stmt = conn.jconn.createStatement() if timeout: stmt.setQueryTimeout( int(timeout.total_seconds()) ) # Set the timeout in seconds stmt.executeQuery(sql) # Execute the SQL query return fetch_one_handler(cur) ``` So maybe in the hook some kind of timeout should also be specified on the connection being used, or if you want a more finegrained approach per operator, the you could specify the connection timeout through the hook_params of the SQLExecuteQueryOperator? Still, this al depends if the underlying connection supports it of course. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
eladkal commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607141623 cc @dabla maybe you will have time to look into this? Seems inconsistent behavior of SQLExecuteQueryOperator -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]
boring-cyborg[bot] commented on issue #45930: URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607133874 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org