Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-03-03 Thread via GitHub


bentorb commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2694108699


   > [@bentorb](https://github.com/bentorb) maybe try and run your task with 
Airflow 3? We have beta just released
   
   Thank you @amoghrajesh. MWAA supports up to Airflow 2.10.3, so at the moment 
this is not an option for my team.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-02-28 Thread via GitHub


potiuk commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2690517588

   > Signal handling is now done by a) (supervisor) which watches the b) (task 
runner subprocess) and this should probably be fixed there :)
   
   I have not checked - but is timeout handling also done in supervisor? I was 
previously done in the equivalent of the task runner subprocess and that was 
the root cause of the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-02-27 Thread via GitHub


amoghrajesh commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2689901099

   @potiuk @dabla IIUC, are you checking with me if every "task" will now run a 
new subprocess?
   
   Signal handling is now done by a) (supervisor) which watches the b) (task 
runner subprocess) and this should probably be fixed there :)
   
   @bentorb maybe try and run your task with Airflow 3? We have beta just 
released


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-02-26 Thread via GitHub


dabla commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2685159279

   > This is a question of signal handling most likely. Airlfow's timeout is 
done using SIGALRM. If the driver runs a low-level c code that does not handle 
signals properly this behaviour might happen. Proper implementation of a 
c-library code is to periodically check in a loop if signal has been received 
and pass control back to python if it happened - but some c-level code 
(especially database drivers) might not handle it properly.
   > 
   > The solution - discussed few times is to make another fork of the process 
and handle timeout in the separate fork, and we decided to defer it to after 
new Task SDK is completed - and maybe even in new Task SDK it's already handled 
? [@ashb](https://github.com/ashb) 
[@amoghrajesh](https://github.com/amoghrajesh) ?
   > 
   > Generally we have already two processes (parent and fork):
   > 
   > a) supervisor that performs heartbeat b) task that executes the job there 
was also c) openlineage handling
   > 
   > In Airlfow 2 timeout was implemented as siglarm handling inside b) which 
means that if b) executed bad-behaving C-code timeout was not happening. In 
Airflow 3 we have a bit different way of heartbeating and task execution and 
it's being still iterated on - and the options we could implement is to move 
the timout handling to a) or introduce another fork with timeout between a) and 
b). The former likely better from resource usage point of view.
   
   Thank you Jarek, this seems indeed to be a probable explanation.  This 
proves that I have still a lot to learn from Airflow inner workings :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-02-24 Thread via GitHub


potiuk commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2679680186

   This is a question of signal handling most likely. Airlfow's timeout is done 
using SIGALRM. If the driver runs a low-level c code that does not handle 
signals properly this behaviour might happen. Proper implementation of a 
c-library code is to periodically check in a loop if signal has been received 
and pass control back to python if it happened - but some c-level code 
(especially database drivers) might not handle it properly.
   
   The solution - discussed few times is to make another fork of the process 
and handle timeout in the separate fork, and we decided to defer it to after 
new Task SDK is completed - and maybe even in new Task SDK it's already handled 
? @ashb @amoghrajesh ? 
   
   Generally we have already two processes (parent and fork):
   
   a) supervisor that performs heartbeat
   b) task that executes the job
   there was also c) openlineage handling
   
   In Airlfow 2 timeout was implemented as siglarm handling inside b) which 
means that if b) executed bad-behaving C-code timeout was not  happening. In 
Airflow 3 we have a bit different way of heartbeating and task execution and 
it's being still iterated on - and the options we could implement is to move 
the timout handling to a) or introduce another fork with timeout between a) and 
b). The former likely better from resource usage point of view.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


dabla commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607756533

   > > but now we can assume this is a generic issue related to database 
connections. So yes it's weird that worker doesn't get killed.
   > 
   > I am not sure it explains it. Airflow runs a process (that happens to be 
database connection) Airflow should be able to kill the process. The fact that 
the issue happens for Sql but not for Python is really odd. From the 
description it sounds like something is blocking the schedule from killing the 
task (because it tried to kill it after its finished). I think that the issue 
is with the scheduler itself.
   
   Question, at least for me, is how do you debug something like this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


eladkal commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607703764

   > but now we can assume this is a generic issue related to database 
connections. So yes it's weird that worker doesn't get killed.
   
   I am not sure it explains it. Airflow runs a process (that happens to be 
database connection) Airflow should be able to kill the process. The fact that 
the issue happens for Sql but not for Python is really odd. From the 
description it sounds like something is blocking the schedule from killing the 
task (because it tried to kill it after its finished). I think that the issue 
is with the scheduler itself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


dabla commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607539721

   > Should we apply this logic for the underlying get hook function that we 
are using? While not generic to all if we can provide workaround for 
connections that support it I think we should. Yet the underlying issue is not 
clear to me. The kill signal from timeout comes from the scheduler. Why would 
the task not respond to it? The `exectuion_timeout` is for Airflow, not for the 
underlying SQL engine.
   
   We could, but that possibly would not work for all cases and would be a 
workaround of the real issue.
   It's indeed a weird issue and I was also surprised by that behaviour.  Back 
then I thought it was related to the Jdbc connection and maybe there was some 
glitch between the Python and Java process, but now we can assume this is a 
generic issue related to database connections. So yes it's weird that worker 
doesn't get killed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


eladkal commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-260748

   Should we apply this logic for the underlying get hook function that we are 
using? While not generic to all if we can provide workaround for connections 
that support it I think we should.
   Yet the underlying issue is not clear to me. The kill signal from timeout 
comes from the scheduler. Why would the task not respond to it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


dabla commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607352971

   I've also encountered the same behaviour, as the connection is blocking the 
thread or something and thus the SQLExecuteQueryOperator task only stops once 
the connection returns back, at least that's what I experienced.  
   
   So to avoid that issue, what I've done when using the JdbcHook, was by 
specifying a connection timeout on the JDBC connection, that way when the task 
has to timeout, the connection will also (even though the query will continue 
to run on the database) and thus not block the thread of the Airflow worker, 
but this was a specific solution with JDBC connections.
   
   Bellow the code:
   
   ```
   hook = DbApiHook.get_hook(conn_id="conn_id")
   with hook.get_conn() as conn:
   with closing(conn.cursor()) as cur:
   stmt = conn.jconn.createStatement()
   if timeout:
   stmt.setQueryTimeout(
   int(timeout.total_seconds())
   )  # Set the timeout in seconds
   stmt.executeQuery(sql)  # Execute the SQL query
   return fetch_one_handler(cur)
   ```
   
   So maybe in the hook some kind of timeout should also be specified on the 
connection being used, or if you want a more finegrained approach per operator, 
the you could specify the connection timeout through the hook_params of the 
SQLExecuteQueryOperator?  Still, this al depends if the underlying connection 
supports it of course.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


eladkal commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607141623

   cc @dabla maybe you will have time to look into this? Seems inconsistent 
behavior of SQLExecuteQueryOperator


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] SQLExecuteQueryOperator not timing out within expected timeframe [airflow]

2025-01-22 Thread via GitHub


boring-cyborg[bot] commented on issue #45930:
URL: https://github.com/apache/airflow/issues/45930#issuecomment-2607133874

   Thanks for opening your first issue here! Be sure to follow the issue 
template! If you are willing to raise PR to address this issue please do so, no 
need to wait for approval.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org