Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-27 Thread via GitHub


dstandish commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2913961276

   Reverting here https://github.com/apache/airflow/pull/51135


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-27 Thread via GitHub


dstandish commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2913881766

   This PR has some kind of bug.
   
   When you clear a TI you see two log files with try == 1
   
   https://github.com/user-attachments/assets/6e070a83-c16b-4c88-927a-a598a8fe7d7f";
 />
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-20 Thread via GitHub


jason810496 commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2894404069

   Manual packport #50833 to `v3-10-test` due to rconflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-20 Thread via GitHub


jason810496 merged PR #50175:
URL: https://github.com/apache/airflow/pull/50175


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-20 Thread via GitHub


oboki commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2893974733

   > The test seems flaky, update with latest main.
   
   Finally, all tests have passed! Thanks @jason810496. I really appreciate 
your support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-20 Thread via GitHub


github-actions[bot] commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2894239190

   ### Backport failed to create: v3-0-test. View the failure log  Run details 

   
   
   
   Status
   Branch
   Result
   
   
   ❌
   v3-0-test
   https://github.com/apache/airflow/commit/0b0ff5d11802a57b136f159cb6753a16dbe5fe07";>
   
   
   
   You can attempt to backport this manually by running:
   
   ```bash
   cherry_picker 0b0ff5d v3-0-test
   ```
   
   This should apply the commit to the v3-0-test branch and leave the commit in 
conflict state marking
   the files that need manual conflict resolution.
   
   After you have resolved the conflicts, you can continue the backport process 
by running:
   
   ```bash
   cherry_picker --continue
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-19 Thread via GitHub


jason810496 commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2893120170

   The test seems flaky, update with latest main.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-19 Thread via GitHub


jason810496 commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2095278876


##
airflow-core/tests/unit/api_fastapi/common/db/test_task_instance.py:
##
@@ -0,0 +1,79 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from airflow.api_fastapi.common.db.task_instance import 
get_task_instance_or_history_for_try_number
+from airflow.models.taskinstance import TaskInstance
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+from airflow.providers.standard.operators.empty import EmptyOperator
+from airflow.utils import timezone
+from airflow.utils.types import DagRunType
+
+from tests_common.test_utils.db import clear_db_runs
+
+pytestmark = pytest.mark.db_test
+
+
+class TestDBTaskInstance:
+DAG_ID = "dag_for_testing_db_task_instance"
+RUN_ID = "dag_run_id_for_testing_db_task_instance"
+TASK_ID = "task_for_testing_db_task_instance"
+TRY_NUMBER = 1
+
+default_time = "2020-06-10T20:00:00+00:00"
+
+@pytest.fixture(autouse=True)
+def setup_attrs(self, dag_maker, session) -> None:
+with dag_maker(self.DAG_ID, 
start_date=timezone.parse(self.default_time), session=session) as dag:
+EmptyOperator(task_id=self.TASK_ID)
+
+dr = dag_maker.create_dagrun(
+run_id=self.RUN_ID,
+run_type=DagRunType.SCHEDULED,
+logical_date=timezone.parse(self.default_time),
+start_date=timezone.parse(self.default_time),
+)
+
+for ti in dr.task_instances:
+ti.try_number = 1
+ti.hostname = "localhost"
+session.merge(ti)
+dag.clear()
+for ti in dr.task_instances:
+ti.try_number = 2
+ti.hostname = "localhost"
+session.merge(ti)
+session.flush()

Review Comment:
   ```suggestion
 session.commit()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-19 Thread via GitHub


pierrejeambrun commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2890350459

   As mentioned by Jason, rebasing should fix the CI and should make it 
mergeable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-18 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2094572091


##
airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py:
##
@@ -57,6 +57,38 @@
 }
 
 
+def _find_task_instance_for_try_number(

Review Comment:
   I'm still running into the `sqlalchemy.exc.OperationalError: 
(sqlite3.OperationalError) database is locked` error in the CI tests.
   
   Could you please provide any further suggestions on what might be causing 
this problem? Thanks again for your help.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-18 Thread via GitHub


oboki closed pull request #50175: fix: resolve 404 log error for non-latest 
task tries in multi-host worker environments
URL: https://github.com/apache/airflow/pull/50175


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-18 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2094544178


##
airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py:
##
@@ -57,6 +57,38 @@
 }
 
 
+def _find_task_instance_for_try_number(

Review Comment:
   Thanks for the feedback! I've moved the function to 
`airflow-core/src/airflow/api_fastapi/common/db/task_instance.py` as suggested 
and added corresponding unit test.
   
   Let me know if you have any further comments or suggestions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-16 Thread via GitHub


bugraoz93 commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2887572214

   I will take a look moving other methods from routes to proper places soon


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-16 Thread via GitHub


bugraoz93 commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2093608213


##
airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py:
##
@@ -57,6 +57,38 @@
 }
 
 
+def _find_task_instance_for_try_number(

Review Comment:
   Yes, I directly went to the service but for this method for this use case, 
`common/db/task_instances.py` makes more sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-16 Thread via GitHub


bugraoz93 commented on PR #50175:
URL: https://github.com/apache/airflow/pull/50175#issuecomment-2887563668

   > > A more general thing about routes and services: There are similar cases 
where we have methods inside route files. We already have some level of 
distinction between them. How about moving them to services? What do you think? 
cc: @pierrejeambrun @rawwar @jason810496 @shubhamraj-git @ephraimbuddy
   > 
   > Looks good to me to move the query helper function to the 
`airflow-core/src/airflow/api_fastapi/core_api/services` module level. (I think 
`airflow-core/src/airflow/api_fastapi/common/db` would also be a suitable 
location.)
   > 
   > Just to double-check, do you mean to move the 
`_find_task_instance_for_try_number` function directly into the `services` 
module and not introduce a new service class, right?
   > 
   > Since this helper simply wraps a query as a function, and the existing 
service classes (e.g., for connections, pools, variables) are designed for bulk 
operations, creating a new class might not be necessary here.
   
   Yes, I agree on the second look, 
`airflow-core/src/airflow/api_fastapi/common/db` is more suitable. I intend to 
decouple these methods from routes. Even with `collapse all` in some routes, 
there are a lot of endpoints and making the code less readable.
   
   Yes, it can be on the root level. For sure, it depends on the use case if we 
look at it in general. We aren't doing full object-oriented programming in 
general, it is mixed with Functional mostly due to the nature of Python. Unless 
we use service classes as a singleton across all routes, it can only add 
overhead of creating the object rather than calling the method. I don't have a 
strong opinion on that since the entire stack is using a mixed approach and 
makes sense in most cases :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-13 Thread via GitHub


pierrejeambrun commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2086373150


##
airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_log.py:
##
@@ -449,3 +449,28 @@ def test_get_external_log_url(
 
 assert response.status_code == expected_status
 assert response.json() == expected_response
+
+@pytest.mark.parametrize("try_number", [1, 2])
+def test_log_fetched_from_ti_history_when_try_number_differs(self, 
try_number, session):
+key = self.app.state.secret_key
+serializer = URLSafeSerializer(key)
+token = serializer.dumps({"download_logs": False})
+
+mock_ti = mock.Mock()
+# Simulates a mismatch between the current TaskInstance's try_number 
(=3)
+# and the requested try_number (1 or 2). The log should be fetched 
from TaskInstanceHistory.
+mock_ti.try_number = 3
+mock_ti_history = mock.Mock()
+
+with mock.patch.object(session, "scalar", side_effect=[mock_ti, 
mock_ti_history]) as _:
+response = self.client.get(
+
f"/dags/{self.DAG_ID}/dagRuns/{self.RUN_ID}/taskInstances/{self.TASK_ID}/logs/{try_number}",
+params={"token": token},
+headers={"Accept": "application/json"},
+)
+
+assert response.status_code == 200
+

Review Comment:
   Instead of patching the session, which seems to bring lock problems on 
sqlite, you could add a real TIH record (manually inserting, or running with 
failures a couple of TI tries).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-07 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2077980085


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   In my previous attempt, I ran into a type issue right from the beginning, 
like this:
   
   ```python
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/log/file_task_handler.py",
 line 589, in _read_from_logs_server
   log_type = LogType.TRIGGER if ti.triggerer_job else LogType.WORKER
 
   AttributeError: 'TaskInstanceHistory' object has no attribute 'triggerer_job'
   ```
   
   So I didn't dig much deeper at the time. But after working around this 
`AttributeError`, your approach looks much simpler!
   
   I'm currently using airflow 2.10, so I didn't quite catch your suggestion at 
first, as the webserver has changed quite a bit in 3.0. But now it finally 
makes sense.
   
   Thanks again!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-07 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2078077637


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   In my previous attempt, I ran into a type issue right from the beginning, 
like this:
   
   ```python
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/log/file_task_handler.py",
 line 589, in _read_from_logs_server
   log_type = LogType.TRIGGER if ti.triggerer_job else LogType.WORKER
 
   AttributeError: 'TaskInstanceHistory' object has no attribute 'triggerer_job'
   ```
   
   And I didn't dig much deeper at the time. But after working around this 
`AttributeError`, your approach looks much simpler!
   
   Thanks again!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-07 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2077980085


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   In my previous attempt, I ran into a type issue right from the beginning, 
like this:
   
   ```python
   Traceback (most recent call last):
 File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/log/file_task_handler.py",
 line 589, in _read_from_logs_server
   log_type = LogType.TRIGGER if ti.triggerer_job else LogType.WORKER
 
   AttributeError: 'TaskInstanceHistory' object has no attribute 'triggerer_job'
   ```
   
   So I didn't dig much deeper at the time. But after working around this 
`AttributeError`, your approach looks much simpler!
   
   I'm currently using airflow 2.10, so I didn't quite catch your suggestion at 
first, as the webserver has changed quite a bit in 3.0. But now it finally 
makes sense.
   
   Thanks again!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-06 Thread via GitHub


jason810496 commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2076713853


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   Thanks for the try. 
   
   IMO, we could still modify the logic on API side to `ti = TIH or TI` ( 
select TIH first, if None, then select TI. It should be the same as your 
approach but implement on API side ).
   
   For the typing part, we can include `TaskInstanceHistory` as well ( `ti: TI 
| TIH` )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-06 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2075139859


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   @jason810496
   
   Thanks for the review!
   
   I tried to follow your suggestion by making changes to the `get_log` 
endpoint, but I ran into a few issues.
   
   
https://github.com/apache/airflow/blob/31c7c9626bab3e39a5a0592b87e220007d94d6b9/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L105-L126
   
   First, it seems that the `try_number` parameter isn't actually used in this 
context.
   
   The `if ti is None:` condition doesn't behave as expected, so `ti` is always 
a `TaskInstance`, not a `TaskInstanceHistory`.
   
   
https://github.com/apache/airflow/blob/31c7c9626bab3e39a5a0592b87e220007d94d6b9/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L118
   
   I tried updating the logic to fetch `TaskInstanceHistory` when appropriate, 
but ran into type issues, as many of the subsequent calls expect a 
`TaskInstance` rather than a `TaskInstanceHistory`.
   
   And even if this is resolved at the API level, the request still goes 
through the `log_reader`, which in the end relies on the `file_task_handler`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-06 Thread via GitHub


oboki commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2075139859


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   @jason810496
   
   Thanks for the review!
   
   I tried to follow your suggestion by making changes to the `get_log` 
endpoint, but I ran into a few issues.
   
   
https://github.com/apache/airflow/blob/31c7c9626bab3e39a5a0592b87e220007d94d6b9/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L105-L125
   
   First, it seems that the `try_number` parameter isn't actually used in this 
context.
   
   The `if ti is None:` condition doesn't behave as expected, so `ti` is always 
a `TaskInstance`, not a `TaskInstanceHistory`.
   
   
https://github.com/apache/airflow/blob/31c7c9626bab3e39a5a0592b87e220007d94d6b9/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L118
   
   I tried updating the logic to fetch `TaskInstanceHistory` when appropriate, 
but ran into type issues, as many of the subsequent calls expect a 
`TaskInstance` rather than a `TaskInstanceHistory`.
   
   And even if this is resolved at the API level, the request still goes 
through the `log_reader`, which in the end relies on the `file_task_handler`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-05 Thread via GitHub


jason810496 commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2073380307


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   Should we fetch `TaskInstanceHistory` in the `get_log` endpoint instead of 
making changes here?
   
   
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L104-L125



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: resolve 404 log error for non-latest task tries in multi-host worker environments [airflow]

2025-05-05 Thread via GitHub


jason810496 commented on code in PR #50175:
URL: https://github.com/apache/airflow/pull/50175#discussion_r2073380307


##
airflow-core/src/airflow/utils/log/file_task_handler.py:
##
@@ -474,6 +480,20 @@ def _get_log_retrieval_url(
 hostname = ti.triggerer_job.hostname
 log_relative_path = self.add_triggerer_suffix(log_relative_path, 
job_id=ti.triggerer_job.id)
 else:
+from airflow.models.taskinstancehistory import TaskInstanceHistory
+
+ti = (
+session.query(TaskInstanceHistory)
+.filter(
+TaskInstanceHistory.dag_id == ti.dag_id,
+TaskInstanceHistory.task_id == ti.task_id,
+TaskInstanceHistory.run_id == ti.run_id,
+TaskInstanceHistory.map_index == ti.map_index,
+TaskInstanceHistory.try_number == try_number,
+)
+.one_or_none()
+or ti
+)

Review Comment:
   Should we fetch `TaskInstanceHistory` in the get_log endpoint instead of 
making changes here?
   
   
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py#L104-L125



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org