Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
potiuk commented on PR #44839: URL: https://github.com/apache/airflow/pull/44839#issuecomment-2536052061 This is a fantastic improvement. And it will make "airflow standalone" finally getting really usefuil for "local experience". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb merged PR #44839: URL: https://github.com/apache/airflow/pull/44839 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb commented on PR #44839: URL: https://github.com/apache/airflow/pull/44839#issuecomment-2535804727 Looks like I left a load of `validate_database_executor_compatibility` in the tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb commented on code in PR #44839: URL: https://github.com/apache/airflow/pull/44839#discussion_r1880075982 ## airflow/dag_processing/manager.py: ## @@ -557,36 +490,16 @@ def _run_parsing_loop(self): self.log.debug("Received %s signal from DagFileProcessorAgent", agent_signal) if agent_signal == DagParsingSignal.TERMINATE_MANAGER: -if span.is_recording(): -span.add_event(name="terminate") self.terminate() break elif agent_signal == DagParsingSignal.END_MANAGER: -if span.is_recording(): -span.add_event(name="end") self.end() sys.exit(os.EX_OK) -elif agent_signal == DagParsingSignal.AGENT_RUN_ONCE: -# continue the loop to parse dags -pass elif isinstance(agent_signal, CallbackRequest): self._add_callback_to_queue(agent_signal) else: raise ValueError(f"Invalid message {type(agent_signal)}") -if not ready and not self._async_mode: Review Comment: Nope, we don't care about it anymore, nothing ready we'll loop round and wait again. (It was since before we had `timeout=None` so it would block, but we don't have that anymore) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
kaxil commented on code in PR #44839: URL: https://github.com/apache/airflow/pull/44839#discussion_r1880066065 ## airflow/dag_processing/manager.py: ## @@ -557,36 +490,16 @@ def _run_parsing_loop(self): self.log.debug("Received %s signal from DagFileProcessorAgent", agent_signal) if agent_signal == DagParsingSignal.TERMINATE_MANAGER: -if span.is_recording(): -span.add_event(name="terminate") self.terminate() break elif agent_signal == DagParsingSignal.END_MANAGER: -if span.is_recording(): -span.add_event(name="end") self.end() sys.exit(os.EX_OK) -elif agent_signal == DagParsingSignal.AGENT_RUN_ONCE: -# continue the loop to parse dags -pass elif isinstance(agent_signal, CallbackRequest): self._add_callback_to_queue(agent_signal) else: raise ValueError(f"Invalid message {type(agent_signal)}") -if not ready and not self._async_mode: Review Comment: Do we still need the "ready" check? (I haven't dug into why that was needed) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb commented on code in PR #44839: URL: https://github.com/apache/airflow/pull/44839#discussion_r1879920358 ## tests/dag_processing/test_processor.py: ## @@ -656,49 +654,3 @@ def test_counter_for_last_num_of_db_queries(self): session=session, ): self._process_file(dag_filepath, TEST_DAG_FOLDER, session) - - -class TestProcessorAgent: Review Comment: a) This was in the wrong file. b) These tests were just of the sync mode controls, so not needed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb commented on code in PR #44839: URL: https://github.com/apache/airflow/pull/44839#discussion_r1879925927 ## airflow/dag_processing/manager.py: ## @@ -97,7 +97,6 @@ class DagFileStat: class DagParsingSignal(enum.Enum): """All signals sent to parser.""" -AGENT_RUN_ONCE = "agent_run_once" TERMINATE_MANAGER = "terminate_manager" END_MANAGER = "end_manager" Review Comment: In a future PR these will be removed too, as there is a simpler way we can achieve this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb commented on code in PR #44839: URL: https://github.com/apache/airflow/pull/44839#discussion_r1879916448 ## providers/tests/celery/executors/test_celery_kubernetes_executor.py: ## @@ -46,9 +46,6 @@ def test_is_production_default_value(self): def test_serve_logs_default_value(self): assert not CeleryKubernetesExecutor.serve_logs -def test_is_single_threaded_default_value(self): -assert not CeleryKubernetesExecutor.is_single_threaded Review Comment: Note: I've left the property on the class though, so that the provider will continue to work with Airflow 2.x -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb commented on code in PR #44839: URL: https://github.com/apache/airflow/pull/44839#discussion_r1879912749 ## airflow/utils/orm_event_handlers.py: ## @@ -46,6 +46,7 @@ def connect(dbapi_connection, connection_record): def set_sqlite_pragma(dbapi_connection, connection_record): cursor = dbapi_connection.cursor() cursor.execute("PRAGMA foreign_keys=ON") +cursor.execute("PRAGMA journal_mode=WAL") Review Comment: This is essentially the change, everything else is removing code that isn't needed anymore! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Remove "single process" restrictions on SQLite in favour of using WAL mode [airflow]
ashb opened a new pull request, #44839: URL: https://github.com/apache/airflow/pull/44839 Since 2010(!) sqlite has had a WAL, or Write-Ahead Log mode of journalling which allos multiple concurrent readers and one writer. More than good enough for us for "local" use. The primary driver for this change was a realisation that it is possible and to reduce the amount of code in complexity in DagProcessorManager before reworking it for AIP-72 support :- we have a lot of code in the DagProcessorManager to support `if async_mode` that makes understanding the flow complex. Some useful docs and articles about this mode: - [The offical docs](https://sqlite.org/wal.html) - [Simon Willison's TIL](https://til.simonwillison.net/sqlite/enabling-wal-mode) - [fly.io article about scaling read concurrency](https://fly.io/blog/sqlite-internals-wal/) This still keeps the warning against using SQLite in production, but it greatly reduces the restrictions what combos and settings can use this. In short, when using an SQLite db it is now possible to: - use LocalExecutor, including with more than 1 concurrent worker slot - have multiple DAG parsing processes (even before AIP-72/TaskSDK changes to that) We execute the `PRAGMA journal_mode` every time we connect, which is more often that is strictly needed as this is one of the few modes thatis persistent and a property of the DB file just for ease and to ensure that it it is in the mode we want. I have tested this with `breeze -b sqlite start_airflow` and a kicking off a lot of tasks concurrently. Will this be without problems? No, not entirely, but due to the scheduler+webserver+api server process we've _already_ got the case where multiple processes are operating on the DB file. This change just makes the best use of that following the guidance of the SQLite project: Ensuring that only a single process accesses the DB concurrently is not a requirement anymore! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org