Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on PR #50454: URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869852292 Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
eladkal merged PR #50454: URL: https://github.com/apache/airflow/pull/50454 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on PR #50454: URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869827601 Sure, thanks for the answer and it does make sense. Also thanks for the review and suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
eladkal commented on PR #50454: URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869826895 > BTW should we also bump other packages using get_df to common-sql=1.27.0 to have better type definition? We bump min version when there is a reason.. otherwise we will always have latest version as minimum. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on PR #50454: URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869824994 BTW should we bump other packages using `get_df` to common-sql=1.27.0 to have better type definition? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083510135 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: Sure, I've updated with todo comment and this pr link hoping that would help. If there are more information or discussion about this integration on whether polars or Elasticsearch document in the future, I would get back here and try implement it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083510135 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: Sure, I've updated with todo comment and this pull link hoping that would help. If there is more information or discussion about this integration on whether polars or Elasticsearch official document in the future, I would go back and try implement it. ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: Sure, I've updated with todo comment and this pull link hoping that would help. If there is more information or discussion about this integration on whether polars or Elasticsearch official document in the future, I would get back here and try implement it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083510135 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: Sure, I've updated with todo comment and this pr link hoping that would help. If there is more information or discussion about this integration on whether polars or Elasticsearch official document in the future, I would get back here and try implement it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
eladkal commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083507819 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: OK can you just add a comment in the `_get_polars_df` provide some context about why it's incompatible and what needs to be done to make it work (just leaving instructions to whoever decide to tackle this) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
eladkal commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083507819 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: OK can you just add a todo comment in the `_get_polars_df` provide some context about why it's incompatible and what needs to be done to make it work (just leaving instructions to whoever decide to tackle this) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083501851 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: Since the `ElasticsearchSQLHook`has its own `ElasticsearchSQLCursor` which is not compatable with polars thus if we comment the implementation above and try like ```python @pytest.mark.parametrize( "df_type", ["pandas", "polars"], ) def test_get_df(self, df_type): statement = "SELECT * FROM hollywood.actors" df = self.db_hook.get_df(statement, df_type=df_type) assert list(df.columns) == ["index", "name", "firstname", "age"] assert df.values.tolist() == ROWS self.conn.close.assert_called_once_with() self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) ``` would get this error ``` ___ TestElasticsearchSQLHook.test_get_df[polars] ___ providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:175: in test_get_df df = self.db_hook.get_df(statement, df_type=df_type) providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:458: in get_df return self._get_polars_df(sql, parameters, **kwargs) providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:513: in _get_polars_df return pl.read_database(sql, connection=conn, execute_options=execute_options, **kwargs) .venv/lib/python3.12/site-packages/polars/io/database/functions.py:251: in read_database ).to_polars( .venv/lib/python3.12/site-packages/polars/io/database/_executor.py:563: in to_polars raise NotImplementedError(msg) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083502123 ## providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py: ## @@ -219,6 +219,14 @@ def get_uri(self) -> str: return uri +def _get_polars_df( +self, +sql, +parameters: list | tuple | Mapping[str, Any] | None = None, +**kwargs, +): +raise NotImplementedError("Polars is not supported for Elasticsearch") Review Comment: I implement it to prevent user to use the `df_type=polars` when using `get_df` since there is incompatible issue which would lead to error below. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083502123 ## providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py: ## @@ -219,6 +219,14 @@ def get_uri(self) -> str: return uri +def _get_polars_df( +self, +sql, +parameters: list | tuple | Mapping[str, Any] | None = None, +**kwargs, +): +raise NotImplementedError("Polars is not supported for Elasticsearch") Review Comment: I implement it to prevent user to use the `df_type=polars` when using `get_df`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083501851 ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: Since the `ElasticsearchSQLHook`has its own `ElasticsearchSQLCursor` which is not compatable with polars thus if we try like ```python @pytest.mark.parametrize( "df_type", ["pandas", "polars"], ) def test_get_df(self, df_type): statement = "SELECT * FROM hollywood.actors" df = self.db_hook.get_df(statement, df_type=df_type) assert list(df.columns) == ["index", "name", "firstname", "age"] assert df.values.tolist() == ROWS self.conn.close.assert_called_once_with() self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) ``` would get this error ``` ___ TestElasticsearchSQLHook.test_get_df[polars] ___ providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:175: in test_get_df df = self.db_hook.get_df(statement, df_type=df_type) providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:458: in get_df return self._get_polars_df(sql, parameters, **kwargs) providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:513: in _get_polars_df return pl.read_database(sql, connection=conn, execute_options=execute_options, **kwargs) .venv/lib/python3.12/site-packages/polars/io/database/functions.py:251: in read_database ).to_polars( .venv/lib/python3.12/site-packages/polars/io/database/_executor.py:563: in to_polars raise NotImplementedError(msg) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
eladkal commented on code in PR #50454: URL: https://github.com/apache/airflow/pull/50454#discussion_r2083499815 ## providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py: ## @@ -219,6 +219,14 @@ def get_uri(self) -> str: return uri +def _get_polars_df( +self, +sql, +parameters: list | tuple | Mapping[str, Any] | None = None, +**kwargs, +): +raise NotImplementedError("Polars is not supported for Elasticsearch") Review Comment: Do we need to have specific implementation? The hook doesn't implement specific implementation for pandas so why would it need one for polars? ## providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py: ## @@ -177,6 +177,10 @@ def test_get_df_pandas(self): self.spy_agency.assert_spy_called(self.cur.close) self.spy_agency.assert_spy_called(self.cur.execute) +def test_get_df_polars(self): Review Comment: I think if you parameterized the `test_get_df_pandas` to include polars that would work -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]
guan404ming commented on PR #50454: URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869690628 This is the last one of the whole migration! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org