Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on PR #50454:
URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869852292

   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


eladkal merged PR #50454:
URL: https://github.com/apache/airflow/pull/50454


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on PR #50454:
URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869827601

   Sure, thanks for the answer and it does make sense. Also thanks for the 
review and suggestions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


eladkal commented on PR #50454:
URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869826895

   > BTW should we also bump other packages using get_df to common-sql=1.27.0 
to have better type definition?
   
   We bump min version when there is a reason.. otherwise we will always have 
latest version as minimum.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on PR #50454:
URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869824994

   BTW should we bump other packages using `get_df` to common-sql=1.27.0 to 
have better type definition?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083510135


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   Sure, I've updated with todo comment and this pr link hoping that would help.
   If there are more information or discussion about this integration on 
whether polars or Elasticsearch document in the future, I would get back here 
and try implement it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083510135


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   Sure, I've updated with todo comment and this pull link hoping that would 
help.
   If there is more information or discussion about this integration on whether 
polars or Elasticsearch official document in the future, I would go back and 
try implement it.



##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   Sure, I've updated with todo comment and this pull link hoping that would 
help.
   If there is more information or discussion about this integration on whether 
polars or Elasticsearch official document in the future, I would get back here 
and try implement it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083510135


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   Sure, I've updated with todo comment and this pr link hoping that would help.
   If there is more information or discussion about this integration on whether 
polars or Elasticsearch official document in the future, I would get back here 
and try implement it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


eladkal commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083507819


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   OK can you just add a comment in the `_get_polars_df` provide some context 
about why it's incompatible and what needs to be done to make it work (just 
leaving instructions to whoever decide to tackle this)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


eladkal commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083507819


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   OK can you just add a todo comment in the `_get_polars_df` provide some 
context about why it's incompatible and what needs to be done to make it work 
(just leaving instructions to whoever decide to tackle this)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083501851


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   Since the `ElasticsearchSQLHook`has its own `ElasticsearchSQLCursor` which 
is not compatable with polars thus if we comment the implementation above and 
try like 
   
   ```python
   @pytest.mark.parametrize(
   "df_type",
   ["pandas", "polars"],
   )
   def test_get_df(self, df_type):
   statement = "SELECT * FROM hollywood.actors"
   df = self.db_hook.get_df(statement, df_type=df_type)
   
   assert list(df.columns) == ["index", "name", "firstname", "age"]
   assert df.values.tolist() == ROWS
   
   self.conn.close.assert_called_once_with()
   self.spy_agency.assert_spy_called(self.cur.close)
   self.spy_agency.assert_spy_called(self.cur.execute)
   ```
   
   
   would get this error
   ```
   ___ 
TestElasticsearchSQLHook.test_get_df[polars] 
___
   
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:175:
 in test_get_df
   df = self.db_hook.get_df(statement, df_type=df_type)
   providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:458: in 
get_df
   return self._get_polars_df(sql, parameters, **kwargs)
   providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:513: in 
_get_polars_df
   return pl.read_database(sql, connection=conn, 
execute_options=execute_options, **kwargs)
   .venv/lib/python3.12/site-packages/polars/io/database/functions.py:251: in 
read_database
   ).to_polars(
   .venv/lib/python3.12/site-packages/polars/io/database/_executor.py:563: in 
to_polars
   raise NotImplementedError(msg)
```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083502123


##
providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py:
##
@@ -219,6 +219,14 @@ def get_uri(self) -> str:
 
 return uri
 
+def _get_polars_df(
+self,
+sql,
+parameters: list | tuple | Mapping[str, Any] | None = None,
+**kwargs,
+):
+raise NotImplementedError("Polars is not supported for Elasticsearch")

Review Comment:
   I implement it to prevent user to use the `df_type=polars` when using 
`get_df` since there is incompatible issue which would lead to error below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083502123


##
providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py:
##
@@ -219,6 +219,14 @@ def get_uri(self) -> str:
 
 return uri
 
+def _get_polars_df(
+self,
+sql,
+parameters: list | tuple | Mapping[str, Any] | None = None,
+**kwargs,
+):
+raise NotImplementedError("Polars is not supported for Elasticsearch")

Review Comment:
   I implement it to prevent user to use the `df_type=polars` when using 
`get_df`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083501851


##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   Since the `ElasticsearchSQLHook`has its own `ElasticsearchSQLCursor` which 
is not compatable with polars thus if we try like 
   
   ```python
   @pytest.mark.parametrize(
   "df_type",
   ["pandas", "polars"],
   )
   def test_get_df(self, df_type):
   statement = "SELECT * FROM hollywood.actors"
   df = self.db_hook.get_df(statement, df_type=df_type)
   
   assert list(df.columns) == ["index", "name", "firstname", "age"]
   assert df.values.tolist() == ROWS
   
   self.conn.close.assert_called_once_with()
   self.spy_agency.assert_spy_called(self.cur.close)
   self.spy_agency.assert_spy_called(self.cur.execute)
   ```
   
   
   would get this error
   ```
   ___ 
TestElasticsearchSQLHook.test_get_df[polars] 
___
   
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:175:
 in test_get_df
   df = self.db_hook.get_df(statement, df_type=df_type)
   providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:458: in 
get_df
   return self._get_polars_df(sql, parameters, **kwargs)
   providers/common/sql/src/airflow/providers/common/sql/hooks/sql.py:513: in 
_get_polars_df
   return pl.read_database(sql, connection=conn, 
execute_options=execute_options, **kwargs)
   .venv/lib/python3.12/site-packages/polars/io/database/functions.py:251: in 
read_database
   ).to_polars(
   .venv/lib/python3.12/site-packages/polars/io/database/_executor.py:563: in 
to_polars
   raise NotImplementedError(msg)
```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


eladkal commented on code in PR #50454:
URL: https://github.com/apache/airflow/pull/50454#discussion_r2083499815


##
providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py:
##
@@ -219,6 +219,14 @@ def get_uri(self) -> str:
 
 return uri
 
+def _get_polars_df(
+self,
+sql,
+parameters: list | tuple | Mapping[str, Any] | None = None,
+**kwargs,
+):
+raise NotImplementedError("Polars is not supported for Elasticsearch")

Review Comment:
   Do we need to have specific implementation?
   The hook doesn't implement specific implementation for pandas so why would 
it need one for polars?



##
providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py:
##
@@ -177,6 +177,10 @@ def test_get_df_pandas(self):
 self.spy_agency.assert_spy_called(self.cur.close)
 self.spy_agency.assert_spy_called(self.cur.execute)
 
+def test_get_df_polars(self):

Review Comment:
   I think if you parameterized the `test_get_df_pandas` to include polars that 
would work



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Migrate `ElasticsearchSQLHook` to use `get_df` [airflow]

2025-05-11 Thread via GitHub


guan404ming commented on PR #50454:
URL: https://github.com/apache/airflow/pull/50454#issuecomment-2869690628

   This is the last one of the whole migration!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org