This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new becbf8b94213 [SPARK-47560][PYTHON][CONNECT] Avoid RPC to validate column name with cached schema
becbf8b94213 is described below

commit becbf8b942132b82e7b906c63ea6077649329b93
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Tue Mar 26 16:01:26 2024 +0800

    [SPARK-47560][PYTHON][CONNECT] Avoid RPC to validate column name with cached schema

    ### What changes were proposed in this pull request?
    If the column name exists in the schema, skip the `df.select` validation.

    ### Why are the changes needed?
    https://github.com/apache/spark/commit/6f87fe2f513d1b1a022f0d03b6c81d73d7cfb228 caches the schema, so if the column name exists in the schema, we don't need to validate it with `df.select`, which requires an additional RPC.

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    ci

    ### Was this patch authored or co-authored using generative AI tooling?
    no

    Closes #45717 from zhengruifeng/py_df_getitem_validate.

    Authored-by: Ruifeng Zheng <ruife...@apache.org>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/sql/connect/dataframe.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 2a22d02387ae..74a4efbe3a79 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -1736,7 +1736,10 @@ class DataFrame:

         # validate the column name
         if not hasattr(self._session, "is_mock_session"):
-            self.select(item).isLocal()
+            # Different from __getattr__, the name here can be quoted like df['`id`'].
+            # Only validate the name when it is not in the cached schema.
+            if item not in self.columns:
+                self.select(item).isLocal()

         return Column(
             ColumnReference(
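
The effect of the change can be illustrated with a minimal, self-contained sketch. This is not the pyspark implementation: `FakeConnectDataFrame`, `_rpc`, `_validate_on_server`, and `rpc_count` are hypothetical names invented for illustration, and the "server" is just an in-memory list. The sketch assumes only what the commit message states: the schema is cached after the first fetch, and validation through the server costs one extra round trip.

```python
from typing import List, Optional


class FakeConnectDataFrame:
    """Hypothetical stand-in for a Spark Connect DataFrame (not the pyspark class)."""

    def __init__(self, server_columns: List[str]) -> None:
        self._server_columns = list(server_columns)  # what the "server" knows
        self._cached_columns: Optional[List[str]] = None  # filled on first use
        self.rpc_count = 0  # number of simulated round trips

    def _rpc(self) -> None:
        # Simulates one client/server round trip.
        self.rpc_count += 1

    @property
    def columns(self) -> List[str]:
        # The schema is fetched once and then served from the cache.
        if self._cached_columns is None:
            self._rpc()
            self._cached_columns = list(self._server_columns)
        return self._cached_columns

    def _validate_on_server(self, name: str) -> None:
        # Stands in for `self.select(item).isLocal()`: always costs one RPC.
        self._rpc()
        if name.strip("`") not in self._server_columns:
            raise ValueError(f"cannot resolve column {name!r}")

    def __getitem__(self, name: str) -> str:
        # Only fall back to server-side validation when the name is not
        # found verbatim in the cached schema (e.g. a quoted name like `id`).
        if name not in self.columns:
            self._validate_on_server(name)
        return name


df = FakeConnectDataFrame(["id", "value"])
df["id"]     # first access fetches the schema
df["value"]  # answered from the cached schema, no extra round trip
df["`id`"]   # quoted name is not in the cached list, so it is validated remotely
print(df.rpc_count)
```

In this sketch the plain names are resolved against the cached column list, while the backtick-quoted name still goes through the slower validation path, which mirrors the comment in the diff about `df['`id`']`.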