inputFiles

GitBox Wed, 23 Nov 2022 19:00:25 -0800


cloud-fan commented on code in PR #38742:
URL: https://github.com/apache/spark/pull/38742#discussion_r1031014545



##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -797,6 +796,137 @@ def schema(self) -> StructType:
         else:
             return self._schema
 
+    @property
+    def isLocal(self) -> bool:
+        """Returns ``True`` if the :func:`collect` and :func:`take` methods 
can be run locally
+        (without any Spark executors).
+
+        .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        bool
+        """
+        if self._plan is None:
+            raise Exception("Cannot analyze on empty plan.")
+        query = self._plan.to_proto(self._session)
+        return self._session.is_local(query)
+
+    @property
+    def isStreaming(self) -> bool:
+        """Returns ``True`` if this :class:`DataFrame` contains one or more 
sources that
+        continuously return data as it arrives. A :class:`DataFrame` that 
reads data from a
+        streaming source must be executed as a :class:`StreamingQuery` using 
the :func:`start`
+        method in :class:`DataStreamWriter`.  Methods that return a single 
answer, (e.g.,
+        :func:`count` or :func:`collect`) will throw an 
:class:`AnalysisException` when there
+        is a streaming source present.
+
+        .. versionadded:: 3.4.0
+
+        Notes
+        -----
+        This API is evolving.
+
+        Returns
+        -------
+        bool
+            Whether it's streaming DataFrame or not.
+        """
+        if self._plan is None:
+            raise Exception("Cannot analyze on empty plan.")
+        query = self._plan.to_proto(self._session)
+        return self._session.is_streaming(query)
+
+    def printSchema(self) -> None:
+        """Prints out the schema in the tree format.
+
+        .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        None
+        """
+        if self._plan is None:
+            raise Exception("Cannot analyze on empty plan.")
+        query = self._plan.to_proto(self._session)
+        print(self._session.tree_string(query))
+
+    def semanticHash(self) -> int:

Review Comment:
   This is a developer API in Dataset, do we really need to provide it in Spark 
connect?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/semanticHash/sameSemantics/inputFiles

Reply via email to