Hi Vibhatha,
I helped you post this question to another community; there is one answer from
someone else for your reference.
To access the logical plan or optimized plan, you can register a custom
QueryExecutionListener and retrieve the plans during the query execution
process. Here's an example of how to do it in Scala:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Create a custom QueryExecutionListener
class CustomQueryExecutionListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // Retrieve the logical plan
    val logicalPlan = qe.logical

    // Retrieve the optimized plan
    val optimizedPlan = qe.optimizedPlan

    // Process the plans with your custom function
    processPlans(logicalPlan, optimizedPlan)
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
}

// Create a SparkSession
val spark = SparkSession.builder()
  .appName("Example")
  .getOrCreate()

// Register the custom QueryExecutionListener
spark.listenerManager.register(new CustomQueryExecutionListener)

// Perform your DataFrame operations
val df = spark.read.csv("path/to/file.csv")
val filteredDF = df.filter(df("column") > 10)
val resultDF = filteredDF.select("column1", "column2")

// Trigger execution of the DataFrame to invoke the listener
resultDF.show()
```
Note that `QueryExecutionListener` lives in `org.apache.spark.sql.util`, not `org.apache.spark.sql`, and `processPlans` stands in for your own function.
Thank You & Best Regards
Winston Lai
________________________________
From: Vibhatha Abeykoon <[email protected]>
Sent: Wednesday, August 2, 2023 5:03:15 PM
To: Ruifeng Zheng <[email protected]>
Cc: Winston Lai <[email protected]>; [email protected]
<[email protected]>
Subject: Re: Extracting Logical Plan
I understand. I sort of drew the same conclusion. But I wasn’t sure. Thanks
everyone for taking time on this.
On Wed, Aug 2, 2023 at 2:29 PM Ruifeng Zheng <[email protected]> wrote:
In Spark Connect, I think the only API to show the optimized plan is
`df.explain("extended")`, as Winston mentioned, but it is not a LogicalPlan
object.
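Since Spark Connect only prints the plan as text, one workaround is to capture the stdout of `explain()` and hand that string to your own processing function. A minimal PySpark-side sketch; `capture_plan_text` is an illustrative helper, not a Spark API, and `df` is assumed to be a DataFrame from an existing session:

```python
# Sketch: capture the text that df.explain("extended") prints to stdout,
# so the plan can be passed to a custom processing function even when no
# LogicalPlan object is exposed (as under Spark Connect).
# `capture_plan_text` is an illustrative helper, not a Spark API.
import io
from contextlib import redirect_stdout

def capture_plan_text(df) -> str:
    """Return the output of df.explain('extended') as a string."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        # explain() prints the parsed/analyzed/optimized/physical plans
        df.explain("extended")
    return buf.getvalue()
```

The returned string contains the `== Optimized Logical Plan ==` section, which your function can then parse; it is plan text, though, not a `LogicalPlan` object.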
On Wed, Aug 2, 2023 at 4:36 PM Vibhatha Abeykoon <[email protected]> wrote:
Hello Ruifeng,
Thank you for these pointers. Would it be different if I use Spark Connect?
I am not using the regular SparkSession. I am pretty new to these APIs.
Appreciate your thoughts.
On Wed, Aug 2, 2023 at 2:00 PM Ruifeng Zheng <[email protected]> wrote:
Hi Vibhatha,
I think those APIs are still available?
```
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.4.1
/_/
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 11.0.19)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = spark.range(0, 10)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> df.queryExecution
res0: org.apache.spark.sql.execution.QueryExecution =
== Parsed Logical Plan ==
Range (0, 10, step=1, splits=Some(12))
== Analyzed Logical Plan ==
id: bigint
Range (0, 10, step=1, splits=Some(12))
== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12))
== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)
scala> df.queryExecution.optimizedPlan
res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Range (0, 10, step=1, splits=Some(12))
```
On Wed, Aug 2, 2023 at 3:58 PM Vibhatha Abeykoon <[email protected]> wrote:
Hi Winston,
I need to use the LogicalPlan object and process it with another function I
have written. In earlier Spark versions we could access it via the DataFrame
object. So if it can be viewed in the UI, is there an API to access the
object itself?
On Wed, Aug 2, 2023 at 1:24 PM Winston Lai <[email protected]> wrote:
Hi Vibhatha,
How about reading the logical plan from the Spark UI? Do you have access to
it? I am not sure what infra you run your Spark jobs on, but you should
usually be able to view the logical and physical plans in the Spark UI, at
least in text form. It is independent of the language (e.g., Scala/Python/R)
you use to run Spark.
On Wednesday, August 2, 2023, Vibhatha Abeykoon <[email protected]> wrote:
Hi Winston,
I am looking for a way to access the LogicalPlan object in Scala. I am not
sure the explain function would serve the purpose.
On Wed, Aug 2, 2023 at 9:14 AM Winston Lai <[email protected]> wrote:
Hi Vibhatha,
Have you tried pyspark.sql.DataFrame.explain
(https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.explain.html)
before? I am not sure what infra you have; you can try this first. If it
doesn't work, you may share more info, such as what platform you are running
your Spark jobs on and what cloud services you are using ...
On Wednesday, August 2, 2023, Vibhatha Abeykoon <[email protected]> wrote:
Hello,
I recently upgraded the Spark version to 3.4.1 and have encountered a few
issues. In my previous code, I was able to extract the logical plan using
`df.queryExecution` (df: DataFrame, in Scala), but it seems it is not
supported in the latest API. Is there a way to extract the logical plan or
optimized plan from a DataFrame or Dataset in Spark 3.4.1?
Best,
Vibhatha
--
Vibhatha Abeykoon
--
Ruifeng Zheng
E-mail: [email protected]