Hi,

I've been trying to find out the answer to the question about UNION ALL and
SELECTs @ https://stackoverflow.com/q/47837955/1305344

> If I have Spark SQL statement of the form SELECT [...] UNION ALL SELECT
[...], will the two SELECT statements be executed in parallel? In my
specific use case the two SELECTs are querying two different database
tables. In contrast to what I would have expected, the Spark UI seems to
suggest that the two SELECT statements are performed sequentially.

How to know if the two separate SELECTs are executed in parallel or not?
What are the tools to know it?

My answer was to use explain operator that would show...well...physical
plan, but am not sure how to read it to know whether a query plan is going
to be executed in parallel or not.

I then used the underlying RDD lineage (using rdd.toDebugString) hoping
that gives me the answer, but...I'm not so sure.

For a query like the following:

val q = spark.range(1).union(spark.range(2))

I thought that since both SELECTs are codegen'ed they could be executed in
parallel, but when switched to the RDD lineage I lost my confidence given
there's just one single stage (!)

scala> q.rdd.toDebugString
res4: String =
(16) MapPartitionsRDD[17] at rdd at <console>:26 []
 |   MapPartitionsRDD[16] at rdd at <console>:26 []
 |   UnionRDD[15] at rdd at <console>:26 []
 |   MapPartitionsRDD[11] at rdd at <console>:26 []
 |   MapPartitionsRDD[10] at rdd at <console>:26 []
 |   ParallelCollectionRDD[9] at rdd at <console>:26 []
 |   MapPartitionsRDD[14] at rdd at <console>:26 []
 |   MapPartitionsRDD[13] at rdd at <console>:26 []
 |   ParallelCollectionRDD[12] at rdd at <console>:26 []

What am I missing and how to be certain whether and what parts of a query
are going to be executed in parallel?

Please help...

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

Reply via email to