Hi,
From a quick glance at your transformations, sortCol should be sorted.
Are you using Spark 3.2 or above? Can you try again with AQE turned off
in that case?
https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution
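
In case it helps, here is a minimal sketch of turning AQE off for a single session in Scala. It assumes a local session created only for illustration (the app name and master are made up); in spark-shell the `spark` value already exists and only the `spark.conf.set` line is needed.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; app name and master are illustrative only.
val spark = SparkSession.builder()
  .appName("aqe-off-check")
  .master("local[*]")
  .getOrCreate()

// Disable Adaptive Query Execution for this session only.
spark.conf.set("spark.sql.adaptive.enabled", "false")

// Read the setting back to confirm it before re-running the job.
println(spark.conf.get("spark.sql.adaptive.enabled"))
```

The same setting can also be passed at submit time as `--conf spark.sql.adaptive.enabled=false`, which avoids touching the job code.
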
Enrico
On 16.09.22 at 23:28 wrote
If by "won't affect the performance" you mean "parquet is splittable
even though it uses snappy", then yes. Splittable files allow for optimal
parallelization, so in that sense it "won't affect performance".
When Spark writes data, it already splits it into multiple files (here,
parquet files). Even if each