Re: [Spark Internals]: Is sort order preserved after partitioned write?

2022-09-17 Thread Enrico Minack
Hi,

from a quick glance at your transformations, sortCol should be sorted. Are you using Spark 3.2 or above? In that case, can you try again with AQE turned off?

https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution

Enrico

On 16.09.22 at 23:28, … wrote:
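The Spark 3.2 question matters because adaptive query execution is enabled by default from 3.2 on. It is toggled with `spark.sql.adaptive.enabled`. When comparing runs with AQE on and off, one way to tell whether the order survived is to read each written file back in on-disk order and check that sortCol never decreases within a file. `pyarrow.parquet.read_table` is one option for the reading step. Below is a minimal sketch of the check itself. It assumes the sortCol values of each file have already been collected. The helper name `unsorted_files` and the file names are hypothetical.

```python
from typing import Dict, List, Sequence


def unsorted_files(values_by_file: Dict[str, Sequence[float]]) -> List[str]:
    """Return names of files whose values are not in non-decreasing order.

    values_by_file maps an output file name to its sortCol values in the
    order they are stored on disk (hypothetical input, collected by the caller).
    """
    bad = []
    for name, values in values_by_file.items():
        # A file counts as sorted if every adjacent pair is non-decreasing.
        if any(a > b for a, b in zip(values, values[1:])):
            bad.append(name)
    return bad


# Hypothetical example: two files of a partitioned write.
files = {
    "p=a/part-00000.snappy.parquet": [1, 2, 2, 5],
    "p=b/part-00001.snappy.parquet": [3, 1, 4],
}
print(unsorted_files(files))  # ['p=b/part-00001.snappy.parquet']
```

The check works per file rather than per partition directory. Each write task produces its own file for every partition value it touches, so each file can only be expected to be sorted on its own.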

Re: Splittable or not?

2022-09-17 Thread Enrico Minack
If by "won't affect the performance" you mean "Parquet is splittable even though it uses Snappy", then yes. Splittable files allow for optimal parallelization, which is why they "won't affect performance". When Spark writes data, it already splits the data into multiple files (here, Parquet files). Even if each
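Splittability decides how many read tasks Spark can create for a single file. Parquet with Snappy stays splittable because the compression is applied to pages inside the file, not to the file as a whole. A gzip-compressed CSV file, by contrast, has to be read by a single task. The sketch below is a simplified model and not Spark's actual planner. A splittable file is cut into chunks of at most `spark.sql.files.maxPartitionBytes`, which defaults to 128 MiB. A non-splittable file yields exactly one chunk. The model ignores `spark.sql.files.openCostInBytes`, the packing of small chunks into shared partitions, and the fact that Parquet work is done per row group. The function name `read_splits` is hypothetical.

```python
from typing import Iterable

# Default value of spark.sql.files.maxPartitionBytes (128 MiB).
DEFAULT_MAX_SPLIT_BYTES = 128 * 1024 * 1024


def read_splits(file_sizes: Iterable[int], splittable: bool,
                max_split_bytes: int = DEFAULT_MAX_SPLIT_BYTES) -> int:
    """Simplified count of chunks Spark could read in parallel (hypothetical model)."""
    total = 0
    for size in file_sizes:
        if splittable:
            # Ceiling division: each chunk holds at most max_split_bytes;
            # every file counts as at least one chunk.
            total += max(1, (size + max_split_bytes - 1) // max_split_bytes)
        else:
            # A non-splittable file must be read by a single task.
            total += 1
    return total


GiB = 1024 ** 3
print(read_splits([GiB], splittable=True))   # 8
print(read_splits([GiB], splittable=False))  # 1
```

Under this model, a single 1 GiB splittable file can keep eight tasks busy, while the same data in one non-splittable file is limited to one task. When the data is already spread over many small files, the difference shrinks.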