
Re: Renaming a DataFrame column makes Spark lose partitioning information

2020-08-05 Antoine Wendlinger

Well, that's great! Thank you very much :)

Antoine

On Tue, Aug 4, 2020 at 11:22 PM Terry Kim wrote:
> This is fixed in Spark 3.0 by https://github.com/apache/spark/pull/26943:
> [...]

Re: Renaming a DataFrame column makes Spark lose partitioning information

2020-08-04 Terry Kim

This is fixed in Spark 3.0 by https://github.com/apache/spark/pull/26943:

  scala> :paste
  // Entering paste mode (ctrl-D to finish)

  Seq((1, 2))
    .toDF("a", "b")
    .repartition($"b")
    .withColumnRenamed("b", "c")
    .repartition($"c")
    .explain()

  // Exiting paste mode, now interpreting.

  [...]

Renaming a DataFrame column makes Spark lose partitioning information

2020-08-04 Antoine Wendlinger

Hi,

When renaming a DataFrame column, it looks like Spark forgets the partitioning information:

  Seq((1, 2))
    .toDF("a", "b")
    .repartition($"b")
    .withColumnRenamed("b", "c")
    .repartition($"c")
    .explain()

This gives the following plan:

  == Physical Plan ==
  [...]
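To illustrate why the second repartition can turn into an extra shuffle, here is a deliberately simplified model in plain Scala. It is a sketch under stated assumptions, not Spark's code: `Attr`, `HashPartitioning` and `AliasSketch` are hypothetical stand-ins loosely named after Catalyst concepts. The model assumes that renaming `b` to `c` produces a new attribute with a new id, so a projection that passes its child's partitioning through unchanged still reports "hash-partitioned on `b`", and a request for "hash-partitioned on `c`" looks unsatisfied. Rewriting the partitioning keys through the alias map, which is roughly the idea behind alias-aware output partitioning, makes the requested partitioning match and the second shuffle redundant.

```scala
// Hypothetical, heavily simplified model. These are NOT Spark classes.

// An output column: a name plus a unique id (a rename yields a new id).
final case class Attr(name: String, exprId: Long)

// "Rows are hash-partitioned on these keys into numPartitions partitions."
final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int)

object AliasSketch {
  // Naive projection: report the child's partitioning unchanged,
  // i.e. still keyed on the pre-rename attribute.
  def naive(child: HashPartitioning): HashPartitioning = child

  // Alias-aware projection: rewrite each key through the alias map
  // (source attribute -> renamed attribute); keys without an alias stay as-is.
  def aliasAware(child: HashPartitioning, aliases: Map[Attr, Attr]): HashPartitioning =
    child.copy(keys = child.keys.map(k => aliases.getOrElse(k, k)))

  // A requested repartition is redundant only if the data is already
  // partitioned on the same keys with the same number of partitions.
  def needsShuffle(required: HashPartitioning, current: HashPartitioning): Boolean =
    required != current

  def main(args: Array[String]): Unit = {
    val b = Attr("b", 1L)
    val c = Attr("c", 5L)    // models withColumnRenamed("b", "c")
    val aliases = Map(b -> c)

    // 200 is an arbitrary count here (it is also Spark's default for spark.sql.shuffle.partitions).
    val afterFirstRepartition = HashPartitioning(Seq(b), 200)
    val requested = HashPartitioning(Seq(c), 200)

    println(needsShuffle(requested, naive(afterFirstRepartition)))               // true
    println(needsShuffle(requested, aliasAware(afterFirstRepartition, aliases))) // false
  }
}
```

Keys are compared by id rather than by name to mirror the assumption that a rename creates a distinct attribute; comparing by name alone would hide the problem the thread describes.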