Hi All,

Just checking in to see if anyone has any advice on this.

Thanks,
Rishi

On Mon, Mar 2, 2020 at 9:21 PM Rishi Shah <rishishah.s...@gmail.com> wrote:

> Hi All,
>
> I have 2 large tables (~1TB). I used the following to save both
> tables. When I then try to join the two tables on join_column, it still
> does a shuffle and sort before the join. Could someone please help?
>
> df.repartition(2000).write.bucketBy(1, join_column).sortBy(join_column).saveAsTable(tablename)
>
> --
> Regards,
>
> Rishi Shah

--
Regards,

Rishi Shah