What I did: I have two datasets that I need to join. One of them does not change, so I bucket it once and save it as a table. It looks something like:

    spark.table("profiles").write.bucketBy(500, "uid").saveAsTable("profiles_bkt")

Now I have another dataset that I bucket "online":

    spark.sql(".....").createOrReplaceTempView("sessions")
    spark.table("sessions").write.bucketBy(500, "uid").saveAsTable("sessions_bkt")

And then I have the simplest possible join:

    SELECT profiles_bkt.profile, struct(sessions_bkt.*) AS session
    FROM sessions_bkt
    LEFT OUTER JOIN profiles_bkt ON sessions_bkt.uid = profiles_bkt.uid

What I sometimes receive:

    java.lang.AssertionError: assertion failed: There should be only one distinct value of the number pre-shuffle partitions among registered Exchange operator.

Any clue?
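
For context, here is the idea I am relying on, as a toy sketch in plain Python. It is not Spark code and does not reproduce the error. It only illustrates why I bucket both tables on uid with the same bucket count (500): rows with the same uid should end up in the same bucket index on both sides, so the join can work bucket by bucket. The modulo "hash", the helper names (bucket_of, bucketize, left_join_by_bucket) and the sample rows are all assumptions made up for this illustration. As far as I know Spark uses its own Murmur3-based hash, not a plain modulo.

```python
# Toy model of bucketed tables. NOT Spark's real hashing: plain modulo on
# integer uids is an assumption made only for this illustration.
NUM_BUCKETS = 500  # same bucket count as in bucketBy(500, "uid")


def bucket_of(uid, num_buckets=NUM_BUCKETS):
    """Return the bucket index for a uid (hypothetical helper)."""
    return uid % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Group rows (dicts with a 'uid' key) by their bucket index."""
    buckets = {}
    for row in rows:
        buckets.setdefault(bucket_of(row["uid"], num_buckets), []).append(row)
    return buckets


def left_join_by_bucket(sessions, profiles):
    """Left outer join sessions to profiles, looking only inside matching buckets."""
    session_buckets = bucketize(sessions)
    profile_buckets = bucketize(profiles)
    joined = []
    for idx, session_rows in session_buckets.items():
        # Only the profile bucket with the same index can hold a matching uid.
        # Assumes at most one profile per uid.
        by_uid = {p["uid"]: p["profile"] for p in profile_buckets.get(idx, [])}
        for s in session_rows:
            joined.append({"profile": by_uid.get(s["uid"]), "session": s})
    return joined


# Made-up sample data, purely for illustration.
profiles = [{"uid": 1, "profile": "alice"}, {"uid": 501, "profile": "bob"}]
sessions = [
    {"uid": 1, "page": "/home"},
    {"uid": 501, "page": "/cart"},
    {"uid": 7, "page": "/faq"},
]
result = left_join_by_bucket(sessions, profiles)
```

In this sketch uid 1 and uid 501 share a bucket, and the session with uid 7 has no profile, so it is kept with a missing profile, as a left outer join would do.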