I have two DataFrames, let's call them A and B. A consists of [unique_id, field1] and B consists of [unique_id, field2].
They have exactly the same number of rows, and every id in A is also present in B. If I execute a join like A.join(B, Seq("unique_id")).select($"unique_id", $"field1"), then Spark performs an expensive join even though it doesn't have to, because all the fields it needs are already in A. Is there some trick I can use so that Catalyst will optimise this join away?
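For reference, here is a minimal, self-contained sketch of the setup (the data and the local-mode SparkSession are illustrative, not my real job) that shows the join surviving optimisation when you inspect the plan with explain:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; my real job runs on a cluster.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-elimination-question")
  .getOrCreate()
import spark.implicits._

// Toy stand-ins for A and B: same row count, same set of ids.
val A = Seq((1L, "x"), (2L, "y")).toDF("unique_id", "field1")
val B = Seq((1L, 10), (2L, 20)).toDF("unique_id", "field2")

val q = A.join(B, Seq("unique_id")).select($"unique_id", $"field1")

// The optimized logical plan still contains the join, even though
// no column from B is selected.
q.explain(true)
```

My understanding is that Catalyst can't eliminate the join on its own because, in general, an inner join can drop or duplicate rows of A (if B were missing some ids, or contained duplicates); the guarantee that B matches A one-to-one exists only in my head, not in anything the optimiser can see.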