Spark join produce duplicate rows in resultset

Meena Rajani Sat, 21 Oct 2023 14:05:23 -0700

Hello all:

I am using spark sql to join two tables. To my surprise I am
getting redundant rows. What could be the cause.



select rev.* from rev
inner join customer c
on rev.custumer_id =c.id
inner join product p
rev.sys = p.sys
rev.prin = p.prin
rev.scode= p.bcode

left join item I
on rev.sys = i.sys
rev.custumer_id = I.custumer_id
rev. scode = I.scode

where rev.custumer_id = '123456789'

The first part of the code brings one row

select rev.* from rev
inner join customer c
on rev.custumer_id =c.id
inner join product p
rev.sys = p.sys
rev.prin = p.prin
rev.scode= p.bcode


The  item has two rows which have common attributes  and the* final join
should result in 2 rows. But I am seeing 4 rows instead.*

left join item I
on rev.sys = i.sys
rev.custumer_id = I.custumer_id
rev. scode = I.scode



Regards,
Meena

Spark join produce duplicate rows in resultset

Reply via email to