Hi,

I want to broadcast a DataFrame to all executors and perform an operation
similar to a join, except that it may return a different number of rows than
each partition contains, and it may combine multiple input rows into a
single output row.
I am trying to write a custom join operator for this use case. It would be
great if you could point me to similar code.
My plan is to build a HashedRelation from the DataFrame, broadcast that
HashedRelation, and then extract the internal rows on each partition at the
executor level.
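
To make the idea concrete, here is a rough plain-Python sketch of the
per-partition logic I have in mind. It is not Spark code: the column names
(id, value, label) and the functions build_lookup and join_partition are
made up for illustration, and a plain dict stands in for the HashedRelation.
I assume that in Spark the lookup would be shipped as a broadcast variable
and the function applied through something like mapPartitions.

```python
from collections import defaultdict


def build_lookup(small_rows, key):
    """Group the rows of the small (to-be-broadcast) side by join key."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    return dict(lookup)


def join_partition(partition_rows, lookup, key):
    """Process one partition; may emit more or fewer rows than it receives."""
    # Collect all rows of this partition that share a key, so that
    # several input rows can contribute to one output row.
    grouped = defaultdict(list)
    for row in partition_rows:
        grouped[row[key]].append(row)

    for k, left_rows in grouped.items():
        total = sum(r["value"] for r in left_rows)  # many rows -> one value
        # Zero, one, or many output rows per key, depending on the lookup.
        for right in lookup.get(k, []):
            yield {key: k, "total": total, "label": right["label"]}


# Hypothetical sample data standing in for the two DataFrames.
small = [
    {"id": 1, "label": "a"},
    {"id": 1, "label": "b"},
    {"id": 2, "label": "c"},
]
partition = [
    {"id": 1, "value": 10},
    {"id": 1, "value": 5},
    {"id": 3, "value": 7},
]

lookup = build_lookup(small, "id")
result = list(join_partition(iter(partition), lookup, "id"))
```

Here three input rows produce two output rows: the two rows with id 1 are
combined into one total and matched against two lookup rows, while id 3 has
no match and produces nothing.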

Thanks
Mura
