Hi,

How about using a broadcast join? It avoids shuffling the large table by shipping the small one to every executor:

  largeDf.join(broadcast(smallDf), "joinKey")
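As a minimal end-to-end sketch (the table paths, DataFrame names, and the column "joinKey" below are made-up placeholders; this assumes the Spark 1.x DataFrame API, where broadcast lives in org.apache.spark.sql.functions):

  import org.apache.spark.sql.SQLContext
  import org.apache.spark.sql.functions.broadcast

  val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

  // Placeholder inputs: one big fact table, one small dimension table.
  val largeDf = sqlContext.read.parquet("hdfs:///path/to/large_table")
  val smallDf = sqlContext.read.parquet("hdfs:///path/to/small_table")

  // broadcast() hints the planner to ship smallDf whole to every
  // executor, so the join runs map-side and largeDf is never shuffled.
  val joined = largeDf.join(broadcast(smallDf), "joinKey")

  joined.write.parquet("hdfs:///path/to/output")

Note that the broadcast table has to fit in memory on the driver and on each executor. Spark also broadcasts automatically when a table's estimated size is under spark.sql.autoBroadcastJoinThreshold (10MB by default); the explicit hint forces it regardless of the size estimate.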
On Sat, Feb 6, 2016 at 2:25 AM, Rex X <dnsr...@gmail.com> wrote:
> Dear all,
>
> The new DataFrame API of Spark is extremely fast, but our cluster has
> limited RAM (~500GB).
>
> What is the best way to do such a big table join?
>
> Any sample code is greatly welcome!
>
> Best,
> Rex

--
---
Takeshi Yamamuro