Hi,

How about using a broadcast join? It avoids shuffling the large table by shipping the small one to every executor:

  largeDf.join(broadcast(smallDf), "joinKey")
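As a minimal end-to-end sketch (the table paths, DataFrame names, and the column "joinKey" below are made-up placeholders; this assumes the Spark 1.x DataFrame API, where broadcast lives in org.apache.spark.sql.functions):

  import org.apache.spark.sql.SQLContext
  import org.apache.spark.sql.functions.broadcast

  val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

  // Placeholder inputs: one big fact table, one small dimension table.
  val largeDf = sqlContext.read.parquet("hdfs:///path/to/large_table")
  val smallDf = sqlContext.read.parquet("hdfs:///path/to/small_table")

  // broadcast() hints the planner to ship smallDf whole to every
  // executor, so the join runs map-side and largeDf is never shuffled.
  val joined = largeDf.join(broadcast(smallDf), "joinKey")

  joined.write.parquet("hdfs:///path/to/output")

Note that the broadcast table has to fit in memory on the driver and on each executor. Spark also broadcasts automatically when a table's estimated size is under spark.sql.autoBroadcastJoinThreshold (10MB by default); the explicit hint forces it regardless of the size estimate.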
On Sat, Feb 6, 2016 at 2:25 AM, Rex X <dnsr...@gmail.com> wrote:
> Dear all,
>
> The new DataFrame API of Spark is extremely fast, but our cluster has
> limited RAM (~500GB).
>
> What is the best way to do such a big table join?
>
> Any sample code is greatly welcome!
>
> Best,
> Rex

--
---
Takeshi Yamamuro