Hi, I am sorry to bother you again. When I do a join as follows:

df = sqlContext.sql("select a.someItem, b.someItem from a full outer join b on condition1 *or* condition2")
df.first()
The program failed because the result size is bigger than spark.driver.maxResultSize. It is really strange, as no single record comes anywhere near 1 GB. When I join on just one condition, or on an equality condition, there is no problem. Could anyone help me, please?

Thanks a lot in advance.

Cheers
Gen

On Sun, Aug 9, 2015 at 9:08 PM, gen tang <gen.tan...@gmail.com> wrote:
> Hi,
>
> I might have a stupid question about Spark SQL's implementation of joins on
> non-equality conditions, for instance condition1 or condition2.
>
> In fact, Hive doesn't support such joins, as it is very difficult to
> express such conditions as a map/reduce job. However, Spark SQL does support
> this operation, so I would like to know how Spark implements it.
>
> As such a join runs very slowly, I guess that Spark implements it by
> applying a filter on top of a Cartesian product. Is that true?
>
> Thanks in advance for your help.
>
> Cheers
> Gen
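To make the guess in the quoted message concrete, here is a minimal sketch in plain Python (not Spark's actual code) of why an OR condition can be so much more expensive than an equi-join: with no single equality key to hash on, the engine may fall back to enumerating the full Cartesian product and filtering it, so the intermediate work grows as len(a) * len(b).

```python
# Conceptual illustration only -- assumes (id, tag) tuples as rows;
# this is NOT Spark's implementation, just the shape of the two strategies.

def or_join(a, b, pred):
    # Cartesian product, then filter: every pair is examined,
    # O(len(a) * len(b)) work regardless of how selective pred is.
    return [(x, y) for x in a for y in b if pred(x, y)]

def equi_join(a, b, key_a, key_b):
    # Hash join on an equality key: build a hash table on one side,
    # probe with the other, never materializing the cross product.
    table = {}
    for x in a:
        table.setdefault(key_a(x), []).append(x)
    return [(x, y) for y in b for x in table.get(key_b(y), [])]

a = [(1, "u"), (2, "v")]
b = [(1, "p"), (3, "q")]

# An OR of two conditions forces the pairwise scan:
wide = or_join(a, b, lambda x, y: x[0] == y[0] or x[1] == y[1])

# An equality-only condition can use the hash-based path:
narrow = equi_join(a, b, lambda x: x[0], lambda y: y[0])
```

Both calls return the same matches on this tiny input, but only the second avoids touching all len(a) * len(b) pairs, which is consistent with the slowdown (and the large intermediate result) observed for the OR join.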