Re: 2 tables join happens at Hive but not in spark

Davies Liu Wed, 18 May 2016 10:43:28 -0700

What do the schemas of the two tables look like? Could you also show the
explain output for the query?
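A minimal PySpark sketch for gathering both, assuming Spark 2.x or later and a SparkSession with Hive support (the thread itself uses SparkR from RStudio). The function name `diagnose_join` and its `key` argument are hypothetical:

```python
def diagnose_join(spark, left_table, right_table, key):
    """Print how Spark sees both tables and the join plan, then return the count.

    Assumes `spark` is a SparkSession built with enableHiveSupport();
    `key` is a hypothetical join-column name present in both tables.
    """
    left = spark.table(left_table)
    right = spark.table(right_table)

    # 1. Schemas as Spark resolves them, to compare with Hive's DESCRIBE output.
    left.printSchema()
    right.printSchema()
    left_type = left.schema[key].dataType
    right_type = right.schema[key].dataType
    print(f"{key}: {left_table}={left_type}, {right_table}={right_type}")

    # 2. The parsed, analyzed, optimized and physical plans for the join.
    joined = left.join(right, left[key] == right[key], "inner")
    joined.explain(True)

    # 3. Spark's row count, to compare with Hive's count(*).
    return joined.count()
```

For example, with `spark = SparkSession.builder.enableHiveSupport().getOrCreate()` (imported from `pyspark.sql`), calling `diagnose_join(spark, "tab1", "tab2", "id")` prints both schemas and the extended plan, and returns the count Spark produces. Here `id` stands in for the real join column.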


On Sat, Feb 27, 2016 at 2:10 AM, Sandeep Khurana <sand...@infoworks.io> wrote:
> Hello
>
> We have 2 tables (tab1, tab2) exposed through Hive. The data is in different
> HDFS folders. We are trying to join these 2 tables on a single column
> using the SparkR join. But even though the join columns have the same values,
> it returns zero rows.
>
> But when I run the same join SQL from the Hive console to get the
> count(*), I do get millions of records meeting the join criteria.
>
> The join columns are of 'int' type. Also, when I separately join 'tab1' (one
> of the 2 tables whose join is not working) with a third table, 'tab3', that
> join works.
>
> To debug, we selected just 1 row from tab1 in the SparkR script, and also 1
> row from tab2 having the same value in the join column. We used the
> 'select' SparkR function for this. Now our dataframes for tab1 and tab2
> have a single row each and the join columns have the same value in both, but
> joining these 2 single-row dataframes on that column still returned zero rows.
>
>
> We are running the script from RStudio. It runs fine without any error,
> but it returns zero join results, whereas in Hive I get many rows for the
> same join. Any idea what might be the cause of this?
>
>
>
> --
> Architect
> Infoworks.io
> http://Infoworks.io
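As a plain-Python illustration of what the single-row experiment above should produce (this is not Spark code, and the column names `id`, `a`, `b` and the values are hypothetical), here is a minimal inner hash join with SQL semantics. With one row on each side and equal keys it must return exactly one row. It returns zero only if the keys differ or one of them is NULL, because a NULL key never matches. Whether Spark reads the key column differently from what the Hive console shows is only an assumption here; the schema and a plain select of the key column on each side would confirm or rule it out.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Minimal inner equi-join on one column with SQL semantics:
    rows whose key is None (SQL NULL) never match anything."""
    # Build phase: index the right side by key value, skipping NULL keys.
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        if row[key] is not None:
            index[row[key]].append(row)

    # Probe phase: emit one merged row per matching (left, right) pair.
    result: List[Row] = []
    for lrow in left:
        k = lrow[key]
        if k is None:
            continue
        for rrow in index.get(k, []):
            merged = {f"l_{c}": v for c, v in lrow.items()}
            merged.update({f"r_{c}": v for c, v in rrow.items()})
            result.append(merged)
    return result


# One row per side with the same key value: exactly one output row.
tab1 = [{"id": 42, "a": "x"}]
tab2 = [{"id": 42, "b": "y"}]
print(len(inner_join(tab1, tab2, "id")))  # 1

# Same rows, but the key is NULL on one side: zero output rows.
tab2_null = [{"id": None, "b": "y"}]
print(len(inner_join(tab1, tab2_null, "id")))  # 0
```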
