What does the schema of the two tables look like? Could you also show the explain output of the query?
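
In case it helps, here is a minimal SparkR sketch for collecting both. It assumes a Spark 1.6-style setup with Hive support, and the join column name `id` is only a placeholder since the real column name was not given; substitute your own.

```r
# Sketch only: assumes Spark 1.6-era SparkR with Hive support available.
library(SparkR)

sc <- sparkR.init()
hiveContext <- sparkRHive.init(sc)

# Load the two Hive tables as SparkR DataFrames.
tab1 <- table(hiveContext, "tab1")
tab2 <- table(hiveContext, "tab2")

# Print the schema of each table, to compare the join column types.
printSchema(tab1)
printSchema(tab2)

# "id" is a placeholder for the actual join column.
joined <- join(tab1, tab2, tab1$id == tab2$id, "inner")

# Print the logical and physical plans for the join, then the row count.
explain(joined, extended = TRUE)
count(joined)
```

The output of the two `printSchema` calls and of `explain` is what would be useful to see here.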

On Sat, Feb 27, 2016 at 2:10 AM, Sandeep Khurana <sand...@infoworks.io> wrote:
> Hello
>
> We have 2 tables (tab1, tab2) exposed using Hive. The data is in different
> HDFS folders. We are trying to join these 2 tables on a single column
> using the SparkR join. But in spite of the join columns having the same
> values, it returns zero rows.
>
> But when I run the same join SQL in Hive, from the Hive console, to get the
> count(*), I do get millions of records meeting the join criteria.
>
> The join columns are of 'int' type. Also, when I separately join 'tab1',
> one of the 2 tables for which the join is not working, with a third table
> 'tab3', that join works.
>
> To debug, we selected just 1 row in the SparkR script from tab1 and also 1
> row having the same value of the join column from tab2. We used the
> 'select' SparkR function for this. Now our dataframes for tab1 and tab2
> have a single row each and the join columns have the same value in both,
> but joining these 2 single-row dataframes still returned zero rows.
>
> We are running the script from RStudio. It does not give any error. It runs
> fine, but gives zero join results, whereas in Hive I do get many rows for
> the same join. Any idea what might be the cause of this?
>
> --
> Architect
> Infoworks.io
> http://Infoworks.io