Hi All,

In SQL, say I have table1 (movieid) and table2 (movieid, moviename). We would write something like:

select moviename, movieid, count(1)
from table2
inner join table1 on table1.movieid = table2.movieid
group by ...

Here table1 has only one column while table2 has two, and the join still works. In the same way, can Spark join two RDDs on their keys?
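To answer my own question while asking it: in Spark, rdd1.join(rdd2) matches on the key of each (key, value) pair, so the two RDDs do not need the same number of elements or the same value shape. Below is a minimal sketch of those join-and-count semantics using plain Scala collections (no SparkContext), with made-up MovieLens-style sample rows; the object and function names are my own, not Spark API:

```scala
// Sketch of Spark pair-RDD inner-join semantics with plain Scala collections.
// ratings is keyed by movieId with a rating value; names is keyed by movieId
// with a movie name. Only the keys need to line up, like rdd1.join(rdd2).
object JoinSketch {
  def joinAndCount(ratings: Seq[(String, Int)],
                   names: Seq[(String, String)]): Map[String, (String, Int)] = {
    val nameMap = names.toMap
    // inner join on the key: keep only ratings whose movieId has a name
    val joined = ratings.collect {
      case (id, rating) if nameMap.contains(id) => (id, (nameMap(id), rating))
    }
    // GROUP BY movieid, COUNT(1): like map(x => (id, 1)).reduceByKey(_ + _)
    joined.groupBy(_._1).map { case (id, rows) =>
      (id, (rows.head._2._1, rows.size))
    }
  }

  def main(args: Array[String]): Unit = {
    val ratings = Seq(("50", 5), ("50", 4), ("172", 3))
    val names = Seq(("50", "Star Wars (1977)"),
                    ("172", "Empire Strikes Back (1980)"))
    joinAndCount(ratings, names).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The key point the sketch illustrates: the "one-column" side just becomes a pair RDD whose value can be anything (even a dummy), because join only compares keys.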
When I tried to join two RDDs in Spark, it seemed both RDDs had to be pair RDDs of (key, value) elements, so I had to add a dummy value 0 to the keys-only RDD, for example. Is there another way around this, or am I doing it completely wrong?

val lines = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.data")
val movienamesfile = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.item")
// u.data is tab-separated; key by movie id (second field) with a dummy 0 value
val moviesid = lines.map(x => x.split("\t")).map(x => (x(1), 0))
val test = moviesid.map(x => x._1)  // keys only (not usable in join as-is)
// u.item is pipe-separated; key by movie id, value is the movie name
val movienames = movienamesfile.map(x => x.split("\\|")).map(x => (x(0), x(1)))
val movienamejoined = moviesid.join(movienames).distinct()

Thanks
Sri

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-inner-join-tp25193.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.