Hi All,

In SQL, say I have table1 (movieid) and table2 (movieid, moviename). We would write something like:

select moviename, movieid, count(1)
from table2
inner join table1 on table1.movieid = table2.movieid
group by ...

Here table1 has only one column while table2 has two, and the join still works. In the same way, can Spark join two RDDs on their keys?
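To answer my own question while asking it: in Spark, rdd1.join(rdd2) matches on the key of each (key, value) pair, so the two RDDs do not need the same number of elements or the same value shape. Below is a minimal sketch of those join-and-count semantics using plain Scala collections (no SparkContext), with made-up MovieLens-style sample rows; the object and function names are my own, not Spark API:

```scala
// Sketch of Spark pair-RDD inner-join semantics with plain Scala collections.
// ratings is keyed by movieId with a rating value; names is keyed by movieId
// with a movie name. Only the keys need to line up, like rdd1.join(rdd2).
object JoinSketch {
  def joinAndCount(ratings: Seq[(String, Int)],
                   names: Seq[(String, String)]): Map[String, (String, Int)] = {
    val nameMap = names.toMap
    // inner join on the key: keep only ratings whose movieId has a name
    val joined = ratings.collect {
      case (id, rating) if nameMap.contains(id) => (id, (nameMap(id), rating))
    }
    // GROUP BY movieid, COUNT(1): like map(x => (id, 1)).reduceByKey(_ + _)
    joined.groupBy(_._1).map { case (id, rows) =>
      (id, (rows.head._2._1, rows.size))
    }
  }

  def main(args: Array[String]): Unit = {
    val ratings = Seq(("50", 5), ("50", 4), ("172", 3))
    val names = Seq(("50", "Star Wars (1977)"),
                    ("172", "Empire Strikes Back (1980)"))
    joinAndCount(ratings, names).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The key point the sketch illustrates: the "one-column" side just becomes a pair RDD whose value can be anything (even a dummy), because join only compares keys.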
When I tried to join two RDDs in Spark, it seemed both RDDs had to be pair RDDs of (key, value) elements, so I had to add a dummy value 0 to the keys-only RDD, for example. Is there another way around this, or am I doing it completely wrong?

val lines = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.data")
val movienamesfile = sc.textFile("C:\\Users\\kalit_000\\Desktop\\udemy_spark\\ml-100k\\u.item")
// u.data is tab-separated; key by movie id (second field) with a dummy 0 value
val moviesid = lines.map(x => x.split("\t")).map(x => (x(1), 0))
val test = moviesid.map(x => x._1)  // keys only (not usable in join as-is)
// u.item is pipe-separated; key by movie id, value is the movie name
val movienames = movienamesfile.map(x => x.split("\\|")).map(x => (x(0), x(1)))
val movienamejoined = moviesid.join(movienames).distinct()

Thanks
Sri

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-inner-join-tp25193.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.