Hello All,

I have the following SQL query:

select a.a_id, b.b_id, c.c_id from table_a a join table_b b on a.a_id =
b.a_id join table_c c on b.b_id = c.b_id

In Scala I have done this so far:

val table_a_rdd = sc.textFile(...)
val table_b_rdd = sc.textFile(...)
val table_c_rdd = sc.textFile(...)

// Split each line on the \x07 delimiter and key the row by its first column.
val table_a_rowRDD = table_a_rdd.map(_.split("\\x07")).map(line =>
(line(0), line))
val table_b_rowRDD = table_b_rdd.map(_.split("\\x07")).map(line =>
(line(0), line))
val table_c_rowRDD = table_c_rdd.map(_.split("\\x07")).map(line =>
(line(0), line))

The first value on each line is its primary key.

I can join two RDDs with table_a_rowRDD.join(table_b_rowRDD), but is it
possible to join more than two RDDs in a single expression, like
table_a_rowRDD.join(table_b_rowRDD).join(table_c_rowRDD)? Also, how can I
specify the column on which each pair of RDDs is joined?
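
A minimal sketch of how such a chained join could look, under stated
assumptions. join on a pair RDD matches on the key of each (key, value)
pair, so the "join column" is whichever column each RDD is keyed by, and
the intermediate result has to be re-keyed before the second join. The
column positions used here (a_id in column 1 of table_b, b_id in column 1
of table_c), the object and function names, and the sample rows are all
hypothetical and are not taken from the real files.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

object ThreeWayJoinSketch {
  // Returns (a_id, b_id, c_id) triples, mirroring the SQL query.
  // Assumed layouts: a = [a_id, ...], b = [b_id, a_id, ...], c = [c_id, b_id, ...].
  def joinIds(a: RDD[Array[String]],
              b: RDD[Array[String]],
              c: RDD[Array[String]]): RDD[(String, String, String)] = {
    val aByAId = a.keyBy(row => row(0)) // (a_id, aRow)
    val bByAId = b.keyBy(row => row(1)) // (a_id, bRow): keyed by the foreign key
    val cByBId = c.keyBy(row => row(1)) // (b_id, cRow): keyed by the foreign key

    aByAId.join(bByAId)                                     // (a_id, (aRow, bRow))
      .map { case (_, (aRow, bRow)) => (bRow(0), (aRow, bRow)) } // re-key on b_id
      .join(cByBId)                                         // (b_id, ((aRow, bRow), cRow))
      .map { case (_, ((aRow, bRow), cRow)) => (aRow(0), bRow(0), cRow(0)) }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("three-way-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Made-up rows standing in for the \x07-delimited input files.
    val a = sc.parallelize(Seq("a1\u0007x", "a2\u0007y")).map(_.split("\\x07"))
    val b = sc.parallelize(Seq("b1\u0007a1", "b2\u0007a1", "b3\u0007a9")).map(_.split("\\x07"))
    val c = sc.parallelize(Seq("c1\u0007b1", "c2\u0007b2")).map(_.split("\\x07"))

    // Row order in the output is not guaranteed.
    joinIds(a, b, c).collect().foreach(println)
    sc.stop()
  }
}
```

Keying table_b by its own primary key, as in the snippet above the
question, would match a.a_id against b.b_id rather than b.a_id, which is
why the sketch keys table_b and table_c by their foreign-key columns
instead.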
