The new Dataset API is supposed to provide type safety and type checks at
compile time:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations

It does so in many places, but as far as I can tell it still doesn't have
a type-safe join:

val ds1 = hc.sql("select col1, col2 from mytable")

val ds2 = hc.sql("select col3, col4 from mytable2")

val ds3 = ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col3"))

Here Spark has no way to make sure (at compile time) that the two columns
being joined, "col1" and "col3", are of matching types. This is in contrast
to an RDD join, where a mismatch would be detected at compile time.
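
To make the contrast concrete, here is a minimal sketch of what I mean. The object name JoinTypeCheck, the local SparkSession, and the sample data are made up for illustration and are not from my real job. It assumes a Spark 2.x or later build on the classpath.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object JoinTypeCheck {
  // RDD join: the key type is part of the static type, so both sides must agree on it.
  def rddJoin(spark: SparkSession): Array[(Int, (String, Double))] = {
    val sc = spark.sparkContext
    val left = sc.parallelize(Seq((1, "a"), (2, "b")))   // RDD[(Int, String)]
    val right = sc.parallelize(Seq((1, 1.0), (3, 3.0)))  // RDD[(Int, Double)]
    // val bad = sc.parallelize(Seq(("1", 1.0)))         // RDD[(String, Double)]
    // left.join(bad)  // rejected by the compiler: Int key vs String key
    left.join(right).collect()
  }

  // Dataset joinWith: the condition is an untyped Column expression, so the
  // compiler accepts an Int column compared with a String column.
  def datasetJoin(spark: SparkSession): Dataset[((Int, String), (String, Double))] = {
    import spark.implicits._
    val ds1 = Seq((1, "a"), (2, "b")).toDS()  // columns _1: Int, _2: String
    val ds2 = Seq(("1", 1.0)).toDS()          // columns _1: String, _2: Double
    // Compiles fine; how the mismatch is handled is decided by Spark's
    // analyzer and coercion rules, not by the Scala compiler.
    ds1.joinWith(ds2, ds1("_1") === ds2("_1"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-type-check")
      .getOrCreate()
    println(rddJoin(spark).mkString(", "))
    datasetJoin(spark).printSchema()
    spark.stop()
  }
}
```

Uncommenting the two lines in rddJoin should give a compile error, while datasetJoin compiles even though the key columns have different types.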

Am I missing something?

Thanks
