The new Dataset API is supposed to provide type safety and type checks at compile time: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
It does this in a lot of places, but I found that it still doesn't have a type-safe join:

    val ds1 = hc.sql("select col1, col2 from mytable")
    val ds2 = hc.sql("select col3, col4 from mytable2")
    val ds3 = ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col3"))

Here Spark has no way to make sure (at compile time) that the two columns being joined, "col1" and "col3", are of matching types. This is in contrast to an RDD join, where a mismatch would be detected at compile time. Am I missing something? Thanks.
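To make the contrast concrete, here is a minimal sketch of what I mean. The case classes `RowA` and `RowB`, the sample data, and the local `SparkSession` are made up for illustration (they stand in for my real tables and `hc`), and I deliberately gave the two join columns different types.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object JoinTypeCheck {
  // Hypothetical row types standing in for mytable and mytable2.
  // The join columns deliberately differ in type: Int vs String.
  case class RowA(col1: Int, col2: String)
  case class RowB(col3: String, col4: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-type-check")
      .getOrCreate()
    import spark.implicits._

    val ds1: Dataset[RowA] = Seq(RowA(1, "a"), RowA(2, "b")).toDS()
    val ds2: Dataset[RowB] = Seq(RowB("1", 1.0), RowB("3", 2.0)).toDS()

    // Dataset join: the condition is an untyped Column expression, so this
    // compiles even though col1 is an Int and col3 is a String. Any mismatch
    // is only dealt with when Spark analyzes or runs the query.
    val dsJoined: Dataset[(RowA, RowB)] =
      ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col3"))
    dsJoined.show()

    // RDD join: the key type is part of the static type of the pair RDD.
    val rdd1 = ds1.rdd.map(a => (a.col1, a)) // RDD[(Int, RowA)]
    val rdd2 = ds2.rdd.map(b => (b.col3, b)) // RDD[(String, RowB)]

    // rdd1.join(rdd2) // does not compile: key types Int and String differ

    // The compiler forces me to reconcile the key types explicitly first.
    val rddJoined = rdd1.join(rdd2.map { case (k, b) => (k.toInt, b) })
    rddJoined.collect().foreach(println)

    spark.stop()
  }
}
```

The commented-out `rdd1.join(rdd2)` line is the check I am missing on the Dataset side: with RDDs the compiler rejects it, while the `joinWith` version compiles regardless of the column types.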