Hi Spark users and developers,

I'm using Spark 1.5.1 (I have no choice; it is what we use). I ran into
some very unexpected behaviour when doing some join operations recently. I
cannot post my actual code here, and the following snippet is not meant to
be practical, but it demonstrates the issue.

// Runs in spark-shell, where sc and sqlContext.implicits._ are in scope.
val base = sc.parallelize((0 to 49).map(i => (i, 0)) ++
  (50 to 99).map((_, 1))).toDF("id", "label")
val d1 = base.where($"label" === 0)
val d2 = base.where($"label" === 1)
// Drop d2's "label" after the join, then select d1's "label".
d1.join(d2, d1("id") === d2("id"), "left_outer")
  .drop(d2("label"))
  .select(d1("label"))


The above code throws an exception saying the column "label" cannot be
found. Is there a reason for throwing an exception here, given that
d1("label") was never dropped (only d2("label") was)?
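
For what it's worth, aliasing the two DataFrames before the join and
referring to the columns through the aliases seems to sidestep the
ambiguity. A sketch of the same example (the alias names "a" and "b" are
arbitrary):

val a = d1.as("a")
val b = d2.as("b")
// Qualified references like $"a.label" avoid the ambiguous resolution
// that d1("label")/d2("label") hit after the self-join.
a.join(b, $"a.id" === $"b.id", "left_outer")
  .select($"a.label")

With the aliases in place, the intermediate drop(d2("label")) is no longer
needed, since select($"a.label") is already unambiguous. Still, I'd like to
understand why the original form fails.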

Best Regards,

Jerry
