Hi Jerry,

What do you expect the outcome to be?
This is Spark 1.6.1. I see the expected result without the drop on d2:

scala> d1.join(d2, d1("id") === d2("id"), "left_outer").select(d1("label")).collect
res15: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0])

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 28 March 2016 at 22:34, Jerry Lam <chiling...@gmail.com> wrote:

> Hi spark users and developers,
>
> I'm using Spark 1.5.1 (I have no choice because this is what we use). I
> ran into some very unexpected behaviour when I did some join operations
> lately. I cannot post my actual code here, and the following code serves
> no practical purpose, but it should demonstrate the issue:
>
> val base = sc.parallelize((0 to 49).map(i => (i, 0)) ++ (50 to 99).map((_, 1))).toDF("id", "label")
> val d1 = base.where($"label" === 0)
> val d2 = base.where($"label" === 1)
> d1.join(d2, d1("id") === d2("id"), "left_outer").drop(d2("label")).select(d1("label"))
>
> The above code throws an exception saying the column "label" is not found.
> Is there a reason for throwing an exception when d1("label") itself was
> never dropped (only d2("label") was)?
>
> Best Regards,
>
> Jerry
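
A quick way to see where the ambiguity may come from (a hypothetical diagnostic, not part of the original snippet) is to print the schema of the join before anything is dropped. Because d1 and d2 are both derived from base, both sides of the join carry columns named "id" and "label":

// Hypothetical diagnostic, assuming the same d1 and d2 as above.
d1.join(d2, d1("id") === d2("id"), "left_outer").printSchema()
// The schema should list "id" and "label" twice, once from each side of
// the join, so drop and select have two candidate attributes that both
// trace back to the same column of base.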
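
One possible workaround, sketched below assuming the usual spark-shell setup (sc and the sqlContext implicits already in scope): rename d2's columns before the join so the two sides have distinct names, which removes the ambiguity for drop and select. The names id2 and label2 are just placeholders.

// A minimal sketch of one workaround: withColumnRenamed gives d2's
// columns names that cannot collide with d1's.
val base = sc.parallelize((0 to 49).map(i => (i, 0)) ++ (50 to 99).map((_, 1))).toDF("id", "label")
val d1 = base.where($"label" === 0)
val d2 = base.where($"label" === 1).withColumnRenamed("id", "id2").withColumnRenamed("label", "label2")

d1.join(d2, d1("id") === d2("id2"), "left_outer").drop("label2").select("label").collect

With distinct names on each side, drop("label2") can only remove d2's column and select("label") can only refer to d1's, so no exception should be thrown.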