Hi,

Thanks for your response. I am not clear on why the query is ambiguous.

val both = df_2.join(df_1, df_2("country") === df_1("country"), "left_outer")

I thought df_2("country") === df_1("country") indicates that the country
field in the two DataFrames should match, and that df_2("country") is the
equivalent of df_2.country in SQL, while df_1("country") is the equivalent
of df_1.country. So I am not sure why it is ambiguous. In Spark 1.2.0 I
used the same logic with Spark SQL and tables (e.g. "WHERE tab1.country =
tab2.country") and had no problem getting the correct result.

thanks
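For reference, a minimal sketch of why this self-join is ambiguous and how the alias workaround suggested below resolves it. It assumes a Spark 1.3 shell with `sc` in scope; the toy DataFrame and its country/cnt/event values are invented for illustration:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // assumes an existing SparkContext `sc`
import sqlContext.implicits._       // brings toDF(...) and $"..." into scope

val df = sc.parallelize(Seq(
  ("us", 1000, 0), ("uk", 2000, 0), ("tw", 3000, 0),
  ("us", 95, 3), ("uk", 200, 3), ("tw", 25, 3)
)).toDF("country", "cnt", "event")

// df_1 and df_2 both derive from the same parent df, so df_1("country")
// and df_2("country") resolve to the same underlying attribute, and the
// analyzer cannot tell which side of the self-join each one refers to.
// Aliasing each branch gives the two sides distinct qualifiers:
val df_1 = df.filter(df("event") === 0).select("country", "cnt").as("a")
val df_2 = df.filter(df("event") === 3).select("country", "cnt").as("b")
val both = df_2.join(df_1, $"a.country" === $"b.country", "left_outer")
both.show()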
On Wed, Mar 25, 2015 at 11:05 AM, Michael Armbrust <mich...@databricks.com> wrote:

> Unfortunately you are now hitting a bug (that is fixed in master and will
> be released in 1.3.1, hopefully next week). However, even with that fix
> your query is still ambiguous, and you will need to use aliases:
>
> val df_1 = df.filter( df("event") === 0)
>             .select("country", "cnt").as("a")
> val df_2 = df.filter( df("event") === 3)
>             .select("country", "cnt").as("b")
> val both = df_2.join(df_1, $"a.country" === $"b.country", "left_outer")
>
> On Tue, Mar 24, 2015 at 11:57 PM, S Krishna <skrishna...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for your response. I modified my code as per your suggestion, but
>> now I am getting a runtime error. Here's my code:
>>
>> val df_1 = df.filter( df("event") === 0)
>>             .select("country", "cnt")
>>
>> val df_2 = df.filter( df("event") === 3)
>>             .select("country", "cnt")
>>
>> df_1.show()
>> // produces the following output:
>> // country cnt
>> // tw      3000
>> // uk      2000
>> // us      1000
>>
>> df_2.show()
>> // produces the following output:
>> // country cnt
>> // tw      25
>> // uk      200
>> // us      95
>>
>> val both = df_2.join(df_1, df_2("country") === df_1("country"), "left_outer")
>>
>> I am getting the following error when executing the join statement:
>>
>> java.util.NoSuchElementException: next on empty iterator
>>
>> This error seems to originate at DataFrame.join (line 133 in
>> DataFrame.scala).
>>
>> The show() results show that both DataFrames do have columns named
>> "country" and that they are non-empty. I also tried the simpler join
>> (i.e. df_2.join(df_1)) and got the same error stated above.
>>
>> I would like to know what is wrong with the join statement above.
>>
>> thanks
>>
>> On Tue, Mar 24, 2015 at 6:08 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> You need to use `===`, so that you are constructing a column expression
>>> instead of evaluating the standard Scala equality method. Calling
>>> methods to access columns (i.e. df.country) is only supported in Python.
>>>
>>> val join_df = df1.join( df2, df1("country") === df2("country"), "left_outer")
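A short sketch of the `===` vs `==` distinction described just above; df1 and df2 are toy stand-ins, not the original poster's data, and the setup lines from the sketch near the top of this thread are assumed:

// assumes: val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//          import sqlContext.implicits._
val df1 = sc.parallelize(Seq(("us", 1000), ("uk", 2000))).toDF("country", "cnt")
val df2 = sc.parallelize(Seq(("us", 95), ("uk", 200))).toDF("country", "cnt")

// `===` builds an org.apache.spark.sql.Column expression that Spark can
// evaluate row by row; `==` is ordinary Scala equality on the two Column
// objects, which compiles but simply yields a Boolean.
val cond = df1("country") === df2("country") // Column: usable as a join condition
val oops = df1("country") == df2("country")  // Boolean: not usable as one

val join_df = df1.join(df2, cond, "left_outer")

Note that because df1 and df2 here are built independently rather than derived from the same parent DataFrame, this join is not ambiguous and needs no aliases.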
>>> On Tue, Mar 24, 2015 at 5:50 PM, SK <skrishna...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to port some code that was working in Spark 1.2.0 to the
>>>> latest version, Spark 1.3.0. This code involves a left outer join
>>>> between two SchemaRDDs, which I am now trying to change to a left
>>>> outer join between two DataFrames.
>>>>
>>>> I followed the example for a left outer join of DataFrames at
>>>> https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
>>>>
>>>> Here's my code, where df1 and df2 are the two DataFrames I am joining
>>>> on the "country" field:
>>>>
>>>> val join_df = df1.join( df2, df1.country == df2.country, "left_outer")
>>>>
>>>> But I got a compilation error that value country is not a member of
>>>> sql.DataFrame.
>>>>
>>>> I also tried the following:
>>>>
>>>> val join_df = df1.join( df2, df1("country") == df2("country"), "left_outer")
>>>>
>>>> I got a compilation error that it is a Boolean whereas a Column is
>>>> required.
>>>>
>>>> So what is the correct Column expression I need to provide for joining
>>>> the two DataFrames on a specific field?
>>>>
>>>> thanks
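Since the first message in the thread notes that the tab1/tab2-style query worked in Spark 1.2.0, here is a hedged sketch of the equivalent Spark SQL route in 1.3, using the same invented toy data as the earlier sketches; the table names tab1 and tab2 are illustrative only:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // assumes an existing SparkContext `sc`
import sqlContext.implicits._

val df = sc.parallelize(Seq(
  ("us", 1000, 0), ("uk", 2000, 0),
  ("us", 95, 3), ("uk", 200, 3)
)).toDF("country", "cnt", "event")

df.filter(df("event") === 0).select("country", "cnt").registerTempTable("tab1")
df.filter(df("event") === 3).select("country", "cnt").registerTempTable("tab2")

// As in the "WHERE tab1.country = tab2.country" query mentioned at the top
// of the thread, the table names qualify the columns, so the self-join is
// unambiguous even though both tables come from the same parent DataFrame:
val both = sqlContext.sql(
  "SELECT tab2.country, tab2.cnt, tab1.cnt " +
  "FROM tab2 LEFT OUTER JOIN tab1 ON tab1.country = tab2.country")
both.show()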