I've only tried it in Python.

On Tue, Jun 23, 2015 at 12:16 PM Ignacio Blasco <elnopin...@gmail.com> wrote:
> Does that issue happen only in the Python DSL?
>
> On 23/6/2015 5:05 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
>
>> Thanks! The solution:
>>
>> https://gist.github.com/dokipen/018a1deeab668efdf455
>>
>> On Mon, Jun 22, 2015 at 4:33 PM Davies Liu <dav...@databricks.com> wrote:
>>
>>> Right now, we cannot figure out which column you referenced in
>>> `select` if there are multiple columns with the same name in the joined
>>> DataFrame (for example, two `value` columns).
>>>
>>> A workaround could be:
>>>
>>>     numbers2 = numbers.select(df.name, df.value.alias('other'))
>>>     rows = numbers.join(numbers2,
>>>                         (numbers.name == numbers2.name) &
>>>                         (numbers.value != numbers2.other),
>>>                         how="inner") \
>>>                   .select(numbers.name, numbers.value, numbers2.other) \
>>>                   .collect()
>>>
>>> On Mon, Jun 22, 2015 at 12:53 PM, Ignacio Blasco <elnopin...@gmail.com> wrote:
>>> > Sorry, I thought it was Scala/Spark.
>>> >
>>> > On 22/6/2015 9:49 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
>>> >>
>>> >> That's invalid syntax. I'm pretty sure pyspark is using a DSL to create a
>>> >> query here and not actually doing an equality operation.
>>> >>
>>> >> On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco <elnopin...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> You should probably use === instead of == and !== instead of !=.
>>> >>>
>>> >>> Can anyone explain why the DataFrame API doesn't work as I expect it to
>>> >>> here? It seems like the column identifiers are getting confused.
>>> >>>
>>> >>> https://gist.github.com/dokipen/4b324a7365ae87b7b0e5