I've only tried it in Python.

On Tue, Jun 23, 2015 at 12:16 PM Ignacio Blasco <elnopin...@gmail.com> wrote:
> Does that issue happen only in the Python DSL?
>
> On 23/6/2015 5:05 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
>
>> Thanks! The solution:
>>
>> https://gist.github.com/dokipen/018a1deeab668efdf455
>>
>> On Mon, Jun 22, 2015 at 4:33 PM Davies Liu <dav...@databricks.com> wrote:
>>
>>> Right now, we cannot figure out which column you referenced in
>>> `select` if there are multiple columns with the same name in the joined
>>> DataFrame (for example, two `value` columns).
>>>
>>> A workaround could be:
>>>
>>>     numbers2 = numbers.select(df.name, df.value.alias('other'))
>>>     rows = numbers.join(numbers2,
>>>                         (numbers.name == numbers2.name) &
>>>                         (numbers.value != numbers2.other),
>>>                         how="inner") \
>>>                   .select(numbers.name, numbers.value, numbers2.other) \
>>>                   .collect()
>>>
>>> On Mon, Jun 22, 2015 at 12:53 PM, Ignacio Blasco <elnopin...@gmail.com> wrote:
>>> > Sorry, I thought it was Scala/Spark.
>>> >
>>> > On 22/6/2015 9:49 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
>>> >>
>>> >> That's invalid syntax. I'm pretty sure pyspark is using a DSL to create a
>>> >> query here and not actually doing an equality operation.
>>> >>
>>> >> On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco <elnopin...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> You should probably use === instead of == and !== instead of !=.
>>> >>>
>>> >>> Can anyone explain why the DataFrame API doesn't work as I expect it to
>>> >>> here? It seems like the column identifiers are getting confused.
>>> >>>
>>> >>> https://gist.github.com/dokipen/4b324a7365ae87b7b0e5