No, that is not generally true, and it isn't in this case. The
programmatic and SQL APIs overlap a lot, and where they do, they are
essentially aliases for each other. Use whichever is more natural.
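
For example, here is a minimal PySpark sketch (the table, data, and column
names are made up purely for illustration) showing that the SQL and
DataFrame forms of the same aggregation go through the same Catalyst
optimizer, which explain() will confirm:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Toy data just for the example.
    df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
    df.createOrReplaceTempView("t")

    # SQL form.
    sql_result = spark.sql("SELECT key, SUM(value) AS total FROM t GROUP BY key")

    # Equivalent DataFrame form.
    df_result = df.groupBy("key").agg(F.sum("value").alias("total"))

    # Both compile to the same optimized plan.
    sql_result.explain()
    df_result.explain()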
What I wouldn't recommend is emulating SQL-like behavior in custom code,
UDFs, etc.; the native operators will be faster. That said, sometimes you
do have to go outside SQL, for example for UDFs or complex aggregation
logic that SQL can't express, and in those cases you can't use SQL anyway.
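
As a rough sketch of the UDF point (again with made-up data; the UDF here
is only for illustration, since upper() already exists as a built-in):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

    # A Python UDF: each row is serialized out to a Python worker and back,
    # and Catalyst cannot optimize through it.
    upper_udf = F.udf(lambda s: s.upper() if s is not None else None, StringType())
    udf_version = df.withColumn("name_upper", upper_udf("name"))

    # The equivalent native operator stays inside the JVM and will generally
    # be faster; prefer it whenever a built-in exists.
    native_version = df.withColumn("name_upper", F.upper("name"))

Reach for a UDF only when no built-in function or SQL expression covers the
logic.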

On Fri, Dec 24, 2021 at 12:05 PM Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> yeah, I think that in practice you will always find that dataframes can
> give issues in a lot of areas, and then it becomes arguable. At the Spark
> conference, I think last year, it was shown that more than 92% or 95% of
> users use the Spark SQL API, if I am not mistaken.
>
> I think that you can do the entire processing in one single go.
>
> Can you please write down the end-to-end SQL and share it, without the
> 16000 iterations?
>
>
> Regards,
> Gourav Sengupta
>
>
> On Fri, Dec 24, 2021 at 5:16 PM Andrew Davidson <aedav...@ucsc.edu> wrote:
>
>> Hi Sean and Gourav
>>
>>
>>
>> Thanks for the suggestions. I thought that both the SQL and dataframe
>> APIs are wrappers around the same framework, i.e. Catalyst?
>>
>>
>>
>> I tend to mix and match in my code. Sometimes I find it easier to write
>> using SQL, sometimes dataframes. What is considered best practice?
>>
>>
>>
>> Here is an example
>>
>>
>>
>>
