Weird, does Teradata not support LIMIT n? Looking at the Spark source code
suggests it doesn't; the syntax is "SELECT TOP" instead? I wonder if that's
why the generic query that tests existence loses the LIMIT.
But that "SELECT 1" test seems to be used for MySQL and Postgres, so I'm
still not sure where it's coming from, or whether it's coming from Spark at
all. You're using the Teradata dialect, I assume. Can you test with the
latest Spark?
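
If the check really is coming from the dialect's getTableExistsQuery, one
thing you could try on your side is registering a custom dialect that emits
Teradata's TOP syntax instead. A rough, untested sketch (I haven't verified
how a user-registered dialect combines with the built-in Teradata one, so
treat it as a starting point only):

import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

// Untested sketch: a dialect whose existence check uses Teradata's TOP
// syntax rather than LIMIT. Register it before creating the DataFrame.
JdbcDialects.registerDialect(new JdbcDialect() {
  @Override
  public boolean canHandle(String url) {
    return url.toLowerCase().startsWith("jdbc:teradata");
  }

  @Override
  public String getTableExistsQuery(String table) {
    // "table" may be a parenthesized subquery with an alias, e.g.
    // "(INPUT_QUERY) SPARK_GEN_SUB_0".
    return "SELECT TOP 1 1 FROM " + table;
  }
});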

On Thu, Nov 17, 2022 at 11:31 AM Ramakrishna Rayudu <
ramakrishna560.ray...@gmail.com> wrote:

> Yes, I am sure that we are not generating these kinds of queries. Okay,
> then the problem is that LIMIT is not showing up in the query. Can you
> please suggest a direction?
>
> Thanks,
> Rama
>
> On Thu, Nov 17, 2022, 10:56 PM Sean Owen <sro...@gmail.com> wrote:
>
>> Hm, the existence queries even in 2.4.x had LIMIT 1. Are you sure nothing
>> else is generating or changing those queries?
>>
>> On Thu, Nov 17, 2022 at 11:20 AM Ramakrishna Rayudu <
>> ramakrishna560.ray...@gmail.com> wrote:
>>
>>> We are using Spark version 2.4.4.
>>> I can see two types of queries in the DB logs:
>>>
>>> SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0
>>>
>>> SELECT * FROM (INPUT_QUERY) SPARK_GEN_SUB_0 WHERE 1=0
>>>
>>> The `SELECT *` query ends with `WHERE 1=0`, but the query that starts
>>> with `SELECT 1` has no WHERE condition at all.
>>>
>>> Thanks,
>>> Rama
>>>
>>> On Thu, Nov 17, 2022, 10:39 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Hm, actually that doesn't look like the queries that Spark uses to test
>>>> existence, which will be "SELECT 1 ... LIMIT 1" or "SELECT * ... WHERE 1=0"
>>>> depending on the dialect. What version, and are you sure something else is
>>>> not sending those queries?
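>>>>
>>>> One way to narrow it down: ask Spark which dialect it resolves for your
>>>> JDBC URL and what existence/schema queries that dialect would generate,
>>>> then compare that with what the DBA sees. Rough sketch (I'm not certain
>>>> JdbcDialects.get is intended as public API, so treat this as untested):
>>>>
>>>> import org.apache.spark.sql.jdbc.JdbcDialect;
>>>> import org.apache.spark.sql.jdbc.JdbcDialects;
>>>>
>>>> // connectionUrl: the same URL you pass to spark.read().jdbc(...)
>>>> JdbcDialect dialect = JdbcDialects.get(connectionUrl);
>>>> System.out.println(dialect.getClass().getName());
>>>> System.out.println(dialect.getTableExistsQuery("(INPUT_QUERY) SPARK_GEN_SUB_0"));
>>>> System.out.println(dialect.getSchemaQuery("(INPUT_QUERY) SPARK_GEN_SUB_0"));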
>>>>
>>>> On Thu, Nov 17, 2022 at 11:02 AM Ramakrishna Rayudu <
>>>> ramakrishna560.ray...@gmail.com> wrote:
>>>>
>>>>> Hi Sean,
>>>>>
>>>>> Thanks for your response. I think it does have a performance impact: if
>>>>> the query returns one million rows, then the response itself contains
>>>>> one million rows unnecessarily, like below:
>>>>>
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> 1
>>>>> .
>>>>> .
>>>>> 1
>>>>>
>>>>>
>>>>> It impacts performance. Is there an alternate solution for this?
>>>>>
>>>>> Thanks,
>>>>> Rama
>>>>>
>>>>>
>>>>> On Thu, Nov 17, 2022, 10:17 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> This is a query to check the existence of the table upfront.
>>>>>> It is nearly a no-op query; can it have a perf impact?
>>>>>>
>>>>>> On Thu, Nov 17, 2022 at 10:42 AM Ramakrishna Rayudu <
>>>>>> ramakrishna560.ray...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Team,
>>>>>>>
>>>>>>> I am facing an issue. Can you please help me with this?
>>>>>>>
>>>>>>> We are connecting to Teradata from Spark SQL with the below API:
>>>>>>>
>>>>>>> Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery, 
>>>>>>> connectionProperties);
>>>>>>>
>>>>>>> When we execute the above logic on a large table with a million rows,
>>>>>>> we see the extra query below executing every time, which results in a
>>>>>>> performance hit on the DB.
>>>>>>>
>>>>>>> We got the information below from our DBA. We don't have any logs on
>>>>>>> the Spark SQL side.
>>>>>>>
>>>>>>> SELECT 1 FROM ONE_MILLION_ROWS_TABLE;
>>>>>>>
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>> 1
>>>>>>>
>>>>>>> Can you please clarify why this query is executing, or whether there
>>>>>>> is any chance that this type of query is issued by our own code while
>>>>>>> checking the row count of the DataFrame?
>>>>>>>
>>>>>>> Please provide your inputs on this.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Rama
>>>>>>>
>>>>>>
