Dear friend, thanks a ton. I was looking for linting for SQL for a long time, and it looks like https://sqlfluff.com/ is something that can be used :)
Thank you so much, and wish you all a wonderful new year.

Regards,
Gourav

On Tue, Dec 26, 2023 at 4:42 AM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

> You can try sqlfluff <https://sqlfluff.com/>; it's a linter for SQL code,
> and it seems to have support for sparksql
> <https://pypi.org/project/sqlfluff/>.
>
> On Mon, Dec 25, 2023 at 17:13, ram manickam <ramsidm...@gmail.com> wrote:
>
>> Thanks Mich, Nicholas. I tried looking over the Stack Overflow post, and
>> none of the answers seems to cover syntax validation. Do you know if it's
>> even possible to do syntax validation in Spark?
>>
>> Thanks
>> Ram
>>
>> On Sun, Dec 24, 2023 at 12:49 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Well, not to put too fine a point on it: in a public forum, one ought to
>>> respect the importance of open communication. Everyone has the right to
>>> ask questions, seek information, and engage in discussions without facing
>>> unnecessary patronization.
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>> View my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> This is a user-list question, not a dev-list question. Moving this
>>>> conversation to the user list and BCC-ing the dev list.
>>>>
>>>> Also, this statement
>>>>
>>>> > We are not validating against table or column existence.
>>>>
>>>> is not correct.
>>>> When you call spark.sql(...), Spark will look up the table
>>>> references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>>>
>>>> Also, when you run DDL via spark.sql(...), Spark will actually run it.
>>>> So spark.sql("DROP TABLE my_table") will actually drop my_table. It's
>>>> not a validation-only operation.
>>>>
>>>> This question of validating SQL is already discussed on Stack Overflow
>>>> <https://stackoverflow.com/q/46973729/877069>. You may find some
>>>> useful tips there.
>>>>
>>>> Nick
>>>>
>>>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>> Yes, you can validate the syntax of your PySpark SQL queries without
>>>> connecting to an actual dataset or running the queries on a cluster.
>>>> PySpark provides a way to check syntax without executing the query.
>>>> Something like below:
>>>>
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>>>       /_/
>>>>
>>>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>>>> Spark context Web UI available at http://rhes75:4040
>>>> Spark context available as 'sc' (master = local[*], app id =
>>>> local-1703410019374).
>>>> SparkSession available as 'spark'.
>>>> >>> from pyspark.sql import SparkSession
>>>> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
>>>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session;
>>>> only runtime SQL configurations will take effect.
>>>> >>> sql = "SELECT * FROM <TABLE> WHERE <COLUMN> = some value"
>>>> >>> try:
>>>> ...     spark.sql(sql)
>>>> ...     print("is working")
>>>> ... except Exception as e:
>>>> ...     print(f"Syntax error: {e}")
>>>> ...
>>>> Syntax error:
>>>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>>>
>>>> == SQL ==
>>>> SELECT * FROM <TABLE> WHERE <COLUMN> = some value
>>>> --------------^^^
>>>>
>>>> Here we only check for syntax errors, not the actual query semantics.
>>>> We are not validating against table or column existence.
>>>>
>>>> This method is useful when you want to catch obvious syntax errors
>>>> before submitting your PySpark job to a cluster, especially when you
>>>> don't have access to the actual data.
>>>>
>>>> In summary:
>>>>
>>>> - This method validates syntax but will not catch semantic errors.
>>>> - If you need more comprehensive validation, consider using a testing
>>>> framework and a small dataset.
>>>> - For complex queries, a linter or code analysis tool can help identify
>>>> potential issues.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Dad | Technologist | Solutions Architect | Engineer
>>>> London
>>>> United Kingdom
>>>>
>>>> On Sun, 24 Dec 2023 at 07:57, ram manickam <ramsidm...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> Is there a way to validate PySpark SQL for syntax errors only? I
>>>>> cannot connect to an actual data set to perform this validation.
>>>>> Any help would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Ram
>>>>>

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297