Dear friend, thanks a ton. I was looking for linting for SQL for a long time, and it looks like https://sqlfluff.com/ is something that can be used :)
Thank you so much, and wish you all a wonderful new year.

Regards,
Gourav

On Tue, Dec 26, 2023 at 4:42 AM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

> You can try sqlfluff <https://sqlfluff.com/>; it's a linter for SQL code,
> and it seems to have support for sparksql
> <https://pypi.org/project/sqlfluff/>.
>
> On Mon, Dec 25, 2023 at 17:13, ram manickam <ramsidm...@gmail.com> wrote:
>
>> Thanks Mich, Nicholas. I tried looking over the Stack Overflow post, and
>> none of the answers seems to cover syntax validation. Do you know if it's
>> even possible to do syntax validation in Spark?
>>
>> Thanks
>> Ram
>>
>> On Sun, Dec 24, 2023 at 12:49 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Well, not to put too fine a point on it: in a public forum, one ought to
>>> respect the importance of open communication. Everyone has the right to
>>> ask questions, seek information, and engage in discussions without facing
>>> unnecessary patronization.
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>> View my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>> On Sun, 24 Dec 2023 at 18:27, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> This is a user-list question, not a dev-list question. Moving this
>>>> conversation to the user list and BCC-ing the dev list.
>>>>
>>>> Also, this statement
>>>>
>>>> > We are not validating against table or column existence.
>>>>
>>>> is not correct.
>>>> When you call spark.sql(...), Spark will look up the table
>>>> references and fail with TABLE_OR_VIEW_NOT_FOUND if it cannot find them.
>>>>
>>>> Also, when you run DDL via spark.sql(...), Spark will actually run it.
>>>> So spark.sql("DROP TABLE my_table") will actually drop my_table. It's
>>>> not a validation-only operation.
>>>>
>>>> This question of validating SQL is already discussed on Stack Overflow
>>>> <https://stackoverflow.com/q/46973729/877069>. You may find some
>>>> useful tips there.
>>>>
>>>> Nick
>>>>
>>>> On Dec 24, 2023, at 4:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>> Yes, you can validate the syntax of your PySpark SQL queries without
>>>> connecting to an actual dataset or running the queries on a cluster.
>>>> PySpark provides a way to check syntax without executing the query.
>>>> Something like below:
>>>>
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
>>>>       /_/
>>>>
>>>> Using Python version 3.9.16 (main, Apr 24 2023 10:36:11)
>>>> Spark context Web UI available at http://rhes75:4040
>>>> Spark context available as 'sc' (master = local[*], app id =
>>>> local-1703410019374).
>>>> SparkSession available as 'spark'.
>>>> >>> from pyspark.sql import SparkSession
>>>> >>> spark = SparkSession.builder.appName("validate").getOrCreate()
>>>> 23/12/24 09:28:02 WARN SparkSession: Using an existing Spark session;
>>>> only runtime SQL configurations will take effect.
>>>> >>> sql = "SELECT * FROM <TABLE> WHERE <COLUMN> = some value"
>>>> >>> try:
>>>> ...     spark.sql(sql)
>>>> ...     print("is working")
>>>> ... except Exception as e:
>>>> ...     print(f"Syntax error: {e}")
>>>> ...
>>>> Syntax error:
>>>> [PARSE_SYNTAX_ERROR] Syntax error at or near '<'.(line 1, pos 14)
>>>>
>>>> == SQL ==
>>>> SELECT * FROM <TABLE> WHERE <COLUMN> = some value
>>>> --------------^^^
>>>>
>>>> Here we only check for syntax errors, not the actual query semantics.
>>>> We are not validating against table or column existence.
>>>>
>>>> This method is useful when you want to catch obvious syntax errors
>>>> before submitting your PySpark job to a cluster, especially when you
>>>> don't have access to the actual data.
>>>>
>>>> In summary:
>>>>
>>>> - This method validates syntax but will not catch semantic errors.
>>>> - If you need more comprehensive validation, consider using a testing
>>>> framework and a small dataset.
>>>> - For complex queries, a linter or code analysis tool can help identify
>>>> potential issues.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Dad | Technologist | Solutions Architect | Engineer
>>>> London
>>>> United Kingdom
>>>>
>>>> On Sun, 24 Dec 2023 at 07:57, ram manickam <ramsidm...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> Is there a way to validate PySpark SQL for syntax errors only? I
>>>>> cannot connect to an actual data set to perform this validation.
>>>>> Any help would be appreciated.
>>>>>
>>>>> Thanks
>>>>> Ram
>>>>>

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297