The life cycle of a shuffle dependency

2023-12-24 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data is deleted. I found the ContextCleaner class, which cleans up shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency seems to be GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of a
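
The mechanism the question describes (cleanup that fires once the dependency object itself is garbage-collected) can be illustrated with a small toy in plain Python. This is only a sketch of the general weak-reference pattern, not Spark's code: `ToyShuffleDependency`, `ToyCleaner`, and the file names are hypothetical stand-ins invented for illustration.

```python
import gc
import weakref


class ToyShuffleDependency:
    """Hypothetical stand-in for a shuffle dependency (not Spark's class)."""

    def __init__(self, shuffle_id):
        self.shuffle_id = shuffle_id


class ToyCleaner:
    """Toy cleaner: removes shuffle data once the dependency is collected."""

    def __init__(self):
        self.shuffle_files = {}  # shuffle_id -> pretend file name
        self.cleaned = []        # shuffle ids cleaned so far

    def register(self, dep):
        shuffle_id = dep.shuffle_id
        self.shuffle_files[shuffle_id] = f"shuffle_{shuffle_id}.data"
        # Pass only the id to the callback, never `dep` itself; a strong
        # reference to `dep` here would keep it alive forever.
        weakref.finalize(dep, self._cleanup, shuffle_id)

    def _cleanup(self, shuffle_id):
        self.shuffle_files.pop(shuffle_id, None)
        self.cleaned.append(shuffle_id)


cleaner = ToyCleaner()
dep = ToyShuffleDependency(0)
cleaner.register(dep)

# While something still references the dependency, nothing is cleaned.
cleaned_while_referenced = list(cleaner.cleaned)

# Drop the last strong reference, then ask the collector to run.
del dep
gc.collect()
cleaned_after_release = list(cleaner.cleaned)
```

The point of the toy is that cleanup is tied to reachability of the dependency object, not directly to job completion: the data goes away only after the last strong reference is dropped and the collector has run.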

Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list. Also, this statement
> We are not validating against table or column existence.
is not correct. When you call spark.sql(…), Spark will look up the table references and fail

Re: Validate spark sql

2023-12-24 Thread Mich Talebzadeh
Yes, you can validate the syntax of your PySpark SQL queries without connecting to an actual dataset or running the queries on a cluster. PySpark provides a method for syntax validation without executing the query. Something like below: