Nothing specific in mind. Any IDE that is open to plugins (e.g. VS Code and the JetBrains IDEs) could use it to validate execution plans in the background and mark errors based on the result.
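As a toy illustration of that plugin idea (this is plain Python, not Spark code, and every name in it is hypothetical): a background checker would hand the dry-run result's unresolved column references to the editor as diagnostics, roughly like this:

```python
# Toy sketch only: `known_columns` stands in for the schema Spark's analyzer
# would infer during a dry run; `referenced` for the columns a query mentions.
# An editor plugin would turn each returned string into an in-editor marker.

def dry_run_diagnostics(known_columns, referenced):
    """Return one diagnostic per column reference that cannot be resolved."""
    known = set(known_columns)
    return [f"cannot resolve column '{c}'" for c in referenced if c not in known]

diags = dry_run_diagnostics(["user_id", "amount"], ["user_id", "amonut"])
# diags → ["cannot resolve column 'amonut'"]
```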
On Thu, Sep 30, 2021 at 4:40 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> What IDEs do you have in mind?
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Thu, 30 Sept 2021 at 15:20, Ali Behjati <bahja...@gmail.com> wrote:
>
>> Yeah, it doesn't remove the need for testing on sample data. It would be
>> more of a syntax check than a test. I have seen syntax errors occur a lot.
>>
>> Maybe once we have dry-run support we will be able to build some
>> automation around basic syntax checking for IDEs too.
>>
>> On Thu, Sep 30, 2021 at 4:15 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> If testing, wouldn't you actually want to execute things, even if at a
>>> small scale, on a sample of data?
>>>
>>> On Thu, Sep 30, 2021 at 9:07 AM Ali Behjati <bahja...@gmail.com> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> By dry run I mean the ability to validate the execution plan without
>>>> executing it. I was wondering whether this exists in Spark or not; I
>>>> couldn't find it anywhere.
>>>>
>>>> If it doesn't exist, I want to propose adding such a feature to Spark.
>>>>
>>>> Why is it useful?
>>>> 1. Faster testing: When using PySpark, or Spark on Scala/Java without
>>>> Dataset, we are prone to typos in column names and other logical
>>>> mistakes. Unfortunately IDEs don't help much here, and when dealing
>>>> with big data, testing by running the code takes a lot of time. A dry
>>>> run would surface typos very quickly.
>>>>
>>>> 2. (Continuous) integrity checks: When there are upstream and
>>>> downstream pipelines, we can detect breaking changes much faster by
>>>> running downstream pipelines in "dry run" mode.
>>>>
>>>> I believe it is not hard to implement, and I volunteer to work on it
>>>> if the community approves this feature request.
>>>>
>>>> It can be tackled in different ways. I have two ideas for
>>>> implementation:
>>>> 1. A noop (no-op) execution engine
>>>> 2. On reads, just infer the schema and replace the source with an
>>>> empty table of the same schema
>>>>
>>>> Thanks,
>>>> Ali