A colleague kindly suggested including an example of the output, which will be added to the README.
Doing analysis for column Postcode. JSON formatted output:

{
  "Postcode": {
    "exists": true,
    "num_rows": 93348,
    "data_type": "string",
    "null_count": 21921,
    "null_percentage": 23.48,
    "distinct_count": 38726,
    "distinct_percentage": 41.49
  }
}

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice: "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).

On Tue, 21 May 2024 at 16:21, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I just wanted to share a tool I built called *spark-column-analyzer*.
> It's a Python package that helps you dig into your Spark DataFrames with
> ease.
>
> Ever spend ages figuring out what's going on in your columns? Like, how
> many null values are there, or how many unique entries? Built with data
> preparation for Generative AI in mind, it aids in data imputation and
> augmentation – key steps for creating realistic synthetic data.
>
> *Basics*
>
> - *Effortless Column Analysis:* It calculates all the important stats
> you need for each column, like null counts, distinct values, percentages,
> and more. No more manual counting or head scratching!
> - *Simple to Use:* Just pass in your DataFrame and call the
> analyze_column function. Bam! Insights galore.
> - *Makes Data Cleaning Easier:* Knowing your data's quality helps you
> clean it up much faster. This package helps you figure out where the
> missing values are hiding and how much variety you've got in each column.
> - *Detecting skewed columns*
> - *Open Source and Friendly:* Feel free to tinker, suggest
> improvements, or even contribute some code yourself! We love collaboration
> in the Spark community.
>
> *Installation:*
>
> Using pip from the link: https://pypi.org/project/spark-column-analyzer/
>
> pip install spark-column-analyzer
>
> You can also clone the project from GitHub:
>
> git clone https://github.com/michTalebzadeh/spark_column_analyzer.git
>
> The details are in the attached README file.
>
> Let me know what you think! Feedback is always welcome.
>
> HTH
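For readers who want to see what the reported numbers mean before installing, here is a minimal pure-Python sketch of the statistics shown in the example output above (null_count, null_percentage, distinct_count, distinct_percentage). This is only an illustration of the semantics, not the package's actual implementation, which runs against a Spark DataFrame; the `column_stats` helper and the sample postcode values are made up for this example.

```python
def column_stats(name, values):
    """Compute the per-column statistics shown in the example
    JSON output, over a plain Python list standing in for a
    DataFrame column. Percentages are relative to num_rows."""
    num_rows = len(values)
    non_null = [v for v in values if v is not None]
    null_count = num_rows - len(non_null)
    distinct_count = len(set(non_null))
    return {
        name: {
            "exists": True,
            "num_rows": num_rows,
            "null_count": null_count,
            "null_percentage": round(100 * null_count / num_rows, 2),
            "distinct_count": distinct_count,
            "distinct_percentage": round(100 * distinct_count / num_rows, 2),
        }
    }

# Toy column: 4 rows, 1 null, 2 distinct non-null values.
stats = column_stats("Postcode", ["SW1A 1AA", "EC1A 1BB", None, "SW1A 1AA"])
print(stats)
```

With the real package the same shape of result comes back for a Spark DataFrame column; for instance in the Postcode example above, 21921 / 93348 ≈ 23.48% nulls and 38726 / 93348 ≈ 41.49% distinct values.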