On Fri, Mar 15, 2024 at 3:10 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> No Data Transfer During Creation: --> Data transfer occurs only when an
> action is triggered.
> Distributed Processing: --> DataFrames are distributed for parallel
> execution, not stored entirely on the driver node.
> Lazy Evaluation Optimization: --> Delaying data transfer until necessary
> enhances performance.
> Shuffle vs. Partitioning: --> Data movement during partitioning is not
> considered a shuffle in Spark terminology.
> Shuffles involve more complex data rearrangement.
>

So, just to be clear: the transformations are always executed on the worker nodes, but the data is not transferred until an action on the DataFrame is triggered. Am I correct?

If so, how do I generate a large dataset? I may need something like that as synthetic data for testing. Is there any way to do that?

--
Regards,
Sreyan Chakravarty
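
On the synthetic-data question, one common approach is to start from spark.range rather than from a Python list built on the driver. The sketch below is only an illustration under assumptions: the function name build_synthetic_df, the column names "value" and "category", and the default of 8 partitions are all hypothetical and not from this thread. It relies on SparkSession.range and pyspark.sql.functions.rand, so the rows should be produced on the executors when an action runs, not shipped from the driver.

```python
def build_synthetic_df(spark, num_rows, num_partitions=8):
    """Return a lazily defined DataFrame with num_rows synthetic rows.

    The name, columns, and defaults are hypothetical; adjust them to the test.
    """
    # Imported inside the function so this file loads even where PySpark is absent.
    from pyspark.sql import functions as F

    # spark.range only records a plan on the driver. The ids are produced on
    # the executors, split across num_partitions, once an action is triggered.
    return (
        spark.range(0, num_rows, 1, num_partitions)
        # Uniform random double in [0, 1); the seed is an arbitrary choice.
        .withColumn("value", F.rand(seed=42))
        # Ten string buckets derived from the id, handy for groupBy tests.
        .withColumn("category", (F.col("id") % 10).cast("string"))
    )
```

With an existing SparkSession named spark, something like build_synthetic_df(spark, 100_000_000, 200).count() would define the plan first and only generate the rows when count() runs. Writing the result out with df.write.parquet(...) is another action that would materialize it for reuse across tests.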