On Fri, Mar 15, 2024 at 3:10 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> No Data Transfer During Creation: --> Data transfer occurs only when an
> action is triggered.
> Distributed Processing: --> DataFrames are distributed for parallel
> execution, not stored entirely on the driver node.
> Lazy Evaluation Optimization: --> Delaying data transfer until necessary
> enhances performance.
> Shuffle vs. Partitioning: --> Data movement during partitioning is not
> considered a shuffle in Spark terminology.
> Shuffles involve more complex data rearrangement.

So just to be clear: the transformations are always executed on the worker
nodes, but no data is actually transferred until an action on the DataFrame
is triggered.

Am I correct?

If so, then how do I generate a large dataset?

I may need something like that to produce synthetic data for testing. Is
there a way to do that?

Sreyan Chakravarty
