Re: pyspark - Where are Dataframes created from Python objects stored?

Sreyan Chakravarty Mon, 18 Mar 2024 00:16:47 -0700

On Fri, Mar 15, 2024 at 3:10 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:


>
> No Data Transfer During Creation: --> Data transfer occurs only when an
> action is triggered.
> Distributed Processing: --> DataFrames are distributed for parallel
> execution, not stored entirely on the driver node.
> Lazy Evaluation Optimization: --> Delaying data transfer until necessary
> enhances performance.
> Shuffle vs. Partitioning: --> Data movement during partitioning is not
> considered a shuffle in Spark terminology.
> Shuffles involve more complex data rearrangement.
>

So, just to be clear: the transformations are always executed on the worker
nodes, but the data is not transferred to them until an action on the
DataFrame is triggered.

Am I correct?

If so, then how do I generate a large dataset?

I may need something like that as synthetic data for testing. Is there any
way to do that?
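
For context, the kind of thing I have in mind is sketched below. This is only
my guess at an approach, assuming that spark.range() combined with
selectExpr() and the SQL rand() function is a reasonable way to have the rows
built on the executors rather than from a Python list on the driver. The
function name make_synthetic_df, the column names, and the default values are
all made up for illustration.

```python
def make_synthetic_df(spark, num_rows, num_partitions=8, seed=42):
    """Build a synthetic DataFrame without materialising rows in Python.

    `spark` is an existing SparkSession. The column names ("value",
    "category") and the defaults are hypothetical, chosen for illustration.
    """
    # spark.range() only describes a distributed sequence of ids;
    # no rows are produced at this point.
    ids = spark.range(0, num_rows, numPartitions=num_partitions)

    # Derive extra columns with SQL expressions, which are evaluated on the
    # executors when an action runs.
    return ids.selectExpr(
        "id",
        f"rand({seed}) AS value",            # uniform random double in [0, 1)
        "CAST(id % 10 AS INT) AS category",  # ten evenly spread buckets
    )


# Example usage (needs a working Spark installation):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("synthetic-data").getOrCreate()
#   df = make_synthetic_df(spark, 100_000_000, num_partitions=200)
#   print(df.count())  # the action that actually triggers generation
```

If I understand lazy evaluation correctly, nothing would actually be generated
until I call an action such as count() on the result.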


-- 
Regards,
Sreyan Chakravarty
