I think you're talking about Koalas (the pandas API on Spark), which was
merged into Spark 3.2, but that is unrelated to toPandas(), and to the
question of how toPandas() differs from collect().
Shuffle is also unrelated here.

On Wed, Nov 3, 2021 at 3:45 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> As I understood it, in previous versions of Spark, data could not be
> processed and stored in pandas DataFrames in distributed mode, because a
> pandas DataFrame holds its data in the RAM of a single node, the driver
> in this case. However, I was under the impression that this limitation no
> longer exists in 3.2? So if you have a k8s cluster with 64GB of RAM on
> one node and 8GB on the others, and PySpark running in cluster mode, how
> do you expect the process to confine itself to the master node? What
> would happen if you temporarily increased the executor node(s)' RAM to
> 64GB (a balanced k8s cluster) and ran the job again?
>
> Worth noting that Spark on k8s currently does not support an external
> shuffle service. For now we have two parameters for dynamic resource
> allocation. These are
>
>  --conf spark.dynamicAllocation.enabled=true \
>  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>
>
> The idea is to use dynamic resource allocation, where the driver tracks
> the shuffle files and evicts only those executors that are not storing
> active shuffle files. So in a nutshell, in the absence of an external
> shuffle service, the shuffle files are stored on the executors
> themselves. The setup follows the "one-container-per-Pod" model
> <https://kubernetes.io/docs/concepts/workloads/pods/>, meaning that one
> node of the cluster runs the driver and each remaining node runs one
> executor.
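>
> A minimal spark-submit sketch with these flags might look like the
> following (a sketch only; the API server address, container image and
> resource values are placeholders, not taken from this thread):
>
> ```shell
> spark-submit \
>   --master k8s://https://<k8s-apiserver>:6443 \
>   --deploy-mode cluster \
>   --name pyspark-dra-demo \
>   --conf spark.kubernetes.container.image=<your-spark-image> \
>   --conf spark.executor.memory=8g \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>   app.py
> ```
>
> With shuffleTracking enabled, executors holding shuffle data for active
> jobs are kept alive rather than released when they go idle.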
>
>
>
> HTH
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>