You will probably need to do a couple of things. First, you will likely
need to increase the "spark.sql.broadcastTimeout" setting, since the
default of 300 seconds is easy to exceed with a broadcast that large.
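For example, a minimal sketch in PySpark (the 1800-second value is
illustrative, not a recommendation):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("large-broadcast")
        # Raise the broadcast timeout from the default 300 s so a very
        # large broadcast has time to complete.
        .config("spark.sql.broadcastTimeout", "1800")
        .getOrCreate()
    )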
Second, when you broadcast a variable it gets replicated once per
executor, not once per machine, so you will want to run fewer, larger
executors: increase the executor memory and allow more cores per executor
so that each machine holds fewer copies.
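As a rough sketch, assuming 244 GB machines, you might configure one
large executor per node rather than several small ones (these numbers are
assumptions you would tune for your cluster):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("large-broadcast")
        # One big executor per node: the 70 GB broadcast is then held
        # once per machine instead of once per small executor.
        .config("spark.executor.memory", "200g")
        # More cores per executor keeps task parallelism up even though
        # there are fewer executors.
        .config("spark.executor.cores", "16")
        .getOrCreate()
    )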
Third, if you are using pyspark, remember that when you use a large
broadcast variable in a Python process (RDD functions, UDFs, etc.), it is
transferred into Python memory space once per Python worker process that
gets spawned. That means you can ultimately end up with many more copies
of the variable in memory at any given time than you intended.
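Here is a small PySpark illustration of where those extra copies come
from (the lookup dict is a hypothetical stand-in for your 70 GB
variable):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("broadcast-demo").getOrCreate()
    sc = spark.sparkContext

    big_lookup = {"a": 1, "b": 2}  # stand-in for the large variable
    bc = sc.broadcast(big_lookup)

    def lookup(key):
        # The first time each Python worker process touches bc.value, it
        # deserializes its own copy of the broadcast into Python memory,
        # so N concurrent workers per executor can mean N extra copies.
        return bc.value.get(key, -1)

    print(sc.parallelize(["a", "b", "c"]).map(lookup).collect())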

On Wed, Apr 10, 2019 at 9:40 AM V0lleyBallJunki3 <venkatda...@gmail.com>
wrote:

> I am using spark.sparkContext.broadcast() to broadcast. Is it true that
> even if the memory on our machines is 244 GB, a 70 GB variable can't be
> broadcast even with high network speed?
