Given a fixed amount of memory allocated to your workers, more memory per
executor means fewer executors can run in parallel, so it takes longer to
finish all of the tasks. Set executor memory high enough and no worker has
enough free memory to launch an executor at all, so everything sits waiting
for resources. The tasks aren't really running slower; they're spending
their time waiting for an executor rather than executing. That's my first
guess.
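As a rough back-of-the-envelope sketch of that effect (the numbers come from
your mail, the variable names are just for illustration, and it assumes the
worker actually offers all of that memory to Spark):

  // How many executors a single worker can host out of its fixed memory budget
  val workerMemoryMb     = 7 * 1024                            // roughly what 'free -m' reports per worker
  val executorMemoryMb   = 4 * 1024                            // spark.executor.memory = 4g
  val executorsPerWorker = workerMemoryMb / executorMemoryMb   // 1 here, versus 14 at the 512m default

Push executor memory past what any single worker can offer and that count
drops to zero: the executor never launches and its tasks just sit queued.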

If you want Spark to use more memory on your machines, give the workers more
memory. It sounds like there is no value in increasing executor memory here:
it only means you are underutilizing the CPUs of your cluster by running
fewer tasks in parallel than would be optimal.
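For concreteness, here is a minimal sketch of where the two knobs live (the
values shown are placeholders, not recommendations):

  // Per-application setting: how much heap each executor asks for.
  // Keeping this modest lets more executors run in parallel on the same workers.
  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("count-test")                // hypothetical application name
    .set("spark.executor.memory", "512m")    // the setting you have been raising to 2g/4g
  val sc = new SparkContext(conf)

  // Per-worker setting (conf/spark-env.sh on each worker; restart the workers after changing it):
  //   export SPARK_WORKER_MEMORY=7g         // how much of the machine's RAM the worker offers to executors

Raising SPARK_WORKER_MEMORY is what actually gives Spark more of each
machine's RAM; raising spark.executor.memory only changes how that fixed
pool gets carved up.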

Hi all,

I'm doing some testing on a small dataset (a HadoopRDD, 2GB, ~10M records)
with a cluster of 3 nodes.

Simple calculations like count take approximately 5s when using the default
value of executor.memory (512MB). When I scale this up to 2GB, several
tasks take 1m or more (while most are still <1s), and tasks hang
indefinitely if I set it to 4GB or higher.

While these worker nodes aren't very powerful, they seem to have enough RAM
to handle this:

Running 'free -m' shows I have >7GB free on each worker.

Any tips on why these jobs would hang when given more available RAM?

Thanks
Ben

