I have only run HDFS on the same nodes as Spark, and that worked out
great in terms of both performance and robustness. However, I did not
run any Hadoop computations/jobs on those nodes. My expectation is that
if you actually ran both at the same time with your configuration,
performance would be pretty bad. The main constraint is memory, and
after that CPU.
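
For what it's worth, the settings named in the question below go in
conf/spark-env.sh on each worker. Here is a rough sketch for an 8GB,
4-core node. The specific values are my own illustrative assumptions,
not tested numbers, so adjust them to whatever the Hadoop daemons and
jobs on the node actually need:

```shell
# conf/spark-env.sh (standalone mode), illustrative values only.
# Assumption: leave roughly half of each node to Hadoop and the OS.
export SPARK_WORKER_CORES=2       # cores Spark applications may use on this node
export SPARK_WORKER_MEMORY=3g     # total memory Spark applications may use on this node
export SPARK_DAEMON_MEMORY=512m   # heap for the standalone master/worker daemons themselves
```

Whatever you give Spark here has to come out of what Hadoop's own task
slots and heap settings are allowed to use, otherwise the two will
overcommit the node.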
OD
On 6/20/14, 2:41 PM, Sameer Tilak wrote:
Dear Spark users,
I have a small 4 node Hadoop cluster. Each node is a VM -- 4 virtual
cores, 8GB memory and 500GB disk. I am currently running Hadoop on it.
I would like to run Spark (in standalone mode) alongside Hadoop on
the same nodes. Given the configuration of my nodes, will that work?
Does anyone have any experience in terms of stability and performance
of running Spark and Hadoop on somewhat resource-constrained nodes? I
was looking at the Spark documentation and there is a way to configure
memory and cores for the worker nodes and memory for the master
node: SPARK_WORKER_CORES, SPARK_WORKER_MEMORY, SPARK_DAEMON_MEMORY.
Any recommendations on how to share resources between Hadoop and Spark?