Hi there,

I have a CDH cluster set up, and I tried using the Spark parcel that comes with
Cloudera Manager, but it turned out it doesn't even have the run-example
script in the bin folder. So I removed it from the cluster, cloned
incubator-spark onto the name node of my cluster, and successfully built it
from source there with everything at the defaults.

I ran a few examples and everything seems to work fine in local mode. Now I
am thinking about scaling it out to my cluster, which is what the "DISTRIBUTE +
ACTIVATE" command does in Cloudera Manager. I want to add all the datanodes
as slaves, and I think I should run Spark in standalone mode (see the sketch
of my planned conf/slaves right below).
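
For reference, here is roughly what I expect my conf/slaves file to look
like, with one worker host per line (the hostnames are just placeholders for
my datanodes), which, as far as I can tell, the ./sbin/start-slaves.sh script
would then use to launch a worker on each host:

# conf/slaves: one worker hostname per line
datanode1.example.com
datanode2.example.com
datanode3.example.com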

I am trying to set up Spark in standalone mode following these
instructions:
https://spark.incubator.apache.org/docs/latest/spark-standalone.html
However, it says "Once started, the master will print out a
spark://HOST:PORT URL for itself, which you can use to connect workers to
it, or pass as the "master" argument to SparkContext. You can also find
this URL on the master's web UI, which is http://localhost:8080 by default."
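
If I understand that correctly, once the master is up I would connect a
worker to it roughly like this (master-host:7077 is a placeholder for
whatever the printed URL turns out to be):

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077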

After I started the master, no URL was printed on the screen, and the web UI
is not running either.
Here is the output:
[root@box incubator-spark]# ./sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to
/root/bwang_spark_new/incubator-spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-box.out
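
If the URL is only written to the log file rather than to stdout, I assume it
could be pulled out like this (using the log path from the output above), but
I am not sure that is the right place to look:

grep spark:// /root/bwang_spark_new/incubator-spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-box.out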

First question: am I even in the ballpark running Spark in standalone mode
if I want to fully utilize my cluster? I saw there are four ways to launch
Spark on a cluster (Amazon EC2, standalone mode, Apache Mesos, and Hadoop
YARN), and I guess standalone mode is the way to go?

Second question: how do I get the Spark URL of the cluster, and why does the
output not match what the instructions say?
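
Once I do have the URL, I assume running an interactive shell against the
cluster would look something like this, with the host and port below being
placeholders:

MASTER=spark://master-host:7077 ./bin/spark-shell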

Best regards,

Bin
