Hi Matt, I'm not very familiar with setup on ec2; the closest I can point you to is the "launch_cluster" function in ec2/spark_ec2.py, where the ports seem to be configured.
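To make that concrete, here's a small sketch (not the actual spark_ec2.py code; the function name and the exact port set are illustrative) of the kind of TCP port ranges a standalone cluster's security group needs open between the master, workers, and the driver machine. The standard defaults are 7077 for the master RPC endpoint, 8080/8081 for the master/worker web UIs, and 4040 for the application UI; the driver port is random unless pinned:

```python
# Illustrative sketch of the port rules a spark_ec2.py-style script would
# authorize in a security group. Each tuple is (from_port, to_port) for
# one TCP rule. standalone_port_rules is a hypothetical helper, not a
# real Spark or boto API.
def standalone_port_rules(driver_port):
    """Return (from_port, to_port) ranges a standalone cluster needs open
    between the master, workers, and the driver machine."""
    return [
        (7077, 7077),                # master RPC (spark://master:7077)
        (8080, 8081),                # master and worker web UIs
        (4040, 4040),                # application web UI on the driver
        (driver_port, driver_port),  # driver port, if pinned via spark.driver.port
    ]

if __name__ == "__main__":
    for lo, hi in standalone_port_rules(51000):
        print("tcp %d-%d" % (lo, hi))
```

The real script also opens ephemeral/Hadoop-related ports (the 50xxx/60xxx entries Matt lists below), so treat this list as a floor, not the full set.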
On Thu, Jul 17, 2014 at 1:29 PM, Matt Work Coarr <mattcoarr.w...@gmail.com> wrote:
> Thanks Marcelo! This is a huge help!!
>
> Looking at the executor logs (in a vanilla spark install, I'm finding them
> in $SPARK_HOME/work/*)...
>
> It launches the executor, but it looks like the CoarseGrainedExecutorBackend
> is having trouble talking to the driver (exactly what you said!!!).
>
> Do you know the range of random ports used for executor-to-driver
> communication? Is that range adjustable? Any config setting or
> environment variable?
>
> I manually set up my ec2 security group to include all the ports that the
> spark ec2 script ($SPARK_HOME/ec2/spark_ec2.py) sets up in its security
> groups. They included (for those listed above 10000):
> 19999
> 50060
> 50070
> 50075
> 60060
> 60070
> 60075
>
> Obviously I'll need to make some adjustments to my EC2 security group! I
> just need to figure out exactly what should be in there. To keep things
> simple, I just have one security group for the master, slaves, and the
> driver machine.
>
> In listing the port ranges in my current security group, I looked at the
> ports that spark_ec2.py sets up as well as the ports listed in the "spark
> standalone mode" documentation page under "configuring ports for network
> security":
>
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> Here are the relevant fragments from the executor log:
>
> Spark Executor Command: "/cask/jdk/bin/java" "-cp"
> "::/cask/spark/conf:/cask/spark/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/cask/spark/lib/datanucleus-api-jdo-3.2.1.jar:/cask/spark/lib/datanucleus-rdbms-3.2.1.jar:/cask/spark/lib/datanucleus-core-3.2.2.jar"
> "-XX:MaxPermSize=128m" "-Dspark.akka.frameSize=100"
> "-Dspark.akka.frameSize=100" "-Xms512M" "-Xmx512M"
> "org.apache.spark.executor.CoarseGrainedExecutorBackend"
> "akka.tcp://spark@ip-10-202-11-191.ec2.internal:46787/user/CoarseGrainedScheduler"
> "0" "ip-10-202-8-45.ec2.internal" "8"
> "akka.tcp://sparkWorker@ip-10-202-8-45.ec2.internal:7101/user/Worker"
> "app-20140717195146-0000"
> ========================================
>
> ...
>
> 14/07/17 19:51:47 DEBUG NativeCodeLoader: Trying to load the custom-built
> native-hadoop library...
> 14/07/17 19:51:47 DEBUG NativeCodeLoader: Failed to load native-hadoop with
> error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
> 14/07/17 19:51:47 DEBUG NativeCodeLoader:
> java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 14/07/17 19:51:47 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/07/17 19:51:47 DEBUG JniBasedUnixGroupsMappingWithFallback: Falling back
> to shell based
> 14/07/17 19:51:47 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
> 14/07/17 19:51:48 DEBUG Groups: Group mapping
> impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback;
> cacheTimeout=300000
> 14/07/17 19:51:48 DEBUG SparkHadoopUtil: running as user: ec2-user
> ...
>
> 14/07/17 19:51:48 INFO CoarseGrainedExecutorBackend: Connecting to driver:
> akka.tcp://spark@ip-10-202-11-191.ec2.internal:46787/user/CoarseGrainedScheduler
> 14/07/17 19:51:48 INFO WorkerWatcher: Connecting to worker
> akka.tcp://sparkWorker@ip-10-202-8-45.ec2.internal:7101/user/Worker
> 14/07/17 19:51:49 INFO WorkerWatcher: Successfully connected to
> akka.tcp://sparkWorker@ip-10-202-8-45.ec2.internal:7101/user/Worker
> 14/07/17 19:53:29 ERROR CoarseGrainedExecutorBackend: Driver Disassociated
> [akka.tcp://sparkExecutor@ip-10-202-8-45.ec2.internal:55670] ->
> [akka.tcp://spark@ip-10-202-11-191.ec2.internal:46787] disassociated!
> Shutting down.
>
> Thanks a bunch!
> Matt
>
> On Thu, Jul 17, 2014 at 1:21 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>> When I said the executor log, I meant the log of the process launched
>> by the worker, not the worker's own log. In my CDH-based Spark install,
>> those end up in /var/run/spark/work.
>>
>> If you look at your worker log, you'll see it's launching the executor
>> process. So there should be something there.
>>
>> Since you say it works when both are run on the same node, that
>> probably points to a communication issue, since the executor needs
>> to connect back to the driver. Check whether any firewalls are
>> blocking the ports Spark tries to use. (That's one of the
>> non-resource-related cases that will cause that message.)
>>
>> --
>> Marcelo
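To answer the port question directly: the driver's Akka endpoint (the :46787 the executor is trying to reach in the logs above) binds to a random port by default, but Spark documents a `spark.driver.port` property that pins it to a fixed value, so a single known security-group rule can cover it. A minimal sketch, assuming conf/spark-defaults.conf is used and with an arbitrary example port; later 1.x releases document additional pinnable ports (e.g. `spark.blockManager.port`), so check the configuration page for your version:

```
# conf/spark-defaults.conf -- pin the driver's normally-random listening
# port so the EC2 security group only needs one fixed rule (51000 is an
# arbitrary example value)
spark.driver.port   51000
```

The same effect can be had per-application with `--conf spark.driver.port=51000` or `SparkConf.set`.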