Hi Andrew,

I submitted it from within the cluster. It looks like standalone-cluster
mode didn't upload the jars to its HTTP server, and instead passed the
file:/... URIs through to the driver node. That's why the driver node
couldn't find the jars.

However, even after I copied my files to all the slaves, it still didn't
work; see my second email.
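
Another thing I may try is registering the --jars dependencies from a
location every node can read, e.g. HDFS, instead of file:/ paths that only
exist on the submitting machine. Just a sketch; the object name and the
hdfs:// paths below are placeholders, and the application jar itself would
still have to be at a path the driver node can see:

import org.apache.spark.{SparkConf, SparkContext}

object JarsFromHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rtgraph"))
    // Dependencies registered from HDFS are fetchable by every node;
    // these paths are placeholders, not my real layout.
    sc.addJar("hdfs:///tmp/lib/commons-vfs2.jar")
    sc.addJar("hdfs:///tmp/lib/gson.jar")
  }
}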

I have no idea why yarn-client didn't work. I suspect the following code
is the problem; is that possible?

I have multiple files that need the SparkContext, so I put it in an object
(instead of in the main function), and SContext is imported in multiple
places.

import org.apache.spark.{SparkConf, SparkContext}

object SContext {

  // Conf reads from application.conf (Typesafe Config)
  val conf = new SparkConf()
    .setAppName(Conf.getString("spark.conf.app_name"))
    .setMaster(Conf.getString("spark.conf.master"))
  val sc = new SparkContext(conf)

}

spark.conf.master is set to "yarn-cluster" in my application.conf, but I
think spark-submit will override the master setting, right?
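
For reference, here is a variant of the object without the hard-coded
master, so whatever --master spark-submit passes takes effect (settings
made directly on SparkConf take precedence over spark-submit flags); the
lazy val just defers construction until first use, and the "rtgraph" app
name is taken from my --name flag:

import org.apache.spark.{SparkConf, SparkContext}

object SContext {

  // No setMaster here: nothing on this SparkConf overrides whatever
  // --master spark-submit passes in.
  lazy val sc: SparkContext = {
    val conf = new SparkConf().setAppName("rtgraph")
    new SparkContext(conf)
  }

}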



Jianshi




On Wed, Jun 18, 2014 at 12:37 AM, Andrew Or <and...@databricks.com> wrote:

> Standalone-client mode is not officially supported at the moment.
> Standalone-cluster and yarn-client modes, however, should work.
>
> For both modes, are you running spark-submit from within the cluster, or
> outside of it? If the latter, could you try running it from within the
> cluster and see if it works? (Does your rtgraph.jar exist on the machine
> from which you run spark-submit?)
>
>
> 2014-06-17 2:41 GMT-07:00 Jianshi Huang <jianshi.hu...@gmail.com>:
>
> Hi,
>>
>> I'm stuck with both yarn-client and standalone-client modes. Both hang
>> when I submit jobs; the last messages printed were:
>>
>> ...
>> 14/06/17 02:37:17 INFO spark.SparkContext: Added JAR
>> file:/x/home/jianshuang/tmp/lib/commons-vfs2.jar at
>> http://10.196.195.25:56377/jars/commons-vfs2.jar with timestamp
>> 1402997837065
>> 14/06/17 02:37:17 INFO spark.SparkContext: Added JAR
>> file:/x/home/jianshuang/tmp/rtgraph.jar at
>> http://10.196.195.25:56377/jars/rtgraph.jar with timestamp 1402997837065
>> 14/06/17 02:37:17 INFO cluster.YarnClusterScheduler: Created
>> YarnClusterScheduler
>> 14/06/17 02:37:17 INFO yarn.ApplicationMaster$$anon$1: Adding shutdown
>> hook for context org.apache.spark.SparkContext@6655cf60
>>
>> I can use yarn-cluster to run my app, but it's not very convenient for
>> monitoring the progress.
>>
>> Standalone-cluster mode doesn't work; it reports a file-not-found error:
>>
>> Driver successfully submitted as driver-20140617023956-0003
>> ... waiting before polling master for driver state
>> ... polling master for driver state
>> State of driver-20140617023956-0003 is ERROR
>> Exception from cluster was: java.io.FileNotFoundException: File
>> file:/x/home/jianshuang/tmp/rtgraph.jar does not exist
>>
>>
>> I'm using Spark 1.0.0 and my submit command looks like this:
>>
>>   ~/spark/spark-1.0.0-hadoop2.4.0/bin/spark-submit --name 'rtgraph' \
>>     --class com.paypal.rtgraph.demo.MapReduceWriter \
>>     --master spark://lvshdc5en0015.lvs.paypal.com:7077 \
>>     --jars `find lib -type f | tr '\n' ','` \
>>     --executor-memory 20G --total-executor-cores 96 \
>>     --deploy-mode cluster rtgraph.jar
>>
>> The jars I put in the --jars option are:
>>
>> accumulo-core.jar
>> accumulo-fate.jar
>> accumulo-minicluster.jar
>> accumulo-trace.jar
>> accumulo-tracer.jar
>> chill_2.10-0.3.6.jar
>> commons-math.jar
>> commons-vfs2.jar
>> config-1.2.1.jar
>> gson.jar
>> guava.jar
>> joda-convert-1.2.jar
>> joda-time-2.3.jar
>> kryo-2.21.jar
>> libthrift.jar
>> quasiquotes_2.10-2.0.0-M8.jar
>> scala-async_2.10-0.9.1.jar
>> scala-library-2.10.4.jar
>> scala-reflect-2.10.4.jar
>>
>>
>> Does anyone have a hint about what went wrong? I'm really confused.
>>
>>
>> Cheers,
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
