Thanks so much!

I put a sleep in my code to keep the UI available.

Now from the UI, I can see:

·         In the “SparkProperty” section, spark.jars and spark.files are 
set to what I want.

·         In the “Classpath Entries” section, my jar and file paths are 
there (with an HDFS path).

I also checked the HTTP file server directory; the structure looks like this:
     D:\data\temp
                          \ --spark-UUID
                               \-- httpd-UUID
                                    \jars [empty]
                                    \files [empty]

So I guess the files and jars are not being properly downloaded from HDFS to these 
folders?

I’m using standalone mode.

Any ideas?

Thanks
Dong Lei

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, June 9, 2015 4:46 PM
To: Dong Lei
Cc: user@spark.apache.org
Subject: Re: ClassNotDefException when using spark-submit with multiple jars 
and files located on HDFS

You can put a Thread.sleep(100000) in the code to keep the UI available for 
quite some time (put it just before starting any of your transformations). Or 
you can enable the Spark history server 
<https://spark.apache.org/docs/latest/monitoring.html> too. I believe --jars 
<https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management> 
downloads the dependency jars to all your worker machines (they can be found 
in the Spark work dir of your application, along with the stderr and stdout files).
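
One way to check the work dir is to list every jar under it. This is a minimal sketch, assuming a Unix-like worker and the default standalone layout, where the work dir is SPARK_HOME/work unless SPARK_WORKER_DIR is set. The function name list_shipped_jars is made up for illustration.

```shell
#!/usr/bin/env bash
# List every jar found under a Spark work directory, one path per line.
list_shipped_jars() {
  local work_dir=$1
  find "$work_dir" -type f -name '*.jar' | sort
}

# Usage (assumed location; adjust to wherever the worker keeps its work dir):
#   list_shipped_jars "${SPARK_WORKER_DIR:-$SPARK_HOME/work}"
```

If the output is empty for the application's directory, the dependency jars never reached that worker.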

Thanks
Best Regards

On Tue, Jun 9, 2015 at 1:29 PM, Dong Lei 
<dong...@microsoft.com> wrote:
Thanks, Akhil.

The driver fails too quickly for me to get a look at port 4040. Is there any other way to see 
the download and shipping process of the files?

Is the driver supposed to download these jars from HDFS to some location and then ship 
them to the executors?
I can see from the log that the driver downloaded the application jar but not the 
other jars specified by “--jars”.

Or am I misunderstanding the usage of “--jars”: should the jars already be on 
every worker, so that the driver does not download them?
Are there any useful docs?

Thanks
Dong Lei


From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, June 9, 2015 3:24 PM
To: Dong Lei
Cc: user@spark.apache.org
Subject: Re: ClassNotDefException when using spark-submit with multiple jars 
and files located on HDFS

Once you submit the application, you can check the Environment tab of the driver UI (running on 
port 4040) to see whether the jars you added got shipped or 
not. If they are shipped and you are still getting NoClassDef exceptions, then 
it means you have a jar conflict, which you can resolve by putting 
the jar that contains the class at the top of your classpath.
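
To see which jars could be in conflict, it helps to find every jar that contains the class. This is a rough sketch, assuming the jars have been copied to a local disk and that `unzip` is installed. The function names and the example class name are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
  printf '%s.class' "${1//.//}"
}

# Print every jar (given as local paths) whose listing contains the class.
find_jars_with_class() {
  local entry jar
  entry=$(class_to_entry "$1")
  shift
  for jar in "$@"; do
    # A jar is a zip archive, so `unzip -l` can list its entries.
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Usage (hypothetical class and paths):
#   find_jars_with_class com.example.MyClass ./1.jar ./2.jar
```

If more than one jar is printed, those are the candidates for the conflict.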

Thanks
Best Regards

On Tue, Jun 9, 2015 at 9:05 AM, Dong Lei 
<dong...@microsoft.com> wrote:
Hi, spark-users:

I’m using spark-submit to submit multiple jars and files (all in HDFS) to run a 
job, with the following command:

spark-submit \
  --class myClass \
  --master spark://localhost:7077/ \
  --deploy-mode cluster \
  --jars hdfs://localhost/1.jar,hdfs://localhost/2.jar \
  --files hdfs://localhost/1.txt,hdfs://localhost/2.txt \
  hdfs://localhost/main.jar
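
A note on the list arguments in the command above: --jars and --files each take a single comma-separated value, and the shell splits arguments on whitespace, so a space after a comma would end the list early. The sketch below builds both values from arrays so that no stray whitespace can get in. The helper name join_by_comma is made up for illustration, and the paths are the ones from the command above.

```shell
#!/usr/bin/env bash
# Join all arguments with commas and no surrounding whitespace.
join_by_comma() {
  local IFS=','
  printf '%s' "$*"
}

jars=(hdfs://localhost/1.jar hdfs://localhost/2.jar)
files=(hdfs://localhost/1.txt hdfs://localhost/2.txt)

jars_arg=$(join_by_comma "${jars[@]}")
files_arg=$(join_by_comma "${files[@]}")

# Print the command instead of running it, so the arguments can be inspected first.
printf '%s\n' "spark-submit --class myClass --master spark://localhost:7077/ --deploy-mode cluster --jars $jars_arg --files $files_arg hdfs://localhost/main.jar"
```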

The stderr of the driver showed java.lang.ClassNotDefException for a class in 
1.jar.

I checked the log and saw that Spark had added these jars:
     INFO SparkContext: Added JAR hdfs:// …1.jar
     INFO SparkContext: Added JAR hdfs:// …2.jar

In the driver's folder, I saw that only main.jar had been copied there; 
the other jars and files were not.

Could someone explain how I should pass the jars and files needed by the main 
jar to Spark?

If my class in main.jar refers to these files with a relative path, will Spark 
copy these files into one folder?

BTW, my class works in client mode with all jars and files stored locally.

Thanks
Dong Lei

