The problem lies with getting the driver classes into the primordial class 
loader when running on YARN.

Basically I need to somehow set SPARK_CLASSPATH or compute_classpath.sh 
when running on YARN, and I’m not sure how to do that when YARN handles all the 
file copying.
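
One thing that may be worth trying (these are standard spark-submit/spark-shell 
flags; the jar path is the same placeholder used below) is putting the driver 
jar on the driver and executor class paths explicitly, in addition to shipping 
it with --jars;

./bin/spark-shell --master yarn-client \
  --jars /tmp/jtds-1.3.1.jar \
  --driver-class-path /tmp/jtds-1.3.1.jar \
  --conf spark.executor.extraClassPath=./jtds-1.3.1.jar

--driver-class-path prepends the jar to the driver JVM’s own class path (so the 
system class loader, which DriverManager consults, can see it), while 
spark.executor.extraClassPath points at the copy YARN drops into each 
executor’s working directory. I haven’t verified this for this exact failure, 
but it avoids touching SPARK_CLASSPATH or compute_classpath.sh by hand.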

From: Nathan 
<nathan.mccar...@quantium.com.au>
Date: Wednesday, 15 April 2015 11:49 pm
To: "Wang, Daoyuan" <daoyuan.w...@intel.com>, "user@spark.apache.org" 
<user@spark.apache.org>
Subject: RE: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Tried with the 1.3.0 release (built myself) & the most recent 1.3.1 snapshot 
off the 1.3 branch.

Haven't tried with 1.4/master.

________________________________
From: Wang, Daoyuan [daoyuan.w...@intel.com]
Sent: Wednesday, April 15, 2015 5:22 PM
To: Nathan McCarthy; user@spark.apache.org
Subject: RE: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Can you provide your Spark version?

Thanks,
Daoyuan

From: Nathan McCarthy [mailto:nathan.mccar...@quantium.com.au]
Sent: Wednesday, April 15, 2015 1:57 PM
To: Nathan McCarthy; user@spark.apache.org
Subject: Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Just an update, tried with the old JdbcRDD and that worked fine.

From: Nathan 
<nathan.mccar...@quantium.com.au>
Date: Wednesday, 15 April 2015 1:57 pm
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Hi guys,

Trying to use a Spark SQL context’s .load("jdbc", …) method to create a 
DataFrame from a JDBC data source. Everything works well locally 
(master = local[*]), but as soon as we try to run on YARN we have problems.

We seem to be running into problems with the class path and loading up the JDBC 
driver. I’m using the jTDS 1.3.1 driver, net.sourceforge.jtds.jdbc.Driver.

./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master yarn-client

When trying to run I get an exception;

scala> sqlContext.load("jdbc", Map("url" -> 
"jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd", "dbtable" -> 
"CUBE.DIM_SUPER_STORE_TBL"))

java.sql.SQLException: No suitable driver found for 
jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd

Thinking maybe we need to force-load the driver, if I supply "driver" -> 
"net.sourceforge.jtds.jdbc.Driver" to .load we get;

scala> sqlContext.load("jdbc", Map("url" -> 
"jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd", "driver" -> 
"net.sourceforge.jtds.jdbc.Driver", "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))

java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at 
org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:97)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:21)

Yet if I run a Class.forName() just from the shell;

scala> Class.forName("net.sourceforge.jtds.jdbc.Driver")
res1: Class[_] = class net.sourceforge.jtds.jdbc.Driver

No problem finding the class. I’ve tried in both the shell, and running with 
spark-submit (packaging the driver into my application as a fat JAR). Nothing 
seems to work.

I can also get a connection in the driver/shell no problem;

scala> import java.sql.DriverManager
import java.sql.DriverManager
scala> DriverManager.getConnection("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")
res3: java.sql.Connection = net.sourceforge.jtds.jdbc.JtdsConnection@2a67ecd0

I’m probably missing some class path setting here. In 
jdbc.DefaultSource.createRelation it looks like the call to Class.forName 
doesn’t specify a class loader, so it falls back to the default Java behaviour 
of using the caller’s defining class loader. It almost feels like it’s 
resolving the driver through a different class loader than the one holding the 
jar.
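
For what it’s worth, the three-argument form of Class.forName lets the caller 
name a loader explicitly. A check from the shell (just an illustration, not 
Spark’s own code) that resolves the class through the thread context class 
loader, which spark-shell points at the loader holding --jars, would be;

scala> Class.forName("net.sourceforge.jtds.jdbc.Driver", true,
         Thread.currentThread.getContextClassLoader)

If that succeeds where the data source fails, it would support the theory that 
the jar is visible to the context class loader but not to the loader 
Class.forName defaults to inside createRelation.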

I also tried checking whether the driver was registered on all my executors by 
running;

import scala.collection.JavaConverters._
sc.parallelize(Seq(1, 2, 3, 4)).flatMap(_ =>
  java.sql.DriverManager.getDrivers().asScala.map(d =>
    s"$d | ${d.acceptsURL("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")}"))
  .collect().foreach(println)

This successfully returns;

15/04/15 01:07:37 INFO scheduler.DAGScheduler: Job 0 finished: collect at 
Main.scala:46, took 1.495597 s
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true

As a final test we tried the Postgres driver and hit the same problem. Any 
ideas?

Cheers,
Nathan
