Can you provide your Spark version?

Thanks,
Daoyuan
From: Nathan McCarthy [mailto:nathan.mccar...@quantium.com.au]
Sent: Wednesday, April 15, 2015 1:57 PM
To: Nathan McCarthy; user@spark.apache.org
Subject: Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Just an update: tried with the old JdbcRDD and that worked fine.

From: Nathan <nathan.mccar...@quantium.com.au>
Date: Wednesday, 15 April 2015 1:57 pm
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Hi guys,

I'm trying to use a Spark SQL context's .load("jdbc", ...) method to create a DataFrame from a JDBC data source. It all works locally (master = local[*]), but as soon as we run on YARN we seem to hit classpath problems loading the JDBC driver. I'm using the jTDS 1.3.1 driver, net.sourceforge.jtds.jdbc.Driver:

  ./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master yarn-client

When I try to load the table I get an exception:

  scala> sqlContext.load("jdbc", Map(
           "url" -> "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd",
           "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))
  java.sql.SQLException: No suitable driver found for jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd

Thinking we might need to force-load the driver, I added "driver" -> "net.sourceforge.jtds.jdbc.Driver" to the .load options and got:

  scala> sqlContext.load("jdbc", Map(
           "url" -> "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd",
           "driver" -> "net.sourceforge.jtds.jdbc.Driver",
           "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))
  java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
          at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
          at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
          at java.lang.Class.forName0(Native Method)
          at java.lang.Class.forName(Class.java:191)
          at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:97)
          at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
          at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
          at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:21)

Yet if I run Class.forName() directly from the shell, the class is found with no problem:

  scala> Class.forName("net.sourceforge.jtds.jdbc.Driver")
  res1: Class[_] = class net.sourceforge.jtds.jdbc.Driver

I've tried this in the shell and with spark-submit (packaging the driver into my application as a fat JAR); nothing works. I can also get a connection in the driver/shell with no problem:

  scala> import java.sql.DriverManager
  import java.sql.DriverManager

  scala> DriverManager.getConnection("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")
  res3: java.sql.Connection = net.sourceforge.jtds.jdbc.JtdsConnection@2a67ecd0

I'm probably missing some classpath setting here. In jdbc.DefaultSource.createRelation the call to Class.forName doesn't specify a class loader, so it just uses the default Java behaviour of resolving through the caller's class loader. It almost feels like it's using a different class loader from the one that sees the --jars entries.
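A quick way to poke at the different-class-loader theory from the shell (just a diagnostic sketch; the assumption here is that --jars entries end up visible to the REPL/context class loader but not to the loader that defined Spark SQL's own classes):

```scala
// Compare the loader that defined Spark SQL's classes (what a bare
// Class.forName inside DefaultSource would resolve against) with the
// shell's context class loader.
val sqlLoader = classOf[org.apache.spark.sql.SQLContext].getClassLoader
val ctxLoader = Thread.currentThread.getContextClassLoader
println(s"Spark SQL defining loader: $sqlLoader")
println(s"shell context loader:      $ctxLoader")

// If they differ, forcing the lookup through the context loader should
// succeed even though the bare Class.forName inside Spark SQL fails.
Class.forName("net.sourceforge.jtds.jdbc.Driver", true, ctxLoader)
```

If the three-argument Class.forName succeeds where the plain one throws, that would confirm the driver jar is only on the context loader's path, not the defining loader's.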
I also tried checking whether the driver was visible on all my executors by running:

  import scala.collection.JavaConverters._

  sc.parallelize(Seq(1, 2, 3, 4)).flatMap(_ =>
    java.sql.DriverManager.getDrivers().asScala.map(d =>
      s"$d | ${d.acceptsURL("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")}"))
    .collect().foreach(println)

This successfully returns:

  15/04/15 01:07:37 INFO scheduler.DAGScheduler: Job 0 finished: collect at Main.scala:46, took 1.495597 s
  org.apache.derby.jdbc.AutoloadedDriver40 | false
  com.mysql.jdbc.Driver | false
  net.sourceforge.jtds.jdbc.Driver | true
  org.apache.derby.jdbc.AutoloadedDriver40 | false
  com.mysql.jdbc.Driver | false
  net.sourceforge.jtds.jdbc.Driver | true
  org.apache.derby.jdbc.AutoloadedDriver40 | false
  com.mysql.jdbc.Driver | false
  net.sourceforge.jtds.jdbc.Driver | true
  org.apache.derby.jdbc.AutoloadedDriver40 | false
  com.mysql.jdbc.Driver | false
  net.sourceforge.jtds.jdbc.Driver | true

So the jTDS driver is registered and accepts the URL on every executor. As a final test we tried the Postgres driver and hit the same problem.

Any ideas?

Cheers,
Nathan
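P.S. In case it's useful to anyone else hitting this, here's a minimal sketch of the old-style JdbcRDD usage that worked for us (not our exact code; STORE_ID is a made-up integer key column, and JdbcRDD requires exactly two ? placeholders that it fills with per-partition bounds):

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

// The connection is opened inside the closure on each executor, so
// DriverManager resolves the driver there rather than going through the
// Class.forName call in Spark SQL's DefaultSource.
val rows = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(
    "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd"),
  "SELECT * FROM CUBE.DIM_SUPER_STORE_TBL WHERE STORE_ID >= ? AND STORE_ID <= ?",
  lowerBound = 1L, upperBound = 100000L, numPartitions = 4,
  mapRow = (r: ResultSet) => r.getString(1))

rows.take(5).foreach(println)
```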