Hi Randy,

z.load() is supposed to make dependencies available to both the driver and the executors. However, it might not work correctly in yarn-client mode. Are you using yarn-client mode?
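If yarn-client mode does turn out to be the problem, one possible workaround is to hand the jars to spark-submit yourself, the same way you would with spark-shell --jars. A minimal sketch (untested; it assumes your Zeppelin build reads SPARK_SUBMIT_OPTIONS from conf/zeppelin-env.sh, and the jar paths are placeholders you would point at your local copies):

  # conf/zeppelin-env.sh
  # placeholder paths -- substitute the actual location of your Phoenix jars
  export SPARK_SUBMIT_OPTIONS="--jars /path/to/phoenix-spark-4.4.0.2.3.0.0-2557.jar,/path/to/phoenix-core-4.4.0.2.3.0.0-2557.jar"

You would need to restart the Spark interpreter afterwards so the options are picked up when the interpreter process is relaunched.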
Best,
moon

On Mon, Aug 24, 2015 at 9:12 AM Randy Gelhausen <rgel...@gmail.com> wrote:

> Any ideas?
>
> Is z.load supposed to make dependencies available to all Spark JVMs
> (driver AND executors)?
>
> Thanks,
> -Randy
>
> On Sun, Aug 23, 2015 at 2:41 PM, Randy Gelhausen <rgel...@gmail.com> wrote:
>
>> It seems Spark executors are not being provided with the requisite
>> dependencies. With spark-shell I can pass --jars /path/to/dep.jar. How
>> can we achieve this with Zeppelin, preferably inside a Note?
>>
>> %spark.dep
>> z.addRepo("hortonworks").url("http://repo.hortonworks.com/content/repositories/releases/")
>> z.load("org.apache.phoenix:phoenix-spark:4.4.0.2.3.0.0-2557")
>> z.load("org.apache.phoenix:phoenix-core:4.4.0.2.3.0.0-2557")
>> z.load("com.databricks:spark-csv_2.10:1.2.0")
>>
>> %spark
>> import org.apache.spark.sql._
>> import org.apache.phoenix.spark._
>> import java.sql.Connection
>> import java.sql.DriverManager
>>
>> val input = "/user/root/crimes/atlanta"
>> val zkUrl = "docker.dev:2181:/hbase-unsecure"
>> val table = "CRIMES"
>>
>> // Read the CSV file, clean up the field names
>> var df = sqlContext.read.format("com.databricks.spark.csv")
>>   .option("header", "true")
>>   .option("DROPMALFORMED", "true")
>>   .load(input)
>> val columns = df.columns.map(x => x.toUpperCase.replaceAll(" ", "_"))
>> df = df.toDF(columns:_*)
>>
>> // Write the cleaned DataFrame to the Phoenix table
>> df.save("org.apache.phoenix.spark", SaveMode.Overwrite,
>>   Map("table" -> table, "zkUrl" -> zkUrl))
>>
>> Results:
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure:
>> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3
>> in stage 1.0 (TID 5, docker.dev): java.lang.RuntimeException:
>> java.sql.SQLException: No suitable driver found for
>> jdbc:phoenix:docker.dev:2181:/hbase-unsecure;
>>   at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
>>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030)
>>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.sql.SQLException: No suitable driver found for
>> jdbc:phoenix:docker.dev:2181:/hbase-unsecure;
>>   at java.sql.DriverManager.getConnection(DriverManager.java:689)
>>   at java.sql.DriverManager.getConnection(DriverManager.java:208)
>>   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
>>   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80)
>>   at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68)
>>   at org.apache.phoenix.mapreduce.PhoenixRecordWriter.<init>(PhoenixRecordWriter.java:49)
>>   at org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
>>   ... 8 more
>>
>> Driver stacktrace:
>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)