Hi Jonathan,

Spark only needs the client JAR; it contains all the other Phoenix dependencies as well.
I'm not sure exactly what issue you're seeing. I just downloaded and extracted fresh copies of Spark 1.5.2 (pre-built with user-provided Hadoop) and the latest Phoenix 4.6.0 binary release. I copied 'phoenix-4.6.0-HBase-1.1-client.jar' to /tmp and created a 'spark-defaults.conf' in the 'conf' folder of the Spark install with the following:

  spark.executor.extraClassPath /tmp/phoenix-4.6.0-HBase-1.1-client.jar
  spark.driver.extraClassPath /tmp/phoenix-4.6.0-HBase-1.1-client.jar

I then launched 'spark-shell' and was able to execute:

  import org.apache.phoenix.spark._

From there, you should be able to use the methods provided by the phoenix-spark integration within the Spark shell.

Good luck,

Josh

On Tue, Dec 8, 2015 at 8:51 PM, Cox, Jonathan A <ja...@sandia.gov> wrote:
> I am trying to get Spark up and running with Phoenix, but the installation
> instructions are not clear to me, or there is something else wrong. I’m
> using Spark 1.5.2, HBase 1.1.2 and Phoenix 4.6.0 with a standalone install
> (no HDFS or cluster) on Debian Linux 8 (Jessie) x64. I’m also using Java
> 1.8.0_40.
>
> The instructions state:
>
> 1. Ensure that all requisite Phoenix / HBase platform dependencies are
>    available on the classpath for the Spark executors and drivers
>
> 2. One method is to add the phoenix-4.4.0-client.jar to ‘SPARK_CLASSPATH’
>    in spark-env.sh, or to set both ‘spark.executor.extraClassPath’ and
>    ‘spark.driver.extraClassPath’ in spark-defaults.conf
>
> *First off, what are “all requisite Phoenix / HBase platform
> dependencies”?* #2 suggests that all I need to do is add
> ‘phoenix-4.6.0-HBase-1.1-client.jar’ to Spark’s classpath. But what about
> ‘phoenix-spark-4.6.0-HBase-1.1.jar’ or ‘phoenix-core-4.6.0-HBase-1.1.jar’?
> Do either of these (or anything else) need to be added to Spark’s
> classpath?
> Secondly, if I follow the instructions exactly and add only
> ‘phoenix-4.6.0-HBase-1.1-client.jar’ to ‘spark-defaults.conf’:
>
>   spark.executor.extraClassPath /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
>   spark.driver.extraClassPath /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
>
> then I get the following error when starting the interactive Spark shell
> with ‘spark-shell’:
>
>   15/12/08 18:38:05 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
>   15/12/08 18:38:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
>   15/12/08 18:38:05 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
>   org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>           at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
>   …
>
>   <console>:10: error: not found: value sqlContext
>          import sqlContext.implicits._
>                 ^
>   <console>:10: error: not found: value sqlContext
>          import sqlContext.sql
>
> On the other hand, if I include all three of the aforementioned JARs, I
> get the same error. However, *if I include only
> ‘phoenix-spark-4.6.0-HBase-1.1.jar’*, spark-shell seems to launch without
> error. Nevertheless, if I then try the simple tutorial commands in
> spark-shell, I get the following:
>
> *Spark output:* SQL context available as sqlContext.
> *scala>* import org.apache.spark.SparkContext
> import org.apache.spark.sql.SQLContext
> import org.apache.phoenix.spark._
>
> val sqlContext = new SQLContext(sc)
>
> val df = sqlContext.load("org.apache.phoenix.spark",
>   Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))
>
> *Spark error:*
>
>   *java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration*
>           at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:71)
>           at org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:39)
>           at org.apache.phoenix.spark.PhoenixRDD.phoenixConf(PhoenixRDD.scala:38)
>           at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:42)
>           at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
>           at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
>           at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
>
> This final error seems similar to the one in the mailing list post “Phoenix-spark:
> NoClassDefFoundError: HBaseConfiguration”
> <http://mail-archives.apache.org/mod_mbox/phoenix-user/201511.mbox/ajax/%3CCAKwwsRSEJHkotiF28kzumDZM6kgBVeTJNGUoJnZcLiuEGCTjHQ%40mail.gmail.com%3E>,
> but the question does not seem to have been answered satisfactorily. Also
> note, if I include all three JARs, as he did, I get an error when launching
> spark-shell.
>
> *Can you please clarify the proper way to install and configure Phoenix
> with Spark?*
>
> Sincerely,
>
> Jonathan
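To spell out the working setup from the reply above as a runnable sketch: the snippet below appends the two classpath settings to spark-defaults.conf. The SPARK_HOME and jar paths here are placeholders for illustration; point them at your actual Spark install and wherever you copied the client jar.

```shell
# Placeholder paths -- substitute your real Spark install and jar location.
SPARK_HOME=/tmp/spark-demo
PHOENIX_JAR=/tmp/phoenix-4.6.0-HBase-1.1-client.jar

# Both the driver and the executors need the Phoenix client jar on their
# classpath, so set both properties.
mkdir -p "$SPARK_HOME/conf"
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<EOF
spark.executor.extraClassPath $PHOENIX_JAR
spark.driver.extraClassPath $PHOENIX_JAR
EOF
```

With that in place, launching bin/spark-shell should let `import org.apache.phoenix.spark._` resolve, as described in the reply.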