I've found a (rather perplexing) partial solution. If I leave spark.driver.extraClassPath out completely and instead do "spark-shell --jars /usr/hdp/current/phoenix-client/phoenix-client.jar", it seems to work perfectly! Note that the jar there is the phoenix-client.jar as shipped with HDP (is that a backport of a jar from a later Phoenix version?).
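One thing I still need to try: spark-defaults.conf apparently also accepts a spark.jars property, which (if I understand the docs correctly -- someone please confirm) is the config-file analogue of the --jars switch, shipping the listed jars to executors and adding them to both driver and executor classpaths. A sketch, using the same jar that works with --jars:

```properties
# Untested guess: spark.jars as the spark-defaults.conf equivalent of
# "spark-shell --jars ...", pointing at the HDP fat client jar.
spark.jars /usr/hdp/current/phoenix-client/phoenix-client.jar
```

If that behaves the same as the --jars switch, it would at least answer my question 2 below; I haven't verified it on this HDP sandbox yet.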
Note that I haven't been able to get the "--jars" method to work with any other jar, only that one. What's perplexing is that if I use the exact same jar in the spark.driver.extraClassPath config directive (alone, with no spark.executor.extraClassPath), I get errors about the metastore (see below for the trace). This raises several questions:

1. What is the difference between "--jars" and "spark.driver.extraClassPath"? What does each one do?
2. How do I replicate the "--jars" switch in a config file?
3. Is the "--jars" solution stable, or will some minor tweak break it? Given that this config issue seems to be close to voodoo, I'm hesitant to commit to Phoenix, even though I can get it working, until I'm confident that we'll be able to continue doing so.
4. What is the metastore, and what are the errors it's showing? Do they have anything to do with Phoenix? Does Phoenix somehow interfere with SLF4J?

Here are the errors when I set spark.driver.extraClassPath to /usr/hdp/current/phoenix-client/phoenix-client.jar. Note that we get them just running spark-shell, even before entering any code.

[root@sandbox ~]# spark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.0.0-169/phoenix/phoenix-4.4.0.2.4.0.0-169-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/07/05 21:56:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
....
16/07/05 21:57:18 INFO HiveMetaStore: Added admin role in metastore
16/07/05 21:57:18 INFO HiveMetaStore: Added public role in metastore
16/07/05 21:57:18 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        ...
        ... 73 more
...
<console>:16: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:16: error: not found: value sqlContext
       import sqlContext.sql

On 7/5/16, Josh Mahonin <jmaho...@gmail.com> wrote:
> Hi Robert,
>
> I recommend following up with HDP on this issue.
>
> The underlying problem is that the 'phoenix-spark-4.4.0.2.4.0.0-169.jar'
> they've provided isn't actually a fat client JAR, it's missing many of the
> required dependencies. They might be able to provide the correct JAR for
> you, but you'd have to check with them. It may also be possible for you to
> manually include all of the necessary JARs on the Spark classpath to mimic
> the fat jar, but that's fairly ugly and time consuming.
>
> FWIW, the HDP 2.5 Tech Preview seems to include the correct JAR, though I
> haven't personally tested it out yet.
>
> Good luck,
>
> Josh
>
> On Tue, Jul 5, 2016 at 2:00 AM, Robert James <srobertja...@gmail.com> wrote:
>
>> I'm trying to use Phoenix on Spark, and can't get around this error:
>>
>> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
>>         at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
>>
>> DETAILS:
>> 1. I'm running HDP 2.4.0.0-169
>> 2. Using phoenix-sqlline, I can access Phoenix perfectly
>> 3. Using hbase shell, I can access HBase perfectly
>> 4. I added the following lines to /etc/spark/conf/spark-defaults.conf
>>
>> spark.driver.extraClassPath /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
>> spark.executor.extraClassPath /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
>>
>> 5. Steps to reproduce the error:
>> # spark-shell
>> ...
>> scala> import org.apache.phoenix.spark._
>> import org.apache.phoenix.spark._
>>
>> scala> sqlContext.load("org.apache.phoenix.spark", Map("table" -> "EMAIL_ENRON", "zkUrl" -> "localhost:2181"))
>> warning: there were 1 deprecation warning(s); re-run with -deprecation for details
>> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
>>         at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
>>
>> // Or, this gets the same error
>> scala> val rdd = sc.phoenixTableAsRDD("EMAIL_ENRON", Seq("MAIL_FROM", "MAIL_TO"), zkUrl=Some("localhost"))
>> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
>>         at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
>>         at org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:38)
>>
>> 6. I've tried every permutation I can think of, and also spent hours
>> Googling. Some times I can get different errors, but always errors.
>> Interestingly, if I manage to load the HBaseConfiguration class
>> manually (by specifying classpaths and then import), I get a
>> "phoenixTableAsRDD is not a member of SparkContext" error.
>>
>> How can I use Phoenix from within Spark? I'm really eager to do so,
>> but haven't been able to.
>>
>> Also: Can someone give me some background on the underlying issues
>> here? Trial-and-error-plus-google is not exactly high quality
>> engineering; I'd like to understand the problem better.
>>
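
In case it helps anyone reproduce or diagnose this, here's the kind of check I've been pasting into spark-shell to see which jar a class is actually being loaded from. It's just standard JVM reflection, nothing Phoenix-specific, and the class names below are only the ones from the errors above:

```scala
// Paste into spark-shell: report which jar (if any) a class was loaded from.
// For bootstrap-loaded classes getCodeSource returns null, hence the guard.
def whereIs(className: String): Unit = {
  try {
    val src = Class.forName(className).getProtectionDomain.getCodeSource
    println(s"$className -> ${if (src == null) "bootstrap/unknown" else src.getLocation}")
  } catch {
    case _: ClassNotFoundException => println(s"$className -> NOT on classpath")
  }
}

whereIs("org.apache.hadoop.hbase.HBaseConfiguration")
whereIs("org.apache.phoenix.spark.PhoenixRDD")
whereIs("org.slf4j.impl.StaticLoggerBinder")
```

Comparing this output under "--jars" versus under extraClassPath might show exactly which jars each method does and doesn't put on the driver classpath, which would go a long way toward de-voodooing the config.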