Hi Jonathan,

Spark only needs the client JAR. It contains all the other Phoenix
dependencies as well.

I'm not sure exactly what issue you're seeing. I just downloaded and
extracted fresh copies of Spark 1.5.2 (pre-built with user-provided
Hadoop) and the latest Phoenix 4.6.0 binary release.

I copied the 'phoenix-4.6.0-HBase-1.1-client.jar' to /tmp and created a
'spark-defaults.conf' in the 'conf' folder of the Spark install with the
following:

spark.executor.extraClassPath /tmp/phoenix-4.6.0-HBase-1.1-client.jar
spark.driver.extraClassPath /tmp/phoenix-4.6.0-HBase-1.1-client.jar
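
As an aside, if you'd rather not edit spark-defaults.conf, the same two
settings should also work when passed on the command line (using the same
/tmp path as above):

spark-shell \
  --conf "spark.executor.extraClassPath=/tmp/phoenix-4.6.0-HBase-1.1-client.jar" \
  --conf "spark.driver.extraClassPath=/tmp/phoenix-4.6.0-HBase-1.1-client.jar"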

I then launched the 'spark-shell', and was able to execute:

import org.apache.phoenix.spark._

From there, you should be able to use the methods provided by the
phoenix-spark integration within the Spark shell.
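
For example, a minimal read along the lines of the tutorial commands you
quoted should work (the "TABLE1" table and "phoenix-server:2181" zkUrl are
placeholders -- substitute your own):

import org.apache.phoenix.spark._

val df = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))
df.show()

(sqlContext.load is deprecated in Spark 1.5; the equivalent
sqlContext.read.format("org.apache.phoenix.spark").options(...).load()
form works as well.)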

Good luck,

Josh

On Tue, Dec 8, 2015 at 8:51 PM, Cox, Jonathan A <ja...@sandia.gov> wrote:

> I am trying to get Spark up and running with Phoenix, but the installation
> instructions are not clear to me, or there is something else wrong. I’m
> using Spark 1.5.2, HBase 1.1.2 and Phoenix 4.6.0 with a standalone install
> (no HDFS or cluster) with Debian Linux 8 (Jessie) x64. I’m also using Java
> 1.8.0_40.
>
>
>
> The instructions state:
>
> 1.       Ensure that all requisite Phoenix / HBase platform dependencies
> are available on the classpath for the Spark executors and drivers
>
> 2.       One method is to add the phoenix-4.4.0-client.jar to
> ‘SPARK_CLASSPATH’ in spark-env.sh, or setting both
> ‘spark.executor.extraClassPath’ and ‘spark.driver.extraClassPath’ in
> spark-defaults.conf
>
>
>
> *First off, what are “all requisite Phoenix / HBase platform
> dependencies”?* #2 suggests that all I need to do is add
>  ‘phoenix-4.6.0-HBase-1.1-client.jar’ to Spark’s class path. But what about
> ‘phoenix-spark-4.6.0-HBase-1.1.jar’ or ‘phoenix-core-4.6.0-HBase-1.1.jar’?
> Do either of these (or anything else) need to be added to Spark’s class
> path?
>
>
>
> Secondly, if I follow the instructions exactly, and add only
> ‘phoenix-4.6.0-HBase-1.1-client.jar’ to ‘spark-defaults.conf’:
>
> spark.executor.extraClassPath
> /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
>
> spark.driver.extraClassPath
> /usr/local/phoenix/phoenix-4.6.0-HBase-1.1-client.jar
>
> Then I get the following error when starting the interactive Spark shell
> with ‘spark-shell’:
>
> 15/12/08 18:38:05 WARN ObjectStore: Version information not found in
> metastore. hive.metastore.schema.verification is not enabled so recording
> the schema version 1.2.0
>
> 15/12/08 18:38:05 WARN ObjectStore: Failed to get database default,
> returning NoSuchObjectException
>
> 15/12/08 18:38:05 WARN Hive: Failed to access metastore. This class should
> not accessed in runtime.
>
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>
>                 at
> org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
>
> …
>
>
>
> <console>:10: error: not found: value sqlContext
>
>        import sqlContext.implicits._
>
>               ^
>
> <console>:10: error: not found: value sqlContext
>
>        import sqlContext.sql
>
>
>
> On the other hand, if I include all three of the aforementioned JARs, I
> get the same error. However, *if I include only the
> ‘phoenix-spark-4.6.0-HBase-1.1.jar’*, spark-shell seems to launch without
> error. Nevertheless, if I then try the simple tutorial commands in
> spark-shell, I get the following:
>
> *Spark output:* SQL context available as sqlContext.
>
>
>
> *scala >>* import org.apache.spark.SparkContext
>
> import org.apache.spark.sql.SQLContext
>
> import org.apache.phoenix.spark._
>
>
>
>                                 val sqlContext = new SQLContext(sc)
>
>
>
>                                 val df =
> sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TABLE1",
> "zkUrl" -> "phoenix-server:2181"))
>
>
>
>                 *Spark error:*
>
>                                 *java.lang.NoClassDefFoundError:
> org/apache/hadoop/hbase/HBaseConfiguration*
>
>                 at
> org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:71)
>
>                 at
> org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:39)
>
>                 at
> org.apache.phoenix.spark.PhoenixRDD.phoenixConf(PhoenixRDD.scala:38)
>
>                 at
> org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:42)
>
>                 at
> org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
>
>                 at
> org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
>
>                 at
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
>
>
>
> This final error seems similar to the one in the mailing list post
> "Phoenix-spark: NoClassDefFoundError: HBaseConfiguration"
> <http://mail-archives.apache.org/mod_mbox/phoenix-user/201511.mbox/ajax/%3CCAKwwsRSEJHkotiF28kzumDZM6kgBVeTJNGUoJnZcLiuEGCTjHQ%40mail.gmail.com%3E>.
> But the question does not seem to have been answered satisfactorily. Also
> note, if I include all three JARs, as he did, I get an error when launching
> spark-shell.
>
>
>
> *Can you please clarify what is the proper way to install and configure
> Phoenix with Spark?*
>
>
>
> Sincerely,
>
> Jonathan
>
