Looking into this on the HDP side. Please feel free to reach out via HDP channels instead of Apache channels.

Thanks for letting us know as well.

Josh Mahonin wrote:
Hi Robert,

I recommend following up with HDP on this issue.

The underlying problem is that the 'phoenix-spark-4.4.0.2.4.0.0-169.jar'
they've provided isn't actually a fat client JAR; it's missing many of
the required dependencies. They might be able to provide the correct JAR
for you, but you'd have to check with them. It may also be possible to
manually include all of the necessary JARs on the Spark classpath to
mimic the fat JAR, but that's fairly ugly and time-consuming.
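
A quick way to confirm the diagnosis is to list the JAR's contents and
look for the missing class:

jar tf /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar | grep HBaseConfiguration

If that comes up empty, one possible workaround (a sketch, assuming HDP's
usual layout; the exact path and filename may differ on your install) is
to point spark-defaults.conf at the full Phoenix client JAR instead:

spark.driver.extraClassPath   /usr/hdp/current/phoenix-client/phoenix-client.jar
spark.executor.extraClassPath /usr/hdp/current/phoenix-client/phoenix-client.jar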

FWIW, the HDP 2.5 Tech Preview seems to include the correct JAR, though
I haven't personally tested it out yet.

Good luck,

Josh

On Tue, Jul 5, 2016 at 2:00 AM, Robert James <srobertja...@gmail.com> wrote:

    I'm trying to use Phoenix on Spark, and can't get around this error:

    java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
        at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)

    DETAILS:
    1. I'm running HDP 2.4.0.0-169
    2. Using phoenix-sqlline, I can access Phoenix perfectly
    3. Using hbase shell, I can access HBase perfectly
    4. I added the following lines to /etc/spark/conf/spark-defaults.conf

    spark.driver.extraClassPath   /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
    spark.executor.extraClassPath /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar

    5. Steps to reproduce the error:
    # spark-shell
    ...
    scala> import org.apache.phoenix.spark._
    import org.apache.phoenix.spark._

    // note: sqlContext.load is deprecated; a non-deprecated alternative is sketched after this list
    scala> sqlContext.load("org.apache.phoenix.spark", Map("table" -> "EMAIL_ENRON", "zkUrl" -> "localhost:2181"))
    warning: there were 1 deprecation warning(s); re-run with -deprecation for details
    java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
        at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)

    // Or, this gets the same error
    scala> val rdd = sc.phoenixTableAsRDD("EMAIL_ENRON", Seq("MAIL_FROM", "MAIL_TO"), zkUrl = Some("localhost"))
    java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
        at org.apache.phoenix.spark.PhoenixRDD.getPhoenixConfiguration(PhoenixRDD.scala:82)
        at org.apache.phoenix.spark.PhoenixRDD.phoenixConf$lzycompute(PhoenixRDD.scala:38)

    6. I've tried every permutation I can think of, and also spent hours
    Googling. Sometimes I get different errors, but always errors.
    Interestingly, if I manage to load the HBaseConfiguration class
    manually (by specifying classpaths and then importing it), I get a
    "phoenixTableAsRDD is not a member of SparkContext" error.

    How can I use Phoenix from within Spark?  I'm really eager to do so,
    but haven't been able to.

    Also: Can someone give me some background on the underlying issues
    here? Trial-and-error-plus-Google is not exactly high-quality
    engineering; I'd like to understand the problem better.

