Thanks for the patience, and sorry, I may have sent incomplete information. We are loading the following jars and still getting: *(executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.phoenix.query.QueryServicesOptions*
http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar

Not sure which one I could be missing?

On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <els...@apache.org> wrote:

> Uh, you're definitely not using the right JARs :)
>
> You'll want the phoenix-client.jar for the Phoenix JDBC driver and the
> phoenix-spark.jar for the Phoenix RDD.
>
> On 9/14/18 1:08 PM, Saif Addin wrote:
> > Hi, I am attempting to make a connection with Spark, but no success
> > so far.
> >
> > For writing into Phoenix, I am trying this:
> >
> > tdd.toDF("ID", "COL1", "COL2", "COL3")
> >   .write.format("org.apache.phoenix.spark")
> >   .option("zkUrl", "zookeeper-host-url:2181")
> >   .option("table", htablename)
> >   .mode("overwrite")
> >   .save()
> >
> > But I am getting:
> > *java.sql.SQLException: ERROR 103 (08004): Unable to establish
> > connection.*
> >
> > For reading, on the other hand, I am attempting this:
> >
> > val hbConf = HBaseConfiguration.create()
> > val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
> > hbConf.addResource(new Path(hbaseSitePath))
> >
> > spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"),
> >   conf = hbConf)
> >
> > That gets me:
> > *java.lang.NoClassDefFoundError: Could not initialize class
> > org.apache.phoenix.query.QueryServicesOptions*
> >
> > I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
> > phoenix-queryserver-client-5.0.0-HBase-2.0.jar. Any thoughts?
> >
> > I have an hbase-site.xml file with more configuration, but I am not
> > sure how to get it read on the saving side. Thanks
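For reference, a minimal sketch of the setup Josh's reply points at: replace the long jar list above with the Phoenix client and spark jars, shipped to the executors via spark.jars. The local paths are assumptions, not taken from the thread; use wherever your Phoenix install puts them.

import org.apache.spark.sql.SparkSession

// A sketch, assuming the two jars Josh names live under /opt/phoenix.
// spark.jars ships them to the driver and every executor; the error at
// the top of the thread came from an executor, so the jars must reach
// executors too.
val spark = SparkSession.builder()
  .appName("phoenix-spark-smoke-test")
  .config("spark.jars",
    "/opt/phoenix/phoenix-5.0.0-HBase-2.0-client.jar," +
    "/opt/phoenix/phoenix-spark-5.0.0-HBase-2.0.jar")
  .getOrCreate()

A "Could not initialize class" error (as opposed to a ClassNotFoundException) usually means the class was found but its static initializer failed, often because one of its own dependencies was absent; the monolithic phoenix-client.jar bundles those dependencies.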
> > On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <els...@apache.org> wrote:
> >
> > Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
> > Spark has already moved beyond that.
> >
> > On 9/12/18 11:00 PM, Saif Addin wrote:
> > > Thanks, we'll try the Spark Connector then. We thought it didn't
> > > support the newest Spark versions.
> > >
> > > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang
> > > <cloud.pos...@gmail.com> wrote:
> > >
> > > It seems the column data is missing mapping information from the
> > > schema. If you want to write to the HBase table this way, you can
> > > create the HBase table first and then use Phoenix to map it.
> > >
> > > ----------------------------------------
> > > Jaanai Zhang
> > > Best regards!
> > >
> > > On Thu, Sep 13, 2018 at 6:03 AM, Thomas D'Silva
> > > <tdsi...@salesforce.com> wrote:
> > >
> > > Is there a reason you didn't use the spark-connector to serialize
> > > your data?
> > >
> > > On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:
> > >
> > > Thank you Josh! That was helpful. Indeed, there was a salt bucket on
> > > the table, and the key column now shows correctly.
> > >
> > > However, the problem still persists in that the rest of the columns
> > > show as completely empty in Phoenix (they appear correctly in HBase).
> > > We'll be looking into this, but if you have any further advice, it is
> > > appreciated.
> > >
> > > Saif
> > >
> > > On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
> > >
> > > Reminder: Using Phoenix internals forces you to understand exactly
> > > how the version of Phoenix that you're using serializes data. Is
> > > there a reason you're not using SQL to interact with Phoenix?
> > >
> > > Sounds to me like Phoenix is expecting more data at the head of your
> > > rowkey. Maybe a salt bucket that you've defined on the table but not
> > > created?
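For reference, a minimal sketch of the SQL route Josh mentions, going through the Phoenix JDBC driver so Phoenix itself manages the salt byte at the head of the rowkey. The table definition, salt-bucket count, and ZooKeeper quorum below are assumptions for illustration only.

import java.sql.DriverManager

// Sketch only: create a salted table and write through Phoenix SQL,
// letting Phoenix handle rowkey serialization. All names are placeholders.
val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host-url:2181")
val stmt = conn.createStatement()
stmt.execute(
  """CREATE TABLE IF NOT EXISTS VISTA_409X68 (
    |  ID VARCHAR PRIMARY KEY,
    |  COL1 VARCHAR,
    |  COL2 VARCHAR,
    |  COL3 VARCHAR
    |) SALT_BUCKETS = 4""".stripMargin)
stmt.executeUpdate("UPSERT INTO VISTA_409X68 VALUES ('key1', 'a', 'b', 'c')")
conn.commit()  // Phoenix connections do not auto-commit by default
conn.close()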
> > > On 9/12/18 4:32 PM, Saif Addin wrote:
> > > > Hi all,
> > > >
> > > > We're trying to write tables with all string columns from Spark.
> > > > We are not using the Spark Connector; instead, we are writing byte
> > > > arrays directly from RDDs.
> > > >
> > > > The process works fine, HBase receives the data correctly, and the
> > > > content is consistent.
> > > >
> > > > However, when reading the table from Phoenix, we notice the first
> > > > character of each string is missing. This sounds like a byte
> > > > encoding issue, but we're at a loss. We're using PVarchar to
> > > > generate the bytes.
> > > >
> > > > Here's the snippet of code creating the RDD:
> > > >
> > > > val tdd = pdd.flatMap(x => {
> > > >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> > > >   for (i <- 0 until cols.length) yield {
> > > >     // other stuff for other columns ...
> > > >     ...
> > > >     (rowKey, (column1, column2, column3))
> > > >   }
> > > > })
> > > >
> > > > ...
> > > >
> > > > We then create the following output to be written down in HBase:
> > > >
> > > > val output = tdd.map(x => {
> > > >   val rowKeyByte: Array[Byte] = x._1
> > > >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> > > >   val kv = new KeyValue(rowKeyByte,
> > > >     PVarchar.INSTANCE.toBytes(column1),
> > > >     PVarchar.INSTANCE.toBytes(column2),
> > > >     PVarchar.INSTANCE.toBytes(column3)
> > > >   )
> > > >   (immutableRowKey, kv)
> > > > })
> > > >
> > > > By the way, we are using *KryoSerializer* in order to be able to
> > > > serialize all the classes necessary for HBase (KeyValue,
> > > > BytesWritable, etc.).
> > > >
> > > > The key of this table is the one missing data when queried from
> > > > Phoenix, so we guess something is wrong with the byte serialization.
> > > >
> > > > Any ideas? Appreciated!
> > > > Saif
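Finally, a minimal sketch of the connector-based write Thomas asks about: phoenix-spark serializes the rowkey and columns itself, salt byte included, which avoids exactly the kind of shifted-by-one-byte reads described above. The table, columns, and zkUrl are placeholders, and sc is assumed to be an existing SparkContext.

import org.apache.phoenix.spark._

// Sketch: write an RDD of tuples through the phoenix-spark connector
// rather than hand-building KeyValues with PVarchar. Names are
// placeholders; the column list must match the Phoenix table schema.
val data = sc.parallelize(Seq(
  ("key1", "a", "b", "c"),
  ("key2", "d", "e", "f")))
data.saveToPhoenix(
  "VISTA_409X68",
  Seq("ID", "COL1", "COL2", "COL3"),
  zkUrl = Some("zookeeper-host-url:2181"))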