Hi, I am attempting to connect to Phoenix from Spark, but with no success so far.
For writing into Phoenix, I am trying this:
tdd.toDF("ID", "COL1", "COL2", "COL3")
  .write
  .format("org.apache.phoenix.spark")
  .option("zkUrl", "zookeeper-host-url:2181")
  .option("table", htablename)
  .mode("overwrite")
  .save()
But getting:
*java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.*
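To take Spark out of the equation, I was also going to test the connection
with plain JDBC first. A minimal sketch (assuming the default /hbase znode
parent, which may differ on this cluster):

import java.sql.DriverManager

// Smoke test against the same ZooKeeper quorum; the zkUrl in the Spark
// option above follows the same host:port[:znode-parent] convention.
val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host-url:2181:/hbase")
println(conn.getMetaData.getDatabaseProductName)
conn.close()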
For reading, on the other hand, I am attempting this:
val hbConf = HBaseConfiguration.create()
val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
hbConf.addResource(new Path(hbaseSitePath))
spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf = hbConf)
This gets me:
*java.lang.NoClassDefFoundError: Could not initialize class
org.apache.phoenix.query.QueryServicesOptions*
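For completeness, the full read snippet with imports looks like this
(org.apache.phoenix.spark._ is what provides the phoenixTableAsDataFrame
implicit on the SQLContext):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.phoenix.spark._  // adds phoenixTableAsDataFrame to sqlContext

val hbConf = HBaseConfiguration.create()
val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
hbConf.addResource(new Path(hbaseSitePath))
val df = spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf = hbConf)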
I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
phoenix-queryserver-client-5.0.0-HBase-2.0.jar to the classpath.
Any thoughts? I have an hbase-site.xml file with more configuration, but I
am not sure how to get it read in the saving case.
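One idea I had for the write path, in case it illustrates what I'm after (a
sketch; the keys are the standard HBase ones, adjust if this cluster
overrides them), is to load the file explicitly and build the zkUrl from it:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// Build the writer's zkUrl from hbase-site.xml instead of hard-coding it.
val hbConf = HBaseConfiguration.create()
hbConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
val quorum = hbConf.get("hbase.zookeeper.quorum")
val port = hbConf.get("hbase.zookeeper.property.clientPort", "2181")
val znode = hbConf.get("zookeeper.znode.parent", "/hbase")
val zkUrl = s"$quorum:$port:$znode"

tdd.toDF("ID", "COL1", "COL2", "COL3")
  .write
  .format("org.apache.phoenix.spark")
  .option("zkUrl", zkUrl)
  .option("table", htablename)
  .mode("overwrite")
  .save()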
Thanks
On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <[email protected]> wrote:
> Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
> Spark has already moved beyond that.
>
> On 9/12/18 11:00 PM, Saif Addin wrote:
> > Thanks, we'll try the Spark Connector then. We thought it didn't support
> > the newest Spark versions.
> >
> > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <[email protected]> wrote:
> >
> > It seems the column data is missing mapping information from the
> > schema. If you want to write to the HBase table this way, you can
> > create the HBase table first and then use Phoenix to map it.
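> > For example (just a sketch; the table, column family, and column
> > names are illustrative and must match the HBase table exactly):
> >
> > import java.sql.DriverManager
> >
> > // Map an existing HBase table into Phoenix; VARCHAR assumes the
> > // cells hold string-encoded bytes.
> > val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
> > conn.createStatement().execute(
> >   """CREATE TABLE "HTABLE" (
> >     |  "ID" VARCHAR PRIMARY KEY,
> >     |  "0"."COL1" VARCHAR,
> >     |  "0"."COL2" VARCHAR,
> >     |  "0"."COL3" VARCHAR
> >     |)""".stripMargin)
> > conn.close()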
> >
> > ----------------------------------------
> > Jaanai Zhang
> > Best regards!
> >
> >
> >
> > Thomas D'Silva <[email protected]> wrote on Thursday, September 13, 2018 at 6:03 AM:
> >
> > Is there a reason you didn't use the spark-connector to
> > serialize your data?
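> > For example (a sketch; rdd and the names are illustrative, and this
> > assumes the phoenix-spark module is on the classpath):
> >
> > import org.apache.phoenix.spark._
> >
> > // saveToPhoenix maps each tuple in the RDD to the listed columns and
> > // lets Phoenix handle the serialization.
> > rdd.saveToPhoenix("TABLE", Seq("ID", "COL1", "COL2", "COL3"),
> >   zkUrl = Some("zookeeper-host:2181"))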
> >
> > On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <[email protected]> wrote:
> >
> > Thank you Josh! That was helpful. Indeed, there was a salt bucket on
> > the table, and the key column now shows correctly.
> >
> > However, the problem still persists: the rest of the columns show up
> > as completely empty in Phoenix (they appear correctly in HBase).
> > We'll be looking into this, but any further advice is appreciated.
> >
> > Saif
> >
> > On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <[email protected]> wrote:
> >
> > Reminder: Using Phoenix internals forces you to understand exactly
> > how the version of Phoenix that you're using serializes data. Is
> > there a reason you're not using SQL to interact with Phoenix?
> >
> > Sounds to me that Phoenix is expecting more data at the head of your
> > rowkey. Maybe a salt bucket that you've defined on the table but not
> > created?
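> > For example (a sketch with illustrative names), a table declared with
> > SALT_BUCKETS carries one extra leading byte in every rowkey:
> >
> > import java.sql.DriverManager
> >
> > // Phoenix prepends a salt byte (a hash of the key modulo the bucket
> > // count) to every rowkey of a salted table, so raw writes that omit
> > // it read back shifted by one byte.
> > val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
> > conn.createStatement().execute(
> >   "CREATE TABLE EXAMPLE (ID VARCHAR PRIMARY KEY, COL1 VARCHAR) SALT_BUCKETS = 4")
> > conn.close()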
> >
> > On 9/12/18 4:32 PM, Saif Addin wrote:
> > > Hi all,
> > >
> > > We're trying to write tables with all string columns from Spark.
> > > We are not using the Spark Connector; instead we are directly
> > > writing byte arrays from RDDs.
> > >
> > > The process works fine: HBase receives the data correctly, and the
> > > content is consistent.
> > >
> > > However, reading the table from Phoenix, we notice that the first
> > > character of each string is missing. This sounds like a byte
> > > encoding issue, but we're at a loss. We're using PVarchar to
> > > generate the bytes.
> > >
> > > Here's the snippet of code creating the RDD:
> > >
> > > val tdd = pdd.flatMap(x => {
> > >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> > >   for (i <- 0 until cols.length) yield {
> > >     // other stuff for other columns ...
> > >     // ...
> > >     (rowKey, (column1, column2, column3))
> > >   }
> > > })
> > >
> > > ...
> > >
> > > We then create the following output to be written into HBase:
> > >
> > > val output = tdd.map(x => {
> > >   val rowKeyByte: Array[Byte] = x._1
> > >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> > >
> > >   // KeyValue(row, family, qualifier, value): column1 is used as the
> > >   // column family, column2 as the qualifier, column3 as the value
> > >   val kv = new KeyValue(rowKeyByte,
> > >     PVarchar.INSTANCE.toBytes(column1),
> > >     PVarchar.INSTANCE.toBytes(column2),
> > >     PVarchar.INSTANCE.toBytes(column3)
> > >   )
> > >   (immutableRowKey, kv)
> > > })
> > >
> > > By the way, we are using *KryoSerializer* in order to be able to
> > > serialize all the classes necessary for HBase (KeyValue,
> > > BytesWritable, etc.).
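> > > The Kryo setup is the standard one, along these lines (a sketch of
> > > the relevant bits):
> > >
> > > import org.apache.spark.SparkConf
> > > import org.apache.hadoop.hbase.KeyValue
> > > import org.apache.hadoop.hbase.io.ImmutableBytesWritable
> > >
> > > val conf = new SparkConf()
> > >   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> > >   .registerKryoClasses(Array(classOf[KeyValue], classOf[ImmutableBytesWritable]))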
> > >
> > > The key column of this table is the one missing data when queried
> > > from Phoenix, so we guess something is wrong with the byte
> > > serialization.
> > >
> > > Any ideas? Appreciated!
> > > Saif
> >
> >
>