Hi, I am attempting to connect to Phoenix from Spark, but with no success so far.
For writing into Phoenix, I am trying this:
tdd.toDF("ID", "COL1", "COL2", "COL3")
  .write
  .format("org.apache.phoenix.spark")
  .option("zkUrl", "zookeeper-host-url:2181")
  .option("table", htablename)
  .mode("overwrite")
  .save()
But getting:
*java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.*
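To take Spark out of the equation, I was also going to test the connection
with plain JDBC first. A minimal sketch (assuming the default /hbase znode
parent, which may differ on this cluster):

import java.sql.DriverManager

// Smoke test against the same ZooKeeper quorum; the zkUrl in the Spark
// option above follows the same host:port[:znode-parent] convention.
val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host-url:2181:/hbase")
println(conn.getMetaData.getDatabaseProductName)
conn.close()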
For reading, on the other hand, I am attempting this:
val hbConf = HBaseConfiguration.create()
val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
hbConf.addResource(new Path(hbaseSitePath))
spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf = hbConf)
This gets me:
*java.lang.NoClassDefFoundError: Could not initialize class
org.apache.phoenix.query.QueryServicesOptions*
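For completeness, the full read snippet with imports looks like this
(org.apache.phoenix.spark._ is what provides the phoenixTableAsDataFrame
implicit on the SQLContext):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.phoenix.spark._  // adds phoenixTableAsDataFrame to sqlContext

val hbConf = HBaseConfiguration.create()
val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
hbConf.addResource(new Path(hbaseSitePath))
val df = spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf = hbConf)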
I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
phoenix-queryserver-client-5.0.0-HBase-2.0.jar to the classpath.
Any thoughts? I have an hbase-site.xml file with more configuration, but I
am not sure how to get it read in the saving case.
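One idea I had for the write path, in case it illustrates what I'm after (a
sketch; the keys are the standard HBase ones, adjust if this cluster
overrides them), is to load the file explicitly and build the zkUrl from it:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration

// Build the writer's zkUrl from hbase-site.xml instead of hard-coding it.
val hbConf = HBaseConfiguration.create()
hbConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
val quorum = hbConf.get("hbase.zookeeper.quorum")
val port = hbConf.get("hbase.zookeeper.property.clientPort", "2181")
val znode = hbConf.get("zookeeper.znode.parent", "/hbase")
val zkUrl = s"$quorum:$port:$znode"

tdd.toDF("ID", "COL1", "COL2", "COL3")
  .write
  .format("org.apache.phoenix.spark")
  .option("zkUrl", zkUrl)
  .option("table", htablename)
  .mode("overwrite")
  .save()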
Thanks
On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <[email protected]> wrote:
> Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
> Spark has already moved beyond that.
>
> On 9/12/18 11:00 PM, Saif Addin wrote:
> > Thanks, we'll try the Spark Connector then. We thought it didn't support
> > the newest Spark versions.
> >
> > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <[email protected]> wrote:
> >
> > It seems the column data is missing mapping information from the
> > schema. If you want to write to the HBase table this way, you can
> > create the HBase table first and then use Phoenix to map it.
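> > For example (just a sketch; the table, column family, and column
> > names are illustrative and must match the HBase table exactly):
> >
> > import java.sql.DriverManager
> >
> > // Map an existing HBase table into Phoenix; VARCHAR assumes the
> > // cells hold string-encoded bytes.
> > val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
> > conn.createStatement().execute(
> >   """CREATE TABLE "HTABLE" (
> >     |  "ID" VARCHAR PRIMARY KEY,
> >     |  "0"."COL1" VARCHAR,
> >     |  "0"."COL2" VARCHAR,
> >     |  "0"."COL3" VARCHAR
> >     |)""".stripMargin)
> > conn.close()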
> >
> > ----------------------------------------
> > Jaanai Zhang
> > Best regards!
> >
> >
> >
> > Thomas D'Silva <[email protected]> wrote on Thursday, September 13, 2018 at 6:03 AM:
> >
> > Is there a reason you didn't use the spark-connector to
> > serialize your data?
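> > For example (a sketch; rdd and the names are illustrative, and this
> > assumes the phoenix-spark module is on the classpath):
> >
> > import org.apache.phoenix.spark._
> >
> > // saveToPhoenix maps each tuple in the RDD to the listed columns and
> > // lets Phoenix handle the serialization.
> > rdd.saveToPhoenix("TABLE", Seq("ID", "COL1", "COL2", "COL3"),
> >   zkUrl = Some("zookeeper-host:2181"))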
> >
> > On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <[email protected]> wrote:
> >
> > Thank you Josh! That was helpful. Indeed, there was a salt bucket on
> > the table, and the key column now shows correctly.
> >
> > However, the problem still persists: the rest of the columns show up
> > as completely empty in Phoenix (they appear correctly in HBase).
> > We'll be looking into this, but any further advice is appreciated.
> >
> > Saif
> >
> > On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <[email protected]> wrote:
> >
> > Reminder: Using Phoenix internals forces you to understand exactly
> > how the version of Phoenix that you're using serializes data. Is
> > there a reason you're not using SQL to interact with Phoenix?
> >
> > Sounds to me that Phoenix is expecting more data at the head of your
> > rowkey. Maybe a salt bucket that you've defined on the table but not
> > created?
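> > For example (a sketch with illustrative names), a table declared with
> > SALT_BUCKETS carries one extra leading byte in every rowkey:
> >
> > import java.sql.DriverManager
> >
> > // Phoenix prepends a salt byte (a hash of the key modulo the bucket
> > // count) to every rowkey of a salted table, so raw writes that omit
> > // it read back shifted by one byte.
> > val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
> > conn.createStatement().execute(
> >   "CREATE TABLE EXAMPLE (ID VARCHAR PRIMARY KEY, COL1 VARCHAR) SALT_BUCKETS = 4")
> > conn.close()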
> >
> > On 9/12/18 4:32 PM, Saif Addin wrote:
> > > Hi all,
> > >
> > > We're trying to write tables with all string columns from Spark.
> > > We are not using the Spark Connector; instead we are directly
> > > writing byte arrays from RDDs.
> > >
> > > The process works fine: HBase receives the data correctly, and the
> > > content is consistent.
> > >
> > > However, reading the table from Phoenix, we notice that the first
> > > character of each string is missing. This sounds like a byte
> > > encoding issue, but we're at a loss. We're using PVarchar to
> > > generate the bytes.
> > >
> > > Here's the snippet of code creating the RDD:
> > >
> > > val tdd = pdd.flatMap(x => {
> > >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> > >   for (i <- 0 until cols.length) yield {
> > >     // other stuff for other columns ...
> > >     // ...
> > >     (rowKey, (column1, column2, column3))
> > >   }
> > > })
> > >
> > > ...
> > >
> > > We then create the following output to be written into HBase:
> > >
> > > val output = tdd.map(x => {
> > >   val rowKeyByte: Array[Byte] = x._1
> > >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> > >
> > >   // KeyValue(row, family, qualifier, value): column1 is used as the
> > >   // column family, column2 as the qualifier, column3 as the value
> > >   val kv = new KeyValue(rowKeyByte,
> > >     PVarchar.INSTANCE.toBytes(column1),
> > >     PVarchar.INSTANCE.toBytes(column2),
> > >     PVarchar.INSTANCE.toBytes(column3)
> > >   )
> > >   (immutableRowKey, kv)
> > > })
> > >
> > > By the way, we are using *KryoSerializer* in order to be able to
> > > serialize all the classes necessary for HBase (KeyValue,
> > > BytesWritable, etc.).
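> > > The Kryo setup is the standard one, along these lines (a sketch of
> > > the relevant bits):
> > >
> > > import org.apache.spark.SparkConf
> > > import org.apache.hadoop.hbase.KeyValue
> > > import org.apache.hadoop.hbase.io.ImmutableBytesWritable
> > >
> > > val conf = new SparkConf()
> > >   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> > >   .registerKryoClasses(Array(classOf[KeyValue], classOf[ImmutableBytesWritable]))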
> > >
> > > The key column of this table is the one missing data when queried
> > > from Phoenix, so we guess something is wrong with the byte
> > > serialization.
> > >
> > > Any ideas? Appreciated!
> > > Saif
> >
> >
>