Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
Spark has already moved beyond that.
On 9/12/18 11:00 PM, Saif Addin wrote:
Thanks, we'll try the Spark Connector then. We thought it didn't support
the newest Spark versions.
On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:
It seems the column data is missing the schema's mapping information. If
you want to write the HBase table this way, you can create an HBase
table and map it with Phoenix.
----------------------------------------
Jaanai Zhang
Best regards!
On Thu, Sep 13, 2018 at 6:03 AM, Thomas D'Silva <tdsi...@salesforce.com> wrote:
Is there a reason you didn't use the spark-connector to
serialize your data?
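For reference, a write through the phoenix-spark connector could look roughly like the sketch below. It assumes Spark 2.x with the phoenix-spark module on the classpath. The table name `OUTPUT_TABLE`, the column names, and the ZooKeeper URL `localhost:2181` are hypothetical placeholders, not values from this thread, and the target table is assumed to exist in Phoenix already.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PhoenixConnectorWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-write-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows; the DataFrame column names are assumed to match
    // the columns of an existing Phoenix table.
    val df = Seq(("key1", "a", "b", "c")).toDF("ID", "COL1", "COL2", "COL3")

    // The connector serializes the values (and any salt byte) itself,
    // so no PVarchar or KeyValue handling is needed on the caller's side.
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)          // the mode the connector expects for writes
      .option("table", "OUTPUT_TABLE")   // hypothetical table name
      .option("zkUrl", "localhost:2181") // hypothetical ZooKeeper quorum
      .save()

    spark.stop()
  }
}
```

Going through the connector keeps the byte layout consistent with whatever the installed Phoenix version expects, which is the concern raised about using Phoenix internals directly.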
On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:
Thank you Josh! That was helpful. Indeed, there was a salt
bucket on the table, and the key column now shows correctly.
However, the problem persists: the rest of the columns show
as completely empty in Phoenix (they appear correctly in
HBase). We'll be looking into this, but any further advice
would be appreciated.
Saif
On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
Reminder: Using Phoenix internals forces you to understand exactly
how the version of Phoenix that you're using serializes data. Is
there a reason you're not using SQL to interact with Phoenix?
Sounds to me that Phoenix is expecting more data at the head of
your rowkey. Maybe a salt bucket that you've defined on the table
but not created?
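The salt-bucket explanation can be illustrated with a small, dependency-free sketch. Phoenix prepends a single salt byte to the rowkey of a salted table, so a reader that expects that byte and skips it will drop the first character of a key written without it. The object and method names below are hypothetical, and the salt value `0` is a stand-in: Phoenix derives the real byte from a hash of the key and the bucket count.

```scala
import java.nio.charset.StandardCharsets

object SaltByteDemo {
  // A salted Phoenix table reserves one leading byte of the rowkey.
  val SaltBytes = 1

  // Mimics a reader that expects a salted key: skip the leading byte,
  // then decode the remainder as a UTF-8 string.
  def decodeSaltedKey(rowKey: Array[Byte]): String =
    new String(rowKey, SaltBytes, rowKey.length - SaltBytes, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val unsalted = "abc".getBytes(StandardCharsets.UTF_8)
    // Written without a salt byte: the first character is lost on read.
    println(decodeSaltedKey(unsalted))

    // Written with a placeholder salt byte: the full key survives.
    val salted = Array[Byte](0) ++ unsalted
    println(decodeSaltedKey(salted))
  }
}
```

This only reproduces the symptom on the rowkey. It does not compute the salt byte the way Phoenix does, so it is not a substitute for writing through SQL or the connector.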
On 9/12/18 4:32 PM, Saif Addin wrote:
> Hi all,
>
> We're trying to write tables with all string columns from Spark.
> We are not using the Spark Connector; instead we are directly
> writing byte arrays from RDDs.
>
> The process works fine, HBase receives the data correctly, and
> the content is consistent.
>
> However, reading the table from Phoenix, we notice the first
> character of each string is missing. This sounds like a byte
> encoding issue, but we're at a loss. We're using PVarchar to
> generate bytes.
>
> Here's the snippet of code creating the RDD:
>
> val tdd = pdd.flatMap(x => {
>   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>   for (i <- 0 until cols.length) yield {
>     // other stuff for other columns ...
>     // ...
>     (rowKey, (column1, column2, column3))
>   }
> })
>
> ...
>
> We then create the following output to be written to HBase:
>
> val output = tdd.map(x => {
>   val rowKeyByte: Array[Byte] = x._1
>   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>
>   val kv = new KeyValue(rowKeyByte,
>     PVarchar.INSTANCE.toBytes(column1),
>     PVarchar.INSTANCE.toBytes(column2),
>     PVarchar.INSTANCE.toBytes(column3)
>   )
>   (immutableRowKey, kv)
> })
>
> By the way, we are using *KryoSerializer* in order to be able to
> serialize all classes necessary for HBase (KeyValue,
> BytesWritable, etc).
>
> The key of this table is the one missing data when queried from
> Phoenix, so we guess something is wrong with the byte
> serialization.
>
> Any ideas? Appreciated!
> Saif