Re: Missing content in phoenix after writing from Spark

Josh Elser Thu, 13 Sep 2018 07:39:02 -0700

Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if Spark has already moved beyond that.


On 9/12/18 11:00 PM, Saif Addin wrote:

Thanks, we'll try the Spark Connector then. We thought it didn't support the newest Spark versions.

On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:


    It seems the column data is missing the schema's mapping information. If
    you want to write to the HBase table this way, you can create the
    HBase table first and then map it in Phoenix.

    ----------------------------------------
        Jaanai Zhang
        Best regards!



    Thomas D'Silva <tdsi...@salesforce.com> wrote on Thu, Sep 13, 2018
    at 6:03 AM:

        Is there a reason you didn't use the spark-connector to
        serialize your data?

        On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com>
        wrote:

            Thank you Josh! That was helpful. Indeed, there was a salt
            bucket on the table, and the key-column now shows correctly.

            However, the problem persists in that the rest of the
            columns show as completely empty in Phoenix (they appear
            correctly in HBase). We'll be looking into this, but any
            further advice would be appreciated.

            Saif

            On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org>
            wrote:

                Reminder: Using Phoenix internals forces you to understand
                exactly how the version of Phoenix that you're using
                serializes data. Is there a reason you're not using SQL to
                interact with Phoenix?

                Sounds to me that Phoenix is expecting more data at the
                head of your rowkey. Maybe a salt bucket that you've
                defined on the table but not created?
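
                A minimal sketch of why a missing salt byte would show up
                as a missing first character. It assumes a salted table
                stores one leading salt byte before the key bytes, and that
                the salt is a polynomial hash of the key modulo the bucket
                count; the hash details are an assumption to verify against
                the Phoenix version in use. SaltSketch and its methods are
                hypothetical names for illustration, not Phoenix APIs.

                ```scala
                import java.nio.charset.StandardCharsets.UTF_8

                // Illustrative only: hypothetical names, not Phoenix APIs.
                object SaltSketch {
                  // Polynomial hash over the key bytes (assumed to mirror Phoenix's salting hash).
                  def hash(key: Array[Byte]): Int = key.foldLeft(1)((h, b) => 31 * h + b)

                  // Salt byte in [0, buckets); the remainder is taken before abs so abs cannot overflow.
                  def saltByte(key: Array[Byte], buckets: Int): Byte =
                    math.abs(hash(key) % buckets).toByte

                  // Row key as a salted table would store it: one salt byte, then the key bytes.
                  def saltedKey(key: Array[Byte], buckets: Int): Array[Byte] =
                    saltByte(key, buckets) +: key

                  // What a reader that expects a leading salt byte decodes from a stored key.
                  def readAsSalted(stored: Array[Byte]): String =
                    new String(stored.drop(1), UTF_8)
                }
                ```

                With an unsalted key, readAsSalted("abc".getBytes(UTF_8))
                gives "bc"; with saltedKey applied first, the same read
                gives back "abc".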

                On 9/12/18 4:32 PM, Saif Addin wrote:
                 > Hi all,
                 >
                 > We're trying to write tables with all string columns from Spark.
                 > We are not using the Spark Connector; instead we are directly
                 > writing byte arrays from RDDs.
                 >
                 > The process works fine, HBase receives the data correctly, and
                 > the content is consistent.
                 >
                 > However, when reading the table from Phoenix, we notice that the
                 > first character of each string is missing. This sounds like a
                 > byte encoding issue, but we're at a loss. We're using PVarchar
                 > to generate bytes.
                 >
                 > Here's the snippet of code creating the RDD:
                 >
                 > val tdd = pdd.flatMap(x => {
                 >    val rowKey = PVarchar.INSTANCE.toBytes(x._1)
                 >    for(i <- 0 until cols.length) yield {
                 >      other stuff for other columns ...
                 >      ...
                 >      (rowKey, (column1, column2, column3))
                 >    }
                 > })
                 >
                 > ...
                 >
                 > We then create the following output to be written to HBase:
                 >
                 > val output = tdd.map(x => {
                 >      val rowKeyByte: Array[Byte] = x._1
                 >      val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
                 >
                 >      val kv = new KeyValue(rowKeyByte,
                 >          PVarchar.INSTANCE.toBytes(column1),
                 >          PVarchar.INSTANCE.toBytes(column2),
                 >          PVarchar.INSTANCE.toBytes(column3)
                 >      )
                 >      (immutableRowKey, kv)
                 > })
                 >
                 > By the way, we are using *KryoSerializer* in order to be able to
                 > serialize all classes necessary for HBase (KeyValue,
                 > BytesWritable, etc).
                 >
                 > The key of this table is the one missing data when queried from
                 > Phoenix, so we guess something is wrong with the byte
                 > serialization.
                 >
                 > Any ideas? Appreciated!
                 > Saif
