Hi

I want to join a Spark RDD with an HBase table. I'm familiar with the
different connectors available, but couldn't find this functionality.

The idea I have is to first sort the RDD according to a byte[] key [1]
and then use rdd.mapPartitions so that each partition contains a unique,
sequentially sorted range of keys that lines up with the key order in
HBase.

I should mention that the RDD will always contain almost all the keys
that are stored in HBase, so full table scans are fine.

Unfortunately, Spark cannot sort a native Java byte[] out of the box,
since byte[] does not implement Comparable. I'm also not sure whether
mapPartitions preserves the total sort order of the original RDD.

Any suggestions?

Cheers,
-Kristoffer

[1] Guava UnsignedBytes.lexicographicalComparator
