Hi

I want to join a Spark RDD with an HBase table. I'm familiar with the
different connectors available, but couldn't find this functionality.

The idea I have is to first sort the RDD according to a byte[] key [1]
and then use rdd.mapPartitions so that each partition contains a unique,
sequentially sorted range of keys that lines up with the key order in
HBase.

I should mention that the RDD will always contain almost all the keys
that are stored in HBase, so full table scans are fine.

Unfortunately, Spark cannot sort a native Java byte[] out of the box,
since byte[] does not implement Comparable. I'm also not sure whether
mapPartitions preserves the total sort order of the original RDD.

Any suggestions?

Cheers,
-Kristoffer

[1] Guava UnsignedBytes.lexicographicalComparator
