Hi,

We are evaluating the possibility of writing a custom connector for Phoenix
to access tables stored in HBase. However, we need some help.

The connector for Presto should be able to read from the HBase cluster in
parallel. For that, the connector has a "ConnectorSplitManager" which needs
to be implemented. To quote from here
<https://prestodb.io/docs/current/develop/connectors.html>:
"
The split manager partitions the data for a table into the individual
chunks that Presto will distribute to workers for processing. For example,
the Hive connector lists the files for each Hive partition and creates one
or more split per file. For data sources that don’t have partitioned data,
a good strategy here is to simply return a single split for the entire
table. This is the strategy employed by the Example HTTP connector.
"

I want to know if there's a way to implement the ConnectorSplitManager so
that the data in HBase can be read over parallel connections. I was trying
to follow the code of the Phoenix-Spark connector
<https://github.com/apache/phoenix/blob/master/phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRDD.scala>
to see how it uses getPreferredLocations when creating splits, but I
couldn't follow it.
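In case it helps frame the question: the approach I had in mind is one split
per HBase region, so each Presto worker scans one region's key range. The
sketch below is only an illustration of that partitioning logic, not real
Presto SPI code; the Split class and splitsFromRegionBoundaries method are
made-up names, and a real connector would fetch the boundaries from HBase
(e.g. via RegionLocator.getStartEndKeys()) instead of hard-coding them.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplits {
    // Simple value class standing in for a ConnectorSplit.
    static final class Split {
        final String startKey; // inclusive; "" means unbounded below
        final String endKey;   // exclusive; "" means unbounded above
        Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // One split per region: each split carries the region's key range,
    // which the record-set provider would later turn into an HBase Scan
    // restricted to [startKey, endKey).
    static List<Split> splitsFromRegionBoundaries(String[] startKeys,
                                                  String[] endKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            splits.add(new Split(startKeys[i], endKeys[i]));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Pretend the table has three regions: (-inf,"g"), ["g","p"), ["p",+inf).
        String[] startKeys = {"", "g", "p"};
        String[] endKeys = {"g", "p", ""};
        for (Split s : splitsFromRegionBoundaries(startKeys, endKeys)) {
            System.out.println("split [" + s.startKey + ", " + s.endKey + ")");
        }
    }
}
```

With region-aligned splits like these, the split's host list could also be
populated from the region's server location, which seems to be what
getPreferredLocations does on the Spark side.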

Any hints or code directions will be helpful.

Regards,
Luqman
