Please take a look at the following methods:

From HBaseAdmin:

  public List<HRegionInfo> getTableRegions(final TableName tableName)

From HRegion:

  public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
      final Configuration conf, final HTableDescriptor tableDescriptor,
      final HRegionInfo regionInfo) throws IOException

FYI
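
For example, here is a minimal sketch that wires the two together by
summing the HDFS block weight over all regions of a table. The
Connection/Admin setup and the use of
HDFSBlocksDistribution#getUniqueBlocksTotalWeight to read the total size
are my assumptions about how to combine them, so treat this as an
untested illustration rather than a ready-made utility:

  import java.io.IOException;
  import java.util.List;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HDFSBlocksDistribution;
  import org.apache.hadoop.hbase.HRegionInfo;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.regionserver.HRegion;

  public class HBaseTableSizeEstimator {

    // Estimate the on-disk size of a table by summing the HDFS block
    // weight of each region's store files.
    public static long estimateTableSizeInBytes(Configuration conf,
        TableName tableName) throws IOException {
      try (Connection connection = ConnectionFactory.createConnection(conf);
           Admin admin = connection.getAdmin()) {
        HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
        List<HRegionInfo> regions = admin.getTableRegions(tableName);
        long totalBytes = 0;
        for (HRegionInfo region : regions) {
          HDFSBlocksDistribution distribution =
              HRegion.computeHDFSBlocksDistribution(conf, descriptor, region);
          totalBytes += distribution.getUniqueBlocksTotalWeight();
        }
        return totalBytes;
      }
    }

    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      long size = estimateTableSizeInBytes(conf, TableName.valueOf(args[0]));
      System.out.println(args[0] + ": " + size + " bytes");
    }
  }

Note that HRegion lives in the hbase-server module, so this needs more
than the client jar on the classpath, and the number it reports is the
on-disk (possibly compressed) footprint, not the in-memory size of the
data once loaded into Spark.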

On Wed, Jul 20, 2016 at 11:28 PM, Sachin Jain <sachinjain...@gmail.com>
wrote:

> *Context*
> I am using Spark (1.5.1) with HBase (1.1.2) to dump the output of Spark
> jobs into HBase, which is then available for lookups from the HBase
> table. Our BaseRelation extends HadoopFsRelation and is used to read from
> and write to HBase, via the Spark Data Source API.
>
> *Use Case*
> Now, whenever I perform a join operation, Spark creates a logical plan
> and decides which type of join it should execute. As per SparkStrategies
> [0], it checks the size of the HBase table: if it is less than a
> threshold (10 MB by default), it selects a broadcast hash join;
> otherwise, a sort-merge join.
>
> *Problem Statement*
> I want to know if there is an API or some approach to calculate the size of
> an HBase table.
>
> [0]:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L118
>
> Thanks
> -Sachin
>
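
One note on the Spark side of this: the 10 MB cutoff mentioned above is
spark.sql.autoBroadcastJoinThreshold, so besides computing the HBase
table size you can also tune the threshold itself. A small illustration
(the 50 MB value below is arbitrary):

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.sql.SQLContext;

  public class BroadcastThresholdExample {
    public static void main(String[] args) {
      SparkConf conf = new SparkConf().setAppName("broadcast-threshold");
      JavaSparkContext sc = new JavaSparkContext(conf);
      SQLContext sqlContext = new SQLContext(sc);

      // Relations whose estimated size is below this many bytes are
      // broadcast; the default is 10485760 (10 MB). Setting it to -1
      // disables broadcast joins entirely.
      sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "52428800");

      // ... register the HBase-backed relation and run joins as usual ...
    }
  }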
