Re: Joins Benchmark

anil gupta Tue, 02 Jun 2015 21:07:42 -0700

Hi Siva/Jaime,

In my opinion:
HBase is meant for quick key/value lookups or short range-based scans, and
Hive is meant for analytical/data-warehouse kinds of workloads. Full table
scans are not what HBase is known or popular for, and joins are not really a
sweet spot for HBase if they require full table scans.
If you do need full table scans in HBase, you can also try running a
MapReduce job over an HBase snapshot. Or you could just use Hive for an
OLAP-type workload.


Thanks,
Anil Gupta

On Tue, Jun 2, 2015 at 4:43 PM, Siva <[email protected]> wrote:

> Hi Jaime,
>
> When we ran queries with complex joins (involving ~10 tables) on Phoenix
> against tables that hold a lot of data, we initially saw a lot of issues;
> queries failed with errors. We started to tune both HBase and Phoenix, and
> now a few queries are running fine, but queries over larger data sets
> still have the same issues. We are still working on tuning them. The
> failures could be due to our small cluster, which is limited by memory and IO.
>
> On the other hand, the same queries with the same data size on Hive 14 (with
> Tez + ORC format + SNAPPY compression) finished within 70~100 seconds. It
> would be good if Phoenix could publish performance results for join
> queries.
>
> Thanks,
> Siva.
>
> On Tue, Jun 2, 2015 at 1:47 PM, Jaime Solano <[email protected]> wrote:
>
>> Hi guys,
>>
>> Are there benchmarks or numbers showing how Phoenix performs during the
>> join of two or more huge tables? I'm not familiar with the join
>> implementation, so I'm not sure if there's a limitation regarding number of
>> regions, memory, disk, etc.
>>
>> Any thoughts?
>>
>> Thanks,
>> -Jaime
>>
>
>


-- 
Thanks & Regards,
Anil Gupta
