Re: does phoenix+hbase work for tables larger than a few GB?

Konstantinos Kougios Wed, 30 Sep 2015 15:29:55 -0700

Thanks for the reply and the useful information Anil.

I am aware of the difficulties of distributed joins and aggregations and that phoenix is a layer on top of hbase. It would be great if it could be configured to run the queries, even if it takes a lot of time for the queries to complete.

I have mainly 2 tables, of 170GB and 550GB. Aggregation queries on both fail and even make region servers crash (there is no info in the logs and I still don't know why. My server has proved to be rock stable so far on other things, but you never know).

I am doing full table scans only because so far I have been unable to create the indexes. I tried async indexes too, with the map reduce job to create them, but it runs extremely slowly.

In theory full table scans are possible with hbase, so even if they are slow they shouldn't fail.

My setup is a 64GB AMD Opteron server with 16 cores. 3 lxc virtual machines as region servers with Xmx8G, each running on a 3TB 7200rpm disk. So in effect I simulate 3x low spec servers with enough RAM.

Next thing I will try is to give the region servers 16GB of RAM. With 8GB they seem to have some memory pressure and I see some slow GCs in the logs.
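
As a rough back-of-the-envelope check of what that change means, here is a small sketch that compares the two table sizes with the combined region server heap at Xmx8G and Xmx16G. The table names are placeholders, and the calculation is deliberately crude: it ignores that only part of the heap is available for caching data.

```python
# Rough arithmetic only: table size divided by combined region server heap.
# Sizes and server count come from the setup described above; the table
# names are placeholders, not the real ones.
REGION_SERVERS = 3
TABLE_SIZES_GB = {"table_1": 170, "table_2": 550}


def data_to_heap_ratio(table_gb, heap_gb_per_server, servers=REGION_SERVERS):
    """Return how many times larger a table is than the combined heap."""
    return table_gb / (heap_gb_per_server * servers)


for heap_gb in (8, 16):
    for name, size_gb in TABLE_SIZES_GB.items():
        ratio = data_to_heap_ratio(size_gb, heap_gb)
        print(f"Xmx{heap_gb}G: {name} ({size_gb}GB) is {ratio:.1f}x the combined heap")
```

Even with 16GB per region server the larger table stays more than ten times the combined heap, so more RAM may ease the GC pressure but will not make the data fit in memory.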


Cheers




On 30/09/15 21:18, anil gupta wrote:

Hi Konstantinos,
Please find my reply inline.
On Wed, Sep 30, 2015 at 12:10 PM, Konstantinos Kougios wrote:
    Hi all,

    I had various issues with big tables while experimenting over the
    last couple of weeks.

    The thing that comes to mind is that hbase (+phoenix) works only
    when there is a fairly powerful cluster and, say, 1/2 the data can
    fit into the combined servers' memory and the disks are fast (SSD?)
    as well. It doesn't seem to be able to work when tables are 2x as
    large as the memory allocated to region servers (frankly I think
    it is less).
Anil: Phoenix is just a SQL layer over HBase. From the query in your previous emails, it seems like you are doing full table scans with group by clauses. IMO, HBase is not a DB to be used for full table scans. If 90% of your use cases are small range scans or gets, then HBase should work nicely with terabytes of data. I have a 40 TB table in prod on a 60-node cluster where every RS only has 16GB of heap. What kind of workload are you trying to run with HBase?
    Things that constantly fail:

    - non-trivial queries on large tables (with group by, counts,
    joins) with region server out of memory errors or crashes without
    any reason for Xmx of 4G or 8G
Anil: Can you convert these queries into short range-based scans? If you are always going to do full table scans, then maybe you need to use MR or Spark for those computations and then tune the cluster for full table scans. Cluster tuning varies with a full table scan workload.
    - index creation on the same big tables. Those always fail, I think
    around the point when hbase has to flush its memory regions to
    the disk, and I couldn't find a solution

    - spark jobs fail unless they are throttled to feed hbase only as
    much data as it can take. No backpressure?


    There were no replies to my emails regarding the issues, which
    makes me think there aren't solutions (or solutions are pretty
    hard to find and not many ppl know them).

    So after 21 tweaks to the default config, I am still not able to
    operate it as a normal database.
Anil: HBase is actually not a normal RDBMS DB. It's a **key-value store**. Phoenix is providing a SQL layer using the HBase API. So, users will need to deal with the pros/cons of a key/value store.
    Should I start believing my config is all wrong or that
    hbase+phoenix is only working if there is a sufficiently powerful
    cluster to handle the data?
Anil: **As per my experience**, HBase+Phoenix will work nicely if you are doing key-value lookups and short range scans. I would suggest that you evaluate the data model of your HBase tables and try to convert queries to small range scans or lookups.
    I believe it is a great project and the functionality is really
    useful. What's lacking is 3 sample configs for 3 different
    strength clusters.
Anil: I agree that guidance on configuration of HBase and Phoenix can be improved so that people can get going quickly.
    Thanks




--
Thanks & Regards,
Anil Gupta
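
Anil's suggestion above, converting one full-table aggregation into many short range scans, could look roughly like the sketch below. It is only an illustration under assumptions that are not in the thread: the table and column names are hypothetical, the row key is assumed to be an integer with known bounds, and `run_query` is a hypothetical callable standing in for whatever client actually executes the SQL and returns `(group, count)` rows. The `?` placeholders follow the JDBC style.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, Sequence, Tuple


def key_ranges(start: int, end: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) key ranges that together cover [start, end)."""
    if step <= 0:
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def range_query(table: str, key_col: str, group_col: str) -> str:
    """Build a GROUP BY count restricted to one row-key range.

    The identifiers are illustrative and are interpolated as-is, so they
    must come from trusted code, never from user input.
    """
    return (
        f"SELECT {group_col}, COUNT(*) FROM {table} "
        f"WHERE {key_col} >= ? AND {key_col} < ? GROUP BY {group_col}"
    )


def aggregate_in_chunks(
    run_query: Callable[[str, Sequence[int]], Iterable[Tuple[str, int]]],
    sql: str,
    ranges: Iterable[Tuple[int, int]],
) -> Counter:
    """Run the query once per key range and merge the partial counts."""
    totals: Counter = Counter()
    for lo, hi in ranges:
        # Each call scans only the rows in [lo, hi), so the server-side
        # work per query stays small and bounded.
        for group, count in run_query(sql, (lo, hi)):
            totals[group] += count
    return totals
```

Merging partial results on the client like this works for counts and sums; an average would need the sum and the count carried separately for each chunk.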
