On a side note: Did you enable short-circuit reads? Did you try Snappy
compression on your tables? (IMO, a 7200rpm disk is on the slower side, so try
compressing data on disk.) There are also some data block encoding schemes in
HBase; have a look at those too.
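For reference, a sketch of what those suggestions look like concretely. Short-circuit reads are enabled in hdfs-site.xml on the DataNodes and HBase clients (the socket path below is just a common choice, not mandatory):

```xml
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

Compression and data block encoding can be set per column family from the HBase shell (table and family names here are made up):

```
alter 'my_table', {NAME => 'cf', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
major_compact 'my_table'   # rewrite existing HFiles with the new settings
```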

On Wed, Sep 30, 2015 at 3:45 PM, anil gupta <anilgupt...@gmail.com> wrote:

> Please find my reply inline.
>
> On Wed, Sep 30, 2015 at 3:29 PM, Konstantinos Kougios <
> kostas.koug...@googlemail.com> wrote:
>
>> Thanks for the reply and the useful information Anil.
>>
>> I am aware of the difficulties of distributed joins and aggregations and
>> that phoenix is a layer on top of hbase. It would be great if it could be
>> configured to run the queries, even if it takes a lot of time for the
>> queries to complete.
>>
> Anil: I think it is doable, but it might require a bit of trial and error
> with the HBase and Phoenix configuration. I would start by increasing the
> HBase and Phoenix timeouts.
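> For concreteness, these are the usual knobs (client-side hbase-site.xml;
> the 600000 ms values below are only illustrative, not a recommendation):
>
> ```xml
> <property>
>   <name>hbase.rpc.timeout</name>
>   <value>600000</value>
> </property>
> <property>
>   <name>hbase.client.scanner.timeout.period</name>
>   <value>600000</value>
> </property>
> <property>
>   <name>phoenix.query.timeoutMs</name>
>   <value>600000</value>
> </property>
> ```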
>
>>
>> I have mainly 2 tables, of 170GB and 550GB. Aggregation queries on both
>> fail and even make region servers crash (there is no info in the logs and I
>> still don't know why. My server has proved rock stable so far on other
>> things, but you never know).
>>
> Anil: The RS should not crash. Are you doing heavy writes along with full
> table scans at the same time? In one of your emails I saw a stack trace
> regarding region splits and compactions.
>
>>
>> I am only doing full table scans because so far I have been unable to
>> create the indexes. I tried async indexes too, with the MapReduce job to
>> create them, but it runs extremely slowly.
>>
> Anil: That does not sound good. I haven't used those yet, so I won't be
> able to help debug the problem. Hopefully someone else will be able to
> chime in.
>
>>
>> In theory full table scans are possible with HBase, so even if they are
>> slow they shouldn't fail.
>>
> Anil: IMO, if you are doing full table scans, then maybe you should turn
> off the block cache for those queries. Full table scans cause a lot of
> cache churn, and cache churn leads to JVM GCs.
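> Phoenix has a query hint for exactly this; it keeps the scan's results out
> of the HBase block cache (the query and table below are hypothetical):
>
> ```sql
> SELECT /*+ NO_CACHE */ host, COUNT(*)
> FROM metrics
> GROUP BY host;
> ```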
>
>>
>> My setup is a 64GB AMD Opteron server with 16 cores, with 3 LXC virtual
>> machines as region servers with Xmx8G, each running on its own 3TB 7200rpm
>> disk. So I am roughly simulating 3 low-spec servers with enough RAM.
>>
>> Next I will try giving the region servers 16GB of RAM. With 8GB they
>> seem to be under some memory pressure and I see some slow GCs in the logs.
>>
> Anil: 16GB of RAM should help in some cases. Also try disabling the block
> cache for full table scans.
>
>>
>> Cheers
>>
>>
>>
>>
>>
>> On 30/09/15 21:18, anil gupta wrote:
>>
>> Hi Konstantinos,
>> Please find my reply inline.
>>
>> On Wed, Sep 30, 2015 at 12:10 PM, Konstantinos Kougios <
>> kostas.koug...@googlemail.com> wrote:
>>
>>> Hi all,
>>>
>>> I have had various issues with big tables while experimenting over the
>>> last couple of weeks.
>>>
>>> The thing that comes to mind is that HBase (+Phoenix) works only when
>>> there is a fairly powerful cluster, say where half the data can fit into
>>> the combined servers' memory and the disks are fast (SSD?). It doesn't
>>> seem able to cope when tables are 2x as large as the memory allocated to
>>> the region servers (frankly I think the threshold is even lower).
>>>
>> Anil: Phoenix is just a SQL layer over HBase. From the query in your
>> previous emails, it seems like you are doing full table scans with GROUP BY
>> clauses. IMO, HBase is not a DB to be used for full table scans. If 90% of
>> your use cases are small range scans or gets, then HBase should work nicely
>> with terabytes of data. I have a 40 TB table in prod on a 60-node cluster
>> where every RS has only 16GB of heap. What kind of workload are you trying
>> to run with HBase?
>>
>>
>>>
>>> Things that constantly fail:
>>>
>>> - non-trivial queries on large tables (with GROUP BY, counts, joins)
>>> fail with region server out-of-memory errors, or crash without any
>>> apparent reason, with an Xmx of 4G or 8G
>>>
>> Anil: Can you convert these queries into short range-based scans? If you
>> are always going to do full table scans, then maybe you need to use MR or
>> Spark for those computations and then tune the cluster for full table
>> scans. Cluster tuning varies with a full-table-scan workload.
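>> As a sketch of the difference (table and columns below are made up): if
>> the primary key leads with the column you constrain, Phoenix can do a
>> range scan instead of a full scan:
>>
>> ```sql
>> -- full table scan: aggregates over every row
>> SELECT event_type, COUNT(*) FROM events GROUP BY event_type;
>>
>> -- range scan: if the PK is (event_day, event_id), constraining the
>> -- leading PK column limits the scan to one day's rows
>> SELECT event_type, COUNT(*)
>> FROM events
>> WHERE event_day = TO_DATE('2015-09-30')
>> GROUP BY event_type;
>> ```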
>>
>>> - index creation on the same big tables. Those always fail, I think
>>> around the point when HBase has to flush its memory regions to disk,
>>> and I couldn't find a solution
>>>
>>> - Spark jobs fail unless they are throttled to feed HBase only the data
>>> it can take. No backpressure?
>>>
>>
>>> There were no replies to my emails regarding these issues, which makes
>>> me think there are no solutions (or the solutions are pretty hard to
>>> find and not many people know them).
>>>
>>> So after 21 tweaks to the default config, I am still not able to operate
>>> it as a normal database.
>>>
>> Anil: HBase is actually not a normal RDBMS; it's a **key-value store**.
>> Phoenix provides a SQL layer on top of the HBase API, so users need to
>> deal with the pros/cons of a key-value store.
>>
>>>
>>> Should I start believing my config is all wrong, or that HBase+Phoenix
>>> only works if there is a sufficiently powerful cluster to handle the data?
>>>
>> Anil: **In my experience**, HBase+Phoenix works nicely if you are doing
>> key-value lookups and short range scans. I would suggest you evaluate the
>> data model of your HBase tables and try to convert the queries into small
>> range scans or lookups.
>>
>>>
>>> I believe it is a great project and the functionality is really useful.
>>> What's lacking is 3 sample configs for clusters of 3 different sizes.
>>>
>> Anil: I agree that guidance on configuration of HBase and Phoenix can be
>> improved so that people can get going quickly.
>>
>>>
>>> Thanks
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Thanks & Regards,
Anil Gupta
