Re: Hbase and Phoenix Performance improvement

Anil Gupta Wed, 01 Jul 2015 20:23:24 -0700

Hi Nishant,

Refer to the HBase wiki for guidance on multiple column families. In my 
experience, you should not have more than 2-3 column families. Also, group 
columns into column families based on access pattern. 
If none of your queries can skip reading a column family, you will not gain 
any performance. So evaluate your access patterns before you create multiple 
column families.
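
One rough way to evaluate access patterns before splitting is sketched below. It only models which column families each query would have to read; it does not talk to HBase or Phoenix. The family names ("hot", "cold"), the column names other than column5, and the sample queries are hypothetical.

```python
from typing import Dict, List, Set


def families_read(layout: Dict[str, List[str]], query_columns: Set[str]) -> Set[str]:
    """Return the column families a query must read to fetch its columns."""
    return {
        family
        for family, columns in layout.items()
        if query_columns & set(columns)
    }


def skippable_families(layout: Dict[str, List[str]], queries: List[Set[str]]) -> Set[str]:
    """Return families that at least one query can avoid reading entirely."""
    all_families = set(layout)
    skippable: Set[str] = set()
    for query_columns in queries:
        skippable |= all_families - families_read(layout, query_columns)
    return skippable


# Hypothetical layout: frequently queried columns in one family, the rest in another.
layout = {
    "hot": ["column5", "status"],
    "cold": ["payload", "notes"],
}
# Hypothetical access patterns: the non-rowkey columns each query touches.
queries = [{"column5"}, {"column5", "status"}]

print(sorted(families_read(layout, {"column5"})))
print(sorted(skippable_families(layout, queries)))
```

If `skippable_families` comes back empty for your real queries, every query reads every family and the split is unlikely to help.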



> On Jul 1, 2015, at 6:33 PM, Nishant Patel <nishant.k.pa...@gmail.com> wrote:
> 
> Thanks Puneet and James for your responses.
> 
> A date is not recommended as the first part of a rowkey. It will create 
> issues during write operations. In a real production scenario we will have 
> more data and more values for column1 and column2. 
> 
> I will try the other suggestions today. Let's see how much I can achieve.
> 
> Regards,
> Nishant
> 
>> On Wed, Jul 1, 2015 at 9:52 PM, James Taylor <jamestay...@apache.org> wrote:
>> Also, try separating your columns into multiple column families to prevent 
>> having to scan past your 75+ column qualifiers for every query.
>> 
>>> On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha 
>>> <puneet.ku...@pubmatic.com> wrote:
>>> Yes, salting will improve scan performance. Try 5, 10, or 20 salt buckets; 
>>> I do not know the cluster details.
>>> 
>>> Increase scanner caching to 100000.
>>> 
>>> Check whether SNAPPY is actually working. I believe you need to put the 
>>> jars on the classpath as well.
>>> 
>>> Since the cardinality of the col1 and col2 fields is very small, use date 
>>> as the first column. Also store the date as an integer.
>>> 
>>> Try modifying the heap-related memory settings in hbase-site.xml.
>>> 
>>> Try naming the column qualifiers as single letters. Long qualifier names 
>>> consume space and take more time to scan.
>>> 
>>> Thanks,
>>> Puneet.
>>> 
>>> From: Nishant Patel [mailto:nishant.k.pa...@gmail.com] 
>>> Sent: Wednesday, July 01, 2015 4:33 PM
>>> To: user@phoenix.apache.org
>>> Subject: Re: Hbase and Phoenix Performance improvement
>>> 
>>> Hi Puneet/Martin,
>>> 
>>> Thanks for your responses. Please see my answers below.
>>> 
>>> I have not specified any salt buckets. I have created a Phoenix view on an 
>>> existing HBase table. Can I specify salt buckets for a Phoenix view?
>>> 
>>> After loading the HBase data I altered the table to use SNAPPY compression. 
>>> Are you talking about some other compression?
>>> 
>>> I have set hbase.client.scanner.caching to 500. I also tried 1000 but did 
>>> not see any performance improvement.
>>> 
>>> I am not working with a production system. I inserted the data once and am 
>>> not deleting anything, so that should not be a problem. There is no load on 
>>> the HBase servers, as I am only reading data right now.
>>> 
>>> A sample query is below.
>>> 
>>> Select column5,count(1) ttr from table where column1='column1' and 
>>> column2='column2' and date>='20150504' and date<='20150704' group by 
>>> column5.
>>> 
>>> I am scanning based on the where condition. Column1, column2 and date are 
>>> part of my rowkey, so it should not perform a complete table scan. My 
>>> rowkey design is as below:
>>> 
>>> column1|column2|date|unique_identifier
>>> 
>>> Regards,
>>> 
>>> Nishant
>>> 
>>>  
>>> 
>>> On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <mpernol...@octo.com> 
>>> wrote:
>>> 
>>> It sounds like you are scanning rather than getting rows based on a known 
>>> row id. Am I wrong?
>>> 
>>> One thing I am currently trying is to keep indexed columns and "hot" 
>>> content in one column family and put "cold" content in another family. It 
>>> speeds up scanning the table when you need to
>>> 
>>>  
>>> 
>>> On Wed, Jul 1, 2015 at 06:56, Nishant Patel <nishant.k.pa...@gmail.com> 
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> I am trying to measure performance for HBase and Phoenix.
>>> 
>>> I have generated 1000 records per day for each combination of column1 and 
>>> column2.
>>> 
>>> I have created 5 different values each for column1 and column2 and created 
>>> data for 365 days. The total number of records generated is 5 * 5 * 365 * 
>>> 1000 = 9125000.
>>> 
>>> I am writing 75+ qualifiers in one column family for each record.
>>> 
>>> The rowkey design is as below: column1|column2|date(yyyyMMdd)|unique 
>>> identifier. I have used a one-byte character as the rowkey separator. I 
>>> have created a view in Phoenix on top of the HBase table.
>>> 
>>> All my queries contain column1, column2 and date as filter conditions.
>>> 
>>> If the date range is less than 1 month, I get a response in less than 1 
>>> second. If the date range is 3/6/12 months, the response takes multiple 
>>> seconds. Sometimes it takes 25+ seconds for a 12-month range.
>>> 
>>> My question is: is it possible to get a response from Phoenix in less than 
>>> 1 second for the amount of data I have specified? If yes, what kind of 
>>> tuning needs to be done? So far I have not made any changes to HBase or 
>>> Phoenix other than a proper rowkey design.
>>> 
>>> I am trying to verify whether Phoenix will suit our requirements.
>>> 
>>> --
>>> 
>>> Thanks,
>>> Nishant
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Regards,
>>> Nishant Patel
>>> 
> 
> 
> 
> -- 
> Regards,
> Nishant Patel
>
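
For the rowkey layout quoted in this thread (column1|column2|date|unique_identifier), the sketch below shows how an equality filter on column1 and column2 plus a date range can translate into a start row and an exclusive stop row, which is why such a query need not scan the whole table. It is plain Python with no HBase client involved. It assumes the separator is the "|" byte, dates are fixed-width yyyyMMdd strings, and column values never contain the separator. Whether Phoenix derives the same bounds depends on how the view maps the row key, which the thread does not show.

```python
from typing import Tuple

SEP = b"|"  # assumption: the thread only says a one-byte separator is used


def scan_bounds(column1: str, column2: str,
                start_date: str, end_date: str) -> Tuple[bytes, bytes]:
    """Build (start_row, stop_row) for an inclusive date range.

    stop_row is exclusive: it is the end-date prefix followed by the byte
    just after the separator, so every key for end_date sorts below it.
    """
    prefix = column1.encode() + SEP + column2.encode() + SEP
    start_row = prefix + start_date.encode()
    stop_row = prefix + end_date.encode() + bytes([SEP[0] + 1])
    return start_row, stop_row


def in_range(row_key: bytes, start_row: bytes, stop_row: bytes) -> bool:
    """Lexicographic byte comparison, the same ordering HBase uses for row keys."""
    return start_row <= row_key < stop_row


start, stop = scan_bounds("column1", "column2", "20150504", "20150704")
print(start, stop)
print(in_range(b"column1|column2|20150601|id42", start, stop))
```

The number of rows between the two bounds still grows with the date range, so a 12-month query reads roughly twelve times as many rows as a 1-month query even without a full table scan.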
