Hi Nishant,

Refer to the HBase wiki for guidance on multiple column families. In my experience, you should not use more than 2-3 column families. Also, group columns into column families based on your access patterns. If none of your queries can skip reading a column family, you will not gain any performance. So evaluate your access patterns before you create multiple column families.
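To make the access-pattern point concrete, here is a rough sketch in Python. It is only a toy model, not HBase code: the family names ("d", "h", "c"), the choice of hot columns, and the two query shapes are all made up for illustration, and "qualifiers scanned" is just a crude proxy for the per-row work a scan does.

```python
# Toy model (assumption): a scan has to read every qualifier stored in each
# column family it touches, and can skip families it does not need at all.

def families_read(layout, query_columns):
    """Return the families a query must read.

    layout maps a family name to the set of qualifiers stored in it.
    """
    needed = set(query_columns)
    return {family for family, cols in layout.items() if cols & needed}


def qualifiers_scanned(layout, query_columns):
    """Count the qualifiers stored in the families the query touches."""
    return sum(len(layout[f]) for f in families_read(layout, query_columns))


# 75 hypothetical qualifiers: column5 .. column79.
all_columns = {f"column{i}" for i in range(5, 80)}

# Layout 1: everything in a single family.
single_family = {"d": all_columns}

# Layout 2: a small "hot" family plus a "cold" family for the rest.
hot_columns = {"column5", "column6", "column7"}
hot_cold = {"h": hot_columns, "c": all_columns - hot_columns}

# Hypothetical access patterns.
queries = {
    "group_by_column5": ["column5"],       # needs hot columns only
    "mixed_report": ["column5", "column40"],  # needs hot and cold columns
}

for name, cols in queries.items():
    print(name,
          "single family:", qualifiers_scanned(single_family, cols),
          "hot/cold split:", qualifiers_scanned(hot_cold, cols))
```

In this model the first query benefits from the split because it never touches the cold family, while the second query reads both families and gains nothing. That is the case to check for before splitting.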
Sent from my iPhone

> On Jul 1, 2015, at 6:33 PM, Nishant Patel <nishant.k.pa...@gmail.com> wrote:
>
> Thanks Puneet and James for your responses.
>
> Date is not recommended as first part of rowkey. It will create issue during
> write operation. In real production scenario we will have more data and will
> have more values for column1 and column2.
>
> Will try other things today. Lets see how much I can achieve today.
>
> Regards,
> Nishant
>
>> On Wed, Jul 1, 2015 at 9:52 PM, James Taylor <jamestay...@apache.org> wrote:
>> Also, try separating your columns into multiple column families to prevent
>> having to scan past your 75+ column qualifiers for every query.
>>
>>> On Wed, Jul 1, 2015 at 4:47 AM, Puneet Kumar Ojha
>>> <puneet.ku...@pubmatic.com> wrote:
>>> Yes …Salting will improve the scan performance. Try with numbers 5,10,20 .
>>> As I do not know about the cluster details.
>>>
>>> Increase scanner caching to 100000.
>>>
>>> Check if SNAPPY is working …I hope you need to put the jars classpath as
>>> well.
>>>
>>> Since the cardinality of the col1 and col2 fields is very small use date as
>>> first column. Also put date as integer.
>>>
>>> Try modifying the memory settings related to heap in hbase site.xml.
>>>
>>> Try naming the Column Qualifiers as single alphabets. They consume space
>>> and takes more time to scan.
>>>
>>> Thanks
>>>
>>> Puneet.
>>>
>>> From: Nishant Patel [mailto:nishant.k.pa...@gmail.com]
>>> Sent: Wednesday, July 01, 2015 4:33 PM
>>> To: user@phoenix.apache.org
>>> Subject: Re: Hbase and Phoenix Performance improvement
>>>
>>> HI Puneet/Martin,
>>>
>>> Thanks for your response. Please see my answer as below.
>>>
>>> I have not specified any salt bucket. I have created Phoenix View on
>>> existing Hbase Table. Can I specify Salt bucket for Phoenix View?
>>>
>>> After loading Hbase data I alter table to use SNAPPY Compression. Are you
>>> talking about any other compression?
>>>
>>> I have set hbase.client.scanner.caching to 500. I tried with 1000 also but
>>> did not see any performance improvement.
>>>
>>> I am not using with production system. I have inserted data once and not
>>> deleting so there should not be problem. There is no load on Hbase servers
>>> as I am just reading data right now.
>>>
>>> Sample query is as below.
>>>
>>> Select column5,count(1) ttr from table where column1='column1' and
>>> column2='column2' and date>='20150504' and date<='20150704' group by
>>> column5.
>>>
>>> I am doing scan based on where condition. Column1, column2 and date is part
>>> of my rowkey so it should not perform complete table scan. My rowkey design
>>> is as below
>>>
>>> column1|column2|date|unique_identifier
>>>
>>> Regards,
>>>
>>> Nishant
>>>
>>> On Wed, Jul 1, 2015 at 2:07 PM, Martin Pernollet <mpernol...@octo.com>
>>> wrote:
>>>
>>> It sounds like you are scanning rather than getting rows based on a known
>>> row id. Am I wrong?
>>>
>>> One thing I am currently trying is to have indexed columns and "hot"
>>> content in one column family and let "cold" content in another family. It
>>> speed up scanning the table when you need to
>>>
>>> Le mer. 1 juil. 2015 à 06:56, Nishant Patel <nishant.k.pa...@gmail.com> a
>>> écrit :
>>>
>>> Hi,
>>>
>>> I am trying to measure performance for Hbase and Phoenix.
>>>
>>> I have generated 1000 records per day with combination of Column1 and
>>> Column2.
>>>
>>> I have created 5 different combination for column1 and column2 and created
>>> data for 365 days. Total records I have generated 5 * 5 * 365 * 1000 =
>>> 9125000
>>>
>>> I am writing 75+ qualifiers in one Column Family for each record.
>>>
>>> Rowkey Design is as below : column1|column2|date(yyyyMMdd)|unique
>>> identifier. I have used one byte character as rowkey separator. I have
>>> create view in Phoenix on top of Hbase table.
>>>
>>> My all queries contain column1 , column2 and date as filter condition.
>>>
>>> If date range is less than 1 month I get response in less than 1 second. if
>>> date range is 3/6/12 months then response comes in seconds. Sometime it
>>> takes 25+ seconds for 12 months range.
>>>
>>> My question is, is it possible to get response in phoenix in less than 1
>>> second for amount of data I have specified. If yes what kind of tuning need
>>> to be done? As of now I have not done any changes at Hbase and Phoenix
>>> except proper rowkey design.
>>>
>>> I am trying to verify whether phoenix will suit our requirement or not.
>>>
>>> --
>>>
>>> Thanks,
>>> Nishant
>>>
>>> --
>>>
>>> Regards,
>>> Nishant Patel
>
> --
> Regards,
> Nishant Patel