I agree with Vladimir regarding hotspotting. Take a look here at how to salt your table: http://phoenix.incubator.apache.org/salted.html
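To make the salting idea concrete, here is a minimal Python sketch of what a leading salt byte does to monotonically increasing keys such as DATELOGGED-first rowkeys. The key encoding, the bucket count of 8, and the 31*h + b hash are all hypothetical choices for illustration. The salted.html page linked above describes how Phoenix actually computes and manages the salt through SALT_BUCKETS.

```python
import struct
from collections import Counter

BUCKETS = 8  # hypothetical SALT_BUCKETS value, chosen for illustration


def encode_row_key(date_logged: int, org_name: str, instance_id: int, tx_id: int) -> bytes:
    """Hypothetical composite-key encoding: fixed-width big-endian integers,
    with a zero byte terminating the variable-length ORGNAME."""
    return (struct.pack(">q", date_logged)
            + org_name.encode("utf-8") + b"\x00"
            + struct.pack(">q", instance_id)
            + struct.pack(">q", tx_id))


def salt_byte(row_key: bytes, buckets: int = BUCKETS) -> int:
    """Illustrative hash: Java-style 31*h + b over the key bytes (kept to
    32 bits), reduced modulo the bucket count. Not necessarily Phoenix's
    exact function."""
    h = 0
    for b in row_key:
        h = (31 * h + b) & 0xFFFFFFFF
    return h % buckets


def salt(row_key: bytes, buckets: int = BUCKETS) -> bytes:
    """Prepend a single salt byte so consecutive keys scatter across buckets."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


# Ten transactions per timestamp, with time always increasing.
base_ts = 1_397_433_600_000  # epoch millis, arbitrary
unsalted = [encode_row_key(base_ts + i // 10, "acme", 42, i) for i in range(1000)]
salted = [salt(k) for k in unsalted]

# Unsalted keys arrive already in sorted order, so every write lands at the
# tail of the key space: the region hotspot described in this thread.
print("unsalted keys monotonic:", unsalted == sorted(unsalted))

# Salted keys are spread over the buckets by their leading byte.
print("writes per bucket:", sorted(Counter(k[0] for k in salted).items()))
```

Because the salt is derived from the key itself, a point lookup can recompute it, while a range scan over DATELOGGED has to be fanned out across every bucket, which Phoenix handles transparently for salted tables.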
In general, we find salted tables perform better for both reads and writes.

James

On Monday, April 14, 2014, Vladimir Rodionov <[email protected]> wrote:

> Your rowkey schema is not efficient; it will result in region hotspotting
> (unless you salt your table).
> DATA_BLOCK_ENCODING is beneficial for both in-memory and on-disk
> storage. Compression is a must as well.
> It is hard to say how block encoding + compression will affect query
> performance in your case, but common sense tells us that the more data
> you keep in the in-memory block cache, the better overall performance
> you have. Block encoding in your case is a MUST; as for compression,
> YMMV, but usually Snappy, LZO or LZ4 improves performance as well.
> Block cache in 0.94-8 does not support compression but supports block
> encoding.
>
> -Vladimir Rodionov
>
>
> On Mon, Apr 14, 2014 at 6:04 PM, James Taylor <[email protected]> wrote:
>
>> Hi,
>> Take a look at Doug Meil's HBase blog here:
>> http://blogs.apache.org/hbase/ as I think that's pretty relevant for
>> Phoenix as well. Also, Mujtaba may be able to provide you with some good
>> guidance.
>> Thanks,
>> James
>>
>>
>> On Tue, Apr 8, 2014 at 2:24 PM, universal localhost <[email protected]> wrote:
>>
>>> Hey All,
>>>
>>> Can someone please suggest optimizations in Phoenix or HBase that
>>> I can benefit from for the case where *Rowkeys are much larger as
>>> compared to the column values*.
>>>
>>> In my case, Rowkeys have a timestamp.
>>>
>>> RowKey schema: *DATELOGGED, ORGNAME,* INSTANCEID, TXID
>>> Column TXID is a sequence number.
>>>
>>> I have read a little about DATA_BLOCK_ENCODING and learned that it can
>>> benefit the in-cache key compression.
>>>
>>> I am hoping that by using this compression I can get away with large
>>> rowkeys...
>>> Any suggestions on how it will affect the query performance?
>>>
>>> --Uni
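Vladimir's point that DATA_BLOCK_ENCODING pays off for large rowkeys can be illustrated with a toy version of prefix encoding: within a sorted block, each key stores only the length of the prefix it shares with the previous key plus the remaining suffix. This is a simplified sketch, assuming a hypothetical 2-byte length field and a key layout similar to the DATELOGGED, ORGNAME, INSTANCEID, TXID schema above; HBase's real PREFIX, DIFF and FAST_DIFF encoders work on whole KeyValues and differ in detail.

```python
import struct


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(sorted_keys):
    """Toy prefix encoding: each key becomes (shared_len, suffix) relative
    to the previous key in the block."""
    prev = b""
    out = []
    for key in sorted_keys:
        n = common_prefix_len(prev, key)
        out.append((n, key[n:]))
        prev = key
    return out


def prefix_decode(entries):
    """Rebuild the original keys by replaying shared prefixes."""
    prev = b""
    keys = []
    for n, suffix in entries:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys


def encoded_size(entries) -> int:
    # Hypothetical cost model: 2 bytes for the shared-length field plus the suffix.
    return sum(2 + len(suffix) for _, suffix in entries)


# Hypothetical large rowkeys: timestamp, long org name, instance id, tx id.
keys = sorted(
    struct.pack(">q", 1_397_433_600_000 + i // 10)
    + b"example-organization-name\x00"
    + struct.pack(">q", 42)
    + struct.pack(">q", i)
    for i in range(1000)
)
entries = prefix_encode(keys)
raw = sum(len(k) for k in keys)
print("raw key bytes:", raw, "prefix-encoded bytes:", encoded_size(entries))
```

Adjacent keys that differ only in TXID share almost every byte, so most entries shrink to a few bytes; that is why keeping encoded blocks in the block cache lets far more rows fit in memory than the raw key size suggests.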
