Hi Doug

*80m records* were in the *3-year* data set, whose HFiles total around 47 GB. We also have a *75-year dataset* (3, 5, 7, 10, 25, 50, 75 years), and that has an HFile size of around 2.7 TB for a single table.
We are now planning to use a *Post Get Observer* that will write the get/read time to the Metric (say) column family -> Last Read column.

On Thu, Feb 27, 2014 at 2:56 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:

> Hi there,
>
> On top of what Vladimir already said...
>
> re: "Table1: 80 m records say Author, Table2 : 5k records say Category"
>
> Just 80 million records? HBase tends to be overkill for relatively low
> data volumes.
>
> But if you wish to proceed down this path, to extend what was already said,
> rather than thinking of it in terms of an RDBMS two-table design, create a
> pre-joined table that has data from both tables as the query target.
>
> As for the LRU cache, "premature optimization is the root of all evil." :-)
>
> Best of luck!
>
>
> On 2/24/14, 4:38 PM, "Vikram Singh Chandel" <vikramsinghchan...@gmail.com>
> wrote:
>
> >Hi Vladimir
> >
> >We are planning to have around 40 GB for L1 and 150 GB for L2, and when
> >this size is breached we have to start cleaning L1 and L2.
> >For this cleaning (deletion of records) I need the LRU info at the record
> >level, i.e. delete all records which have not been used in the past 15
> >days or longer.
> >We will save this LRU info in a Metric column family.
> >
> >What we thought of was using a Post Get Observer to write the value to
> >the Last Read column of the Metric column family.
> >We will later use this info for deletion of records.
> >
> >Is there any other, simpler way? As you said, the block cache is at the
> >table level (if I am correct), but we need the info at the record level.
> >
> >Thanks
> >
> >
> >On Tue, Feb 25, 2014 at 1:42 AM, Vladimir Rodionov
> ><vrodio...@carrieriq.com> wrote:
> >
> >> I recommend you work a little bit more on the design.
> >> NoSQL in general, and HBase in particular, is not very good at joining
> >> tables, but very good at point and range queries.
> >>
> >> Sure, you can do some optimizations in your current approach: create a
> >> CACHE table as IN_MEMORY, set a TTL of say 1 day (or less, depending
> >> on the data volume you are able to store), and utilize HBase's internal
> >> block cache (which is LRU) for that table.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodio...@carrieriq.com
> >>
> >> ________________________________________
> >> From: Vikram Singh Chandel [vikramsinghchan...@gmail.com]
> >> Sent: Monday, February 24, 2014 11:38 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: How to get Last access time of a record
> >>
> >> Hi Vladimir,
> >>
> >> We are going to implement a cache in HBase; let me give you an example.
> >>
> >> We have two tables:
> >> Table 1: 80 m records, say Author
> >> Table 2: 5k records, say Category
> >> Query: Get details of all publications by Author XYZ, broken down by
> >> Category.
> >>
> >> We fire a Get on Table 1 to get a list of publication ids (hashed).
> >> Then we do a scan on Table 2 to get the list of publications for each
> >> category, and then we do an intersection
> >> of both lists, and in the end get the details from the publication table.
> >>
> >> Now suppose the same query comes again: instead of doing all this
> >> computation again, we are going to save the intersected results
> >> in a table we are calling the L2 Cache (there's an L1 also).
> >>
> >> Hope you have got an idea of what we are trying to achieve.
> >> Now, if you can, please help.
> >>
> >>
> >> On Tue, Feb 25, 2014 at 12:20 AM, Vladimir Rodionov <
> >> vrodio...@carrieriq.com> wrote:
> >>
> >> > Interesting. You want to use HBase as a cache. What data are you
> >> > going to cache? Is it some kind of cold storage
> >> > on tapes or Blu-ray discs? Just curious.
> >> >
> >> > Best regards,
> >> > Vladimir Rodionov
> >> > Principal Platform Engineer
> >> > Carrier IQ, www.carrieriq.com
> >> > e-mail: vrodio...@carrieriq.com
> >> >
> >> > ________________________________________
> >> > From: Vikram Singh Chandel [vikramsinghchan...@gmail.com]
> >> > Sent: Monday, February 24, 2014 4:25 AM
> >> > To: user@hbase.apache.org
> >> > Subject: Re: How to get Last access time of a record
> >> >
> >> > Hi
> >> >
> >> > HBase provides a cache on non-processed data; we are implementing a
> >> > second level of caching on processed data,
> >> > e.g. on intersected data between two tables, or on post-processed data.
> >> >
> >> >
> >> > On Mon, Feb 24, 2014 at 5:02 PM, haosdent <haosd...@gmail.com> wrote:
> >> >
> >> > > HBase already maintains a cache.
> >> > >
> >> > > > we can get last accessed time for a record
> >> > >
> >> > > I think you could get this at your application level.
> >> > >
> >> > >
> >> > > On Mon, Feb 24, 2014 at 7:21 PM, Vikram Singh Chandel <
> >> > > vikramsinghchan...@gmail.com> wrote:
> >> > >
> >> > > > Hi
> >> > > >
> >> > > > We are planning to implement a caching mechanism for our HBase
> >> > > > data model, and for that we have to remove the *LRU (least
> >> > > > recently used) records* from the cached table.
> >> > > >
> >> > > > Is there any way by which we can get the last accessed time for a
> >> > > > record? Primarily the access will be
> >> > > > via *Range Scan and Get*.
> >> > > >
> >> > > > --
> >> > > > *Regards*
> >> > > >
> >> > > > *VIKRAM SINGH CHANDEL*
> >> > > >
> >> > > > Please do not print this email unless it is absolutely necessary.
> >> > > > Reduce. Reuse. Recycle. Save our planet.
> >> > >
> >> > > --
> >> > > Best Regards,
> >> > > Haosdent Huang
> >> >
> >> >
> >> > --
> >> > *Regards*
> >> >
> >> > *VIKRAM SINGH CHANDEL*
> >> >
> >> > Confidentiality Notice: The information contained in this message,
> >> > including any attachments hereto, may be confidential and is intended
> >> > to be read only by the individual or entity to whom this message is
> >> > addressed. If the reader of this message is not the intended recipient
> >> > or an agent or designee of the intended recipient, please note that
> >> > any review, use, disclosure or distribution of this message or its
> >> > attachments, in any form, is strictly prohibited. If you have received
> >> > this message in error, please immediately notify the sender and/or
> >> > notificati...@carrieriq.com and delete or destroy any copy of this
> >> > message and its attachments.
> >>
> >>
> >> --
> >> *Regards*
> >>
> >> *VIKRAM SINGH CHANDEL*
> >
> >
> >--
> >*Regards*
> >
> >*VIKRAM SINGH CHANDEL*

--
*Regards*

*VIKRAM SINGH CHANDEL*
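The *Post Get Observer* discussed in this thread could look roughly like the following. This is a minimal sketch against the 0.96/0.98-era coprocessor API; the Metric family and Last Read column names come from the thread, while the class name and write-back details are illustrative assumptions, not a tested implementation.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: after every successful Get, stamp the row's Metric:lastRead
// column with the current wall-clock time, so that a later sweep can find
// least-recently-used rows.
public class LastReadObserver extends BaseRegionObserver {

    private static final byte[] METRIC_CF = Bytes.toBytes("Metric");
    private static final byte[] LAST_READ = Bytes.toBytes("lastRead");

    @Override
    public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                          Get get, List<Cell> results) throws IOException {
        if (results.isEmpty()) {
            return; // the row does not exist; nothing to track
        }
        Put stamp = new Put(get.getRow());
        stamp.add(METRIC_CF, LAST_READ,
                  Bytes.toBytes(System.currentTimeMillis()));
        // Write through the hosting region directly, avoiding an extra RPC.
        ctx.getEnvironment().getRegion().put(stamp);
    }
}
```

Note that this turns every Get into a Get plus a Put, roughly doubling I/O on the cached table, and the *Range Scan* access path mentioned in the thread would need a similar hook (e.g. postScannerNext); that trade-off is worth measuring before committing to the design.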
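Vladimir's IN_MEMORY CACHE table suggestion needs no coprocessor at all: expired cells are dropped at compaction and hot blocks stay in the LRU block cache. A sketch against the 0.98-era admin API (the table name L2Cache and family name d are made up for illustration):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: create the L2 cache table so that HBase itself handles eviction.
// IN_MEMORY gives the family priority in the block cache, and the TTL
// expires cells one day after they were written.
public class CreateL2CacheTable {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            HTableDescriptor table =
                new HTableDescriptor(TableName.valueOf("L2Cache"));
            HColumnDescriptor family = new HColumnDescriptor("d");
            family.setInMemory(true);          // prefer the in-memory cache section
            family.setTimeToLive(24 * 60 * 60); // TTL in seconds: 1 day
            table.addFamily(family);
            admin.createTable(table);
        } finally {
            admin.close();
        }
    }
}
```

The HBase shell equivalent would be along the lines of `create 'L2Cache', {NAME => 'd', IN_MEMORY => true, TTL => 86400}`. One caveat: TTL evicts by write time, not last-read time, which is exactly the gap the thread is wrestling with.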
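The 15-day cleanup Vikram describes reduces to one predicate: a row is evictable when "now" minus its Metric:lastRead timestamp exceeds the window. The sketch below keeps that selection logic free of any HBase dependency; a plain Map stands in for the scanned lastRead column, and in production the entries would come from a Scan over the Metric family, with the returned keys turned into Deletes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the selection step of the LRU sweep described in the thread:
// given rowKey -> lastRead timestamp (millis), pick the rows that have
// not been read for more than 15 days.
public class LruSweep {

    static final long WINDOW_MS = 15L * 24 * 60 * 60 * 1000; // 15 days

    static List<String> staleRows(Map<String, Long> lastRead, long nowMillis) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastRead.entrySet()) {
            if (nowMillis - entry.getValue() > WINDOW_MS) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

Running this sweep as a periodic client-side scan (or MapReduce job) keeps eviction out of the read path entirely.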