Hi Doug,

The *80M records* are in the *3-year* dataset, whose HFiles total around 47 GB.
We also have a *75-year dataset* (3, 5, 7, 10, 25, 50, 75 years), whose HFiles
come to around 2.7 TB for a single table.

We are now planning to use a *Post Get Observer* (a RegionObserver
coprocessor) that will write the get/read time to a "Last Read" column in a
Metric (say) column family.
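
For reference, here is a minimal, untested sketch of such an observer against
the HBase 0.96-era coprocessor API; the class, family and qualifier names are
placeholders, and the exact method signature differs between HBase versions:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

// Records the time of every Get in the row's Metric family.
public class LastReadObserver extends BaseRegionObserver {

  private static final byte[] METRIC_CF = Bytes.toBytes("Metric");    // placeholder name
  private static final byte[] LAST_READ = Bytes.toBytes("lastRead");  // placeholder name

  @Override
  public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Get get, List<Cell> results) throws IOException {
    Put put = new Put(get.getRow());
    put.add(METRIC_CF, LAST_READ, Bytes.toBytes(System.currentTimeMillis()));
    // Write to the hosting region directly; note this turns every Get into a
    // Get + Put, so it adds write load and latency to every read.
    ctx.getEnvironment().getRegion().put(put);
  }
}

The observer would be attached to the cache table through its table
descriptor; the extra Put per Get is the price of keeping per-row read times.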

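And a rough sketch of the cleanup job discussed below in the thread (deleting
rows whose last read time is older than 15 days), against the 0.94/0.96-style
client API; the "L2Cache" table name is hypothetical:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Deletes rows whose Metric:lastRead timestamp is older than 15 days.
public class LruCacheCleaner {

  private static final byte[] METRIC_CF = Bytes.toBytes("Metric");    // placeholder name
  private static final byte[] LAST_READ = Bytes.toBytes("lastRead");  // placeholder name

  public static void main(String[] args) throws IOException {
    long cutoff = System.currentTimeMillis() - 15L * 24 * 60 * 60 * 1000;
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "L2Cache");   // hypothetical cache table name
    try {
      Scan scan = new Scan();
      scan.addColumn(METRIC_CF, LAST_READ);       // only read the bookkeeping column
      ResultScanner scanner = table.getScanner(scan);
      for (Result r : scanner) {
        byte[] lastRead = r.getValue(METRIC_CF, LAST_READ);
        if (lastRead != null && Bytes.toLong(lastRead) < cutoff) {
          table.delete(new Delete(r.getRow()));   // drop the whole stale row
        }
      }
      scanner.close();
    } finally {
      table.close();
    }
  }
}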

On Thu, Feb 27, 2014 at 2:56 AM, Doug Meil <doug.m...@explorysmedical.com> wrote:

>
> Hi there,
>
> On top of what Vladimir already said…
>
> re:  "Table1: 80 m records say Author, Table2 : 5k records say Category"
>
> Just 80 million records? HBase tends to be overkill for relatively low
> data volumes.
>
> But if you wish to proceed down this path, to extend what was already said:
> rather than thinking of it in terms of an RDBMS two-table design, create a
> pre-joined table that has data from both tables as the query target.
>
>
> As for the LRU cache, "premature optimization is the root of all evil".
> :-)
>
> Best of luck!
>
>
> On 2/24/14, 4:38 PM, "Vikram Singh Chandel" <vikramsinghchan...@gmail.com>
> wrote:
>
> >Hi Vladimir,
> >We are planning to have around 40 GB for L1 and 150 GB for L2, and when
> >these sizes are breached we have to start cleaning L1 and L2.
> >For this cleaning (deletion of records) I need the LRU info at record
> >level, i.e. delete all records that have not been used in the past 15
> >days or more.
> >We will save this LRU info in a Metric column family.
> >
> >What we thought of is using a Post Get Observer to write the value to the
> >Last Read column of the Metric column family.
> >We will later use this info for the deletion of records.
> >
> >Is there any other, simpler way? As you said, the block cache is at table
> >level (if I am correct), but we need this info at record level.
> >
> >Thanks
> >
> >
> >On Tue, Feb 25, 2014 at 1:42 AM, Vladimir Rodionov
> ><vrodio...@carrieriq.com>wrote:
> >
> >> I recommend you work a little bit more on design.
> >> NoSQL in general and HBase in particular are not very good at joining
> >> tables, but very good at point and range queries.
> >>
> >> Sure, you can do some optimizations in your current approach: create the
> >> CACHE table as IN_MEMORY, set a TTL of, say, 1 day (or less, depending on
> >> the data volume you are able to store) and utilize HBase's internal
> >> block cache (which is LRU) for that table.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodio...@carrieriq.com
> >>
> >> ________________________________________
> >> From: Vikram Singh Chandel [vikramsinghchan...@gmail.com]
> >> Sent: Monday, February 24, 2014 11:38 AM
> >> To: user@hbase.apache.org
> >> Subject: Re: How to get Last access time of a record
> >>
> >> Hi Vladimir,
> >>
> >> We are going to implement a cache in HBase, so let me give you an example.
> >>
> >> We have two tables
> >> Table1: 80 m records say Author
> >> Table2 : 5k records say Category
> >> Query: get the details of all publications by Author XYZ, broken down
> >> by Category.
> >>
> >> We fire a Get on Table 1 to get a list of publication ids (hashed).
> >> Then we do a scan on Table 2 to get the list of publications for each
> >> category, then we take the intersection of both lists, and in the end
> >> we get the details from the publication table.
> >>
> >> Now suppose the same query comes again: instead of doing all this
> >> computation again, we are going to save the intersected results in a
> >> table we are calling the L2 cache (there is an L1 as well).
> >>
> >> I hope that gives you an idea of what we are trying to achieve.
> >> Any help would be appreciated.
> >>
> >> On Tue, Feb 25, 2014 at 12:20 AM, Vladimir Rodionov <
> >> vrodio...@carrieriq.com
> >> > wrote:
> >>
> >> > Interesting. You want to use HBase as a cache. What data are you going
> >> > to cache? Is it some kind of cold storage on tapes or Blu-ray discs?
> >> > Just curious.
> >> >
> >> > Best regards,
> >> > Vladimir Rodionov
> >> > Principal Platform Engineer
> >> > Carrier IQ, www.carrieriq.com
> >> > e-mail: vrodio...@carrieriq.com
> >> >
> >> > ________________________________________
> >> > From: Vikram Singh Chandel [vikramsinghchan...@gmail.com]
> >> > Sent: Monday, February 24, 2014 4:25 AM
> >> > To: user@hbase.apache.org
> >> > Subject: Re: How to get Last access time of a record
> >> >
> >> > Hi
> >> > HBase provides a cache on non-processed data; we are implementing a
> >> > second level of caching on processed data, e.g. on intersected data
> >> > between two tables, or on post-processed data.
> >> >
> >> >
> >> > On Mon, Feb 24, 2014 at 5:02 PM, haosdent <haosd...@gmail.com> wrote:
> >> >
> >> > > HBase already maintains a cache.
> >> > >
> >> > > >we can get last accessed time for a record
> >> > >
> >> > > I think you could track this at your application level.
> >> > >
> >> > >
> >> > > On Mon, Feb 24, 2014 at 7:21 PM, Vikram Singh Chandel <
> >> > > vikramsinghchan...@gmail.com> wrote:
> >> > >
> >> > > > Hi
> >> > > >
> >> > > > We are planning to implement a caching mechanism for our HBase
> >> > > > data model; for that, we have to remove the *LRU (least recently
> >> > > > used) records* from the cached table.
> >> > > >
> >> > > > Is there any way by which we can get the last accessed time for a
> >> > > > record? The access will primarily be via *Range Scan and Get*.
> >> > > >
> >> > > --
> >> > > Best Regards,
> >> > > Haosdent Huang
> >> > >


-- 
*Regards*

*VIKRAM SINGH CHANDEL*

Please do not print this email unless it is absolutely necessary. Reduce.
Reuse. Recycle. Save our planet.
