1. Endpoint is a kind of Coprocessor, which was added in 0.92. You can think of it as a little like a stored procedure in a relational database: it is logic that runs on the HBase server side. With it you may reduce your app's RPC calls or, as you said, reduce traffic. You can get some help on Coprocessor/Endpoint here: https://blogs.apache.org/hbase/entry/coprocessor_introduction

2. I am still a little confused about what exactly you want with this table structure (sorry for that, but English is not my mother tongue). Do you mean that t1 holds the original data of some objects, and t2 keeps something about the objects in t1 (like a log: 10:11 em checks t1obj1; 10:13 em buys t1obj1; 10:30 em takes away t1obj1)?

3. You said 'This data is then sorted by the time part of the returned rowkeys to get the Top N of these.' There may be no need to do the sort. HBase keeps rows sorted by row key in lexicographic (byte) order, so if you just fetch the first N of them, they are already ordered.

4. I have not been using HBase for long; in fact I am still a noob at it :). I would be glad if anything here helps you.
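To illustrate point 3, here is a minimal sketch in plain Java with no HBase client involved. A TreeMap ordered by unsigned byte comparison stands in for a table, since HBase likewise keeps rows sorted by the bytes of the row key. The key layout (the t1_id, a '-' separator, then Long.MAX_VALUE - time as 8 big-endian bytes) is only my assumption of how the reverse-time key could be encoded, and the class and method names are made up for this example. It also assumes times are non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Sketch only: a sorted map stands in for an HBase table.
class ReverseTimeKeySketch {

    // Hypothetical layout: "<t1Id>-" followed by (Long.MAX_VALUE - time)
    // as 8 big-endian bytes, so newer times produce smaller keys.
    static byte[] rowKey(String t1Id, long time) {
        byte[] prefix = (t1Id + "-").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - time)
                .array();
    }

    // Recovers the original time from the last 8 bytes of a key.
    static long timeOf(byte[] key) {
        long reversed = ByteBuffer.wrap(key, key.length - Long.BYTES, Long.BYTES).getLong();
        return Long.MAX_VALUE - reversed;
    }

    // Empty "table" whose keys are ordered by unsigned lexicographic byte order.
    static TreeMap<byte[], String> newTable() {
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        return new TreeMap<>(byteOrder);
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    // Walks the keys from the prefix onward and stops after n rows or when
    // the prefix no longer matches; no explicit sort is performed.
    static List<Long> newestTimes(TreeMap<byte[], String> table, String t1Id, int n) {
        byte[] prefix = (t1Id + "-").getBytes(StandardCharsets.UTF_8);
        List<Long> times = new ArrayList<>();
        for (byte[] key : table.tailMap(prefix, true).keySet()) {
            if (times.size() == n || !startsWith(key, prefix)) {
                break;
            }
            times.add(timeOf(key));
        }
        return times;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = newTable();
        for (long time : new long[] {1000L, 3000L, 2000L}) {
            table.put(rowKey("obj1", time), "dummy");
        }
        table.put(rowKey("obj2", 5000L), "dummy");
        // Prints the two newest times for obj1, newest first.
        System.out.println(newestTimes(table, "obj1", 2));
    }
}
```

Against a real table the loop would roughly correspond to a Scan that starts at the prefix and stops after N rows. Note that this only gives the order within one t1_id prefix.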
Best Regards,
NN

2012/6/5 Em <mailformailingli...@yahoo.de>

> Hi,
>
> what do you mean by endpoint?
>
> It would look more like
>
> T2 {
>   rowkey: t1_id-(Long.MAX_VALUE - time)
>   {
>     family: qualifier = dummyDataSinceOnlyTheRowkeyMatters
>   }
> }
>
> For every t1_id associated with a specific object, one gets the newest
> entry in the T2-table (newest in relation to the key, not the internal
> timestamp of creation).
> This data is then sorted by the time part of the returned rowkeys to get
> the Top N of these.
> And then you get N records from t1 again.
>
> At least, that's what I thought about, though I am not sure that this is
> the most efficient way.
>
> Kind regards,
> Em
>
> On 05.06.2012 04:33, NNever wrote:
> > Does the schema look like this:
> >
> > T2{
> >   rowkey: rs-time row
> >   {
> >     family:qualifier = t1's row
> >   }
> > }
> >
> > Then you Scan the newest 1000 from T2, get the t1Row of each, and then do
> > 1000 Gets from T1 for one page?
> >
> > 2012/6/5 NNever <nnever...@gmail.com>
> >
> >> '- I'd like to do the top N stuff on the server side to reduce traffic,
> >> will this be possible? '
> >>
> >> Endpoint?
> >>
> >>
> >> 2012/6/5 Em <mailformailingli...@yahoo.de>
> >>
> >>> Hello list,
> >>>
> >>> let's say I have to fetch a lot of rows for a page-request (say
> >>> 1.000-2.000).
> >>> The row-keys are a composition of a fixed id of an object and a
> >>> sequential ever-increasing id. Salting those keys for balancing may be
> >>> taken into consideration.
> >>>
> >>> I want to do a Join like this one expressed in SQL:
> >>>
> >>> SELECT t1.columns FROM t1
> >>> JOIN t2 ON (t1.id = t2.id)
> >>> WHERE t2.id = fixedID-prefix
> >>>
> >>> I know that HBase does not support that out of the box.
> >>> My approach is to have all the fixed-ids as columns of a row in t1.
> >>> Selecting a row, I fetch those columns that are of interest for me,
> >>> where each column contains a fixedID for t2.
> >>> Now I do a scan on t2 for each fixedID which should return me exactly
> >>> one value per fixedID (it's kind of a reverse-timestamp-approach like in
> >>> the HBase-book).
> >>> Furthermore I am really only interested in the key itself. I don't care
> >>> about the columns (t2 is more like an index).
> >>> Having fetched a row per fixedID, I sort based on the sequential part of
> >>> their key and get the top N.
> >>> For those top N I'll fetch data from t1.
> >>>
> >>> The usecase is to fetch the top N most recent entities of t1 that are
> >>> associated with a specific entity in t1 by using t2 as an index.
> >>> T2 has one extra benefit over t1: You can do range-scans, if necessary.
> >>>
> >>> Questions:
> >>> - since this is triggered by a page-request: Will this return with low
> >>> latency?
> >>> - is there a possibility to do those Scans in a batch? Maybe I can
> >>> combine them into one big scanner, using a custom filter for what I want?
> >>> - do you have thoughts on improving this type of request?
> >>> - I'd like to do the top N stuff on the server side to reduce traffic,
> >>> will this be possible?
> >>> - I am not sure whether a Scan is really what I want. Maybe a Multiget
> >>> will fit my needs better combined with a RowFilter?
> >>>
> >>>
> >>> I really work hard on finding the best approach of mapping this
> >>> m:n-relation to a HBase schema - so any help is appreciated.
> >>>
> >>> Please note: I haven't written any line of HBase code so far. Currently
> >>> I am studying books, blog-posts, slides and the mailinglists for
> >>> learning more about HBase.
> >>>
> >>> Thanks!
> >>>
> >>> Kind regards,
> >>> Em