OK, I didn't know whether the sheer number of gets would be a limiting factor. Thanks.

On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu <[email protected]> wrote:
> As I said this afternoon:
> See the following API in HTable for batching Gets:
>
> public Result[] get(List<Get> gets) throws IOException {
>
> Cheers
>
>
> On Tue, Apr 29, 2014 at 7:45 PM, Software Dev <[email protected]> wrote:
>
>> Nothing against your code. I just meant that if we are doing a scan,
>> say for hourly metrics across a 6 month period, we are talking about
>> 4K+ gets. Is that something that can easily be handled?
>>
>> On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) <[email protected]> wrote:
>> >> Gets a bit hairy when doing say a shitload of gets though.. no?
>> >
>> > If by "hairy" you mean the code is ugly, it was written for maximal clarity.
>> > I think you'll find a few sensible loops make it fairly clean.
>> > Otherwise I'm not sure what you mean.
>> >
>> > -----Original Message-----
>> > From: Software Dev [mailto:[email protected]]
>> > Sent: Tuesday, April 29, 2014 5:02 PM
>> > To: [email protected]
>> > Subject: Re: Help with row and column design
>> >
>> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
>> >> pre-store every level of aggregation you care about.
>> >
>> > Ok I think this makes sense. Gets a bit hairy when doing say a shitload
>> > of gets though.. no?
>> >
>> > On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) <[email protected]> wrote:
>> >> You don't do a scan, you do a series of gets, which I believe you can
>> >> batch into one call.
>> >>
>> >> last 5 days query in pseudocode
>> >> res1 = Get( hash("2014-04-29") + "2014-04-29")
>> >> res2 = Get( hash("2014-04-28") + "2014-04-28")
>> >> res3 = Get( hash("2014-04-27") + "2014-04-27")
>> >> res4 = Get( hash("2014-04-26") + "2014-04-26")
>> >> res5 = Get( hash("2014-04-25") + "2014-04-25")
>> >>
>> >> For each result you look for the particular column or columns you are
>> >> interested in
>> >> Total_usa = res1.get("c:usa") + res2.get("c:usa") + res3.get("c:usa") + ...
>> >> Total_female_usa = res1.get("c:usa:sex:f") + ...
>> >>
>> >> "What happens when we add more fields? Do we just keep adding in more
>> >> column qualifiers? If so, how would we filter across columns to get an
>> >> aggregate total?"
>> >>
>> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
>> >> pre-store every level of aggregation you care about.
>> >>
>> >> -----Original Message-----
>> >> From: Software Dev [mailto:[email protected]]
>> >> Sent: Tuesday, April 29, 2014 4:36 PM
>> >> To: [email protected]
>> >> Subject: Re: Help with row and column design
>> >>
>> >>> The downside is it still has a hotspot when inserting, but when
>> >>> reading a range of time it does not
>> >>
>> >> How can you do a scan query between dates when you hash the date?
>> >>
>> >>> Column qualifiers are just the collection of items you are
>> >>> aggregating on. Values are increments. In your case qualifiers might
>> >>> look like c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m,
>> >>> c:italy:sex:f, c:italy,
>> >>
>> >> What happens when we add more fields? Do we just keep adding in more
>> >> column qualifiers? If so, how would we filter across columns to get an
>> >> aggregate total?
>>
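
For reference, here is a rough sketch of the "series of gets" idea from the thread, written as plain Java with no HBase dependency so it runs on its own. Everything named here is hypothetical: the class DailyMetricKeys, the 16-bucket salt standing in for hash(date), and the in-memory maps standing in for fetched rows are assumptions for illustration, not anything the thread specifies. In real code each key would presumably be wrapped in a Get and each chunk passed to the HTable get(List<Get>) call quoted above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HBase.
final class DailyMetricKeys {
    // Assumed salt width: the thread only says hash(date), not how it is computed.
    static final int BUCKETS = 16;

    // Row key = two-digit salt bucket + ISO date, mirroring hash(date) + date.
    static String rowKey(LocalDate day) {
        String date = day.toString(); // ISO-8601, e.g. "2014-04-29"
        int bucket = Math.floorMod(date.hashCode(), BUCKETS);
        return (bucket < 10 ? "0" : "") + bucket + date;
    }

    // One row key per day, newest first, covering `days` days ending at `end`.
    static List<String> lastDays(LocalDate end, int days) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            keys.add(rowKey(end.minusDays(i)));
        }
        return keys;
    }

    // Split a long key list into chunks of at most `size` items, so that each
    // chunk could be sent as one batched call instead of one call per key.
    static <T> List<List<T>> chunks(List<T> items, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            out.add(new ArrayList<>(items.subList(i, end)));
        }
        return out;
    }

    // Sum one pre-aggregated qualifier (e.g. "c:usa") across fetched rows.
    // Each map stands in for one row's qualifier -> counter values; rows that
    // lack the qualifier contribute 0.
    static long total(List<Map<String, Long>> rows, String qualifier) {
        long sum = 0;
        for (Map<String, Long> row : rows) {
            sum += row.getOrDefault(qualifier, 0L);
        }
        return sum;
    }
}
```

With hourly rows over roughly six months (about 24 x 183 = 4,392 keys), chunks(keys, 1000) gives five batches rather than thousands of individual round trips. The chunk size of 1000 is an arbitrary choice for the example, not a recommendation from the thread.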
