Re: HBase design schema

2011-04-04 Thread tsuna
On Mon, Apr 4, 2011 at 3:30 PM, Ted Dunning wrote:
> OpenTSDB does an interesting thing where they put a primary key in front of
> the date. This limits some of the hot-spotting on inserts. Each different
> kind of query goes to a different machine as well. The query balancing
> won't be as goo
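A minimal sketch of the "key in front of the date" idea, applied to Miguel's Site+Time key. The 4-byte site ID, the hour-sized time bucket and the name row_key are assumptions for illustration, loosely modeled on OpenTSDB putting a metric ID before a base timestamp; this is not OpenTSDB's actual key format.

```python
import struct

def row_key(site_id: int, epoch_seconds: int, bucket_seconds: int = 3600) -> bytes:
    """Hypothetical composite key: 4-byte site ID, then 4-byte time bucket.

    Big-endian packing makes byte-wise (lexicographic) order match numeric
    order, so all rows for one site are contiguous and sorted by time.
    """
    base = epoch_seconds - (epoch_seconds % bucket_seconds)  # round down to bucket start
    return struct.pack(">II", site_id, base)

# Keys for different sites fall into different key ranges instead of all
# piling up at the "newest timestamp" end of the table.
keys = sorted([
    row_key(2, 1301900000),
    row_key(1, 1301950000),
    row_key(1, 1301900000),
    row_key(2, 1301950000),
])
for k in keys:
    print(struct.unpack(">II", k))  # (site_id, bucket_start)
```

This only spreads inserts across sites; a single very busy site still writes into one contiguous key range.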

Re: HBase design schema

2011-04-04 Thread Ted Dunning
:miguel-co...@telecom.pt]
> Sent: Monday, April 04, 2011 9:12 AM
> To: user@hbase.apache.org
> Subject: HBase design schema
>
> Hi,
>
> I need some help with a schema design on HBase.
>
> I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
> My row key is Site+Time.
>

RE: HBase design schema

2011-04-04 Thread Miguel Costa
: Monday, April 4, 2011 19:24
To: user@hbase.apache.org
Subject: RE: HBase design schema

I've done almost the same thing at my work. Since I'm running on a VERY small number of servers (2), I pre-aggregate my data into tables in the format... [YYYY-MM-DD]|[Keyword]|[Referrer] for t

RE: HBase design schema

2011-04-04 Thread Peter Haidinyak
' this will return all of the referrers for the keyword hospital for the date of 2011-03-05.

YMMV
-Pete

From: Miguel Costa [mailto:miguel-co...@telecom.pt]
Sent: Monday, April 04, 2011 9:12 AM
To: user@hbase.apache.org
Subject: HBase design schema

Hi, I need some help with a schema design
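A minimal sketch of the date|keyword|referrer pre-aggregated table Pete describes and the prefix scan that returns every referrer for "hospital" on 2011-03-05. An in-memory sorted key list stands in for the HBase table; the counts, referrer names and helper names (agg_key, prefix_scan) are made up for illustration.

```python
import bisect

def agg_key(day: str, keyword: str, referrer: str) -> str:
    # Pete's layout: [YYYY-MM-DD]|[Keyword]|[Referrer]
    return f"{day}|{keyword}|{referrer}"

# Hypothetical pre-aggregated rows: key -> hit count (illustrative numbers).
rows = {
    agg_key("2011-03-05", "hospital", "google.com"): 120,
    agg_key("2011-03-05", "hospital", "bing.com"): 40,
    agg_key("2011-03-05", "hotel", "google.com"): 75,
    agg_key("2011-03-06", "hospital", "google.com"): 90,
}
sorted_keys = sorted(rows)  # HBase keeps row keys in sorted order

def prefix_scan(prefix: str):
    """Yield (key, value) for every key starting with prefix, the way a
    start-row/stop-row scan walks a sorted table."""
    start = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: once past the prefix there are no more matches
        yield key, rows[key]

# All referrers for keyword "hospital" on 2011-03-05.
for key, hits in prefix_scan("2011-03-05|hospital|"):
    print(key.split("|")[2], hits)
```

The trailing "|" in the prefix keeps a keyword such as "hospitality" from matching; the separator itself must never appear inside a keyword or referrer.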

Re: HBase design schema

2011-04-04 Thread Ted Dunning
Take a look at OpenTSDB. I think you will be impressed with the speed.

Regarding the exponential explosion: yes, that is a risk in theory. But in practice you only create the alternative forms of the file where the simpler key forms are unacceptable due to the volume of data.
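To put a number on the "exponential explosion", assuming it refers to building one pre-aggregated file per combination of dimensions, here is a quick count for Miguel's five dimensions. The function names and the single "needed" table are hypothetical.

```python
from itertools import combinations, permutations

# Miguel's five dimensions (assuming Referrer and Keyword are separate).
DIMENSIONS = ["Time", "Site", "Referrer", "Keyword", "Country"]

def rollup_sets(dims):
    """Every non-empty subset of dims: one candidate pre-aggregated table each."""
    return [c for r in range(1, len(dims) + 1) for c in combinations(dims, r)]

def key_layouts(dims):
    """Every ordered key layout; order matters because a prefix scan can
    only use the leading parts of the row key."""
    return [p for r in range(1, len(dims) + 1) for p in permutations(dims, r)]

print(len(rollup_sets(DIMENSIONS)))  # 31, i.e. 2**5 - 1
print(len(key_layouts(DIMENSIONS)))  # 325

# The practical approach: build only the layouts whose queries are too slow
# with the plain Site+Time key (this particular pick is hypothetical).
needed = [("Time", "Keyword", "Referrer")]
print(f"{len(needed)} extra table(s) instead of {len(key_layouts(DIMENSIONS))}")
```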

RE: HBase design schema

2011-04-04 Thread Miguel Costa
: Miguel Costa
Subject: Re: HBase design schema

Miguel,

One option is to use the simplest design and use the key you have. Scanning for a particular period of time will give you all the data in that time period which you can reduce in any way that you like. If that becomes too ineffic

Re: HBase design schema

2011-04-04 Thread Ted Dunning
Miguel,

One option is to use the simplest design and use the key you have. Scanning for a particular period of time will give you all the data in that time period which you can reduce in any way that you like. If that becomes too inefficient, a common trick is to build a secondary file that cont
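A minimal sketch of the "simplest design": keep the Site+Time key, range-scan one site's rows for the period, and reduce on the client to the top referrer per keyword. A sorted in-memory dict stands in for the HBase table; the rows and the names raw, scan and top_referrer_by_keyword are made up for illustration.

```python
import bisect
from collections import Counter, defaultdict

# Hypothetical raw rows keyed by (Site, Time); each value is one page view.
raw = {
    ("siteA", 1000): {"referrer": "google.com", "keyword": "hospital", "country": "PT"},
    ("siteA", 1010): {"referrer": "bing.com", "keyword": "hospital", "country": "PT"},
    ("siteA", 1020): {"referrer": "google.com", "keyword": "hospital", "country": "ES"},
    ("siteA", 2000): {"referrer": "bing.com", "keyword": "hotel", "country": "PT"},
    ("siteB", 1005): {"referrer": "yahoo.com", "keyword": "hospital", "country": "PT"},
}
keys = sorted(raw)  # tuples sort by site, then time, like the Site+Time row key

def scan(site, start, stop):
    """Rows for one site with start <= time < stop (a start/stop-row range scan)."""
    lo = bisect.bisect_left(keys, (site, start))
    hi = bisect.bisect_left(keys, (site, stop))
    for k in keys[lo:hi]:
        yield raw[k]

def top_referrer_by_keyword(site, start, stop):
    """Client-side reduce: for each keyword, the referrer with the most hits."""
    counts = defaultdict(Counter)
    for row in scan(site, start, stop):
        counts[row["keyword"]][row["referrer"]] += 1
    return {kw: c.most_common(1)[0] for kw, c in counts.items()}

print(top_referrer_by_keyword("siteA", 1000, 1500))
```

The reduce touches every raw row in the period, which is the cost that grows with traffic and eventually motivates a pre-aggregated secondary table.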

HBase design schema

2011-04-04 Thread Miguel Costa
Hi,

I need some help with a schema design on HBase.

I have 5 dimensions (Time, Site, Referrer, Keyword, Country). My row key is Site+Time.

Now I want to answer questions like: what is the top Referrer by Keyword for a site over a period of time? Basically I want to cross all the dimension