On Mon, Apr 4, 2011 at 3:30 PM, Ted Dunning wrote:
> OpenTSDB does an interesting thing where they put a primary key in front of
> the date. This limits some of the hot-spotting on inserts. Each different
> kind of query goes to a different machine as well. The query balancing
> won't be as goo
> From: Miguel Costa [mailto:miguel-co...@telecom.pt]
> Sent: Monday, April 04, 2011 9:12 AM
> To: user@hbase.apache.org
> Subject: HBase design schema
>
> Hi,
>
> I need some help with a schema design on HBase.
>
> I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
> My row key is Site+Time.
>
Sent: Monday, April 4, 2011 19:24
To: user@hbase.apache.org
Subject: RE: HBase design schema
I've done almost the same thing at my work. Since I'm running on a VERY
small number of servers (2), I pre-aggregate my data into tables in the
format...
[-MM-DD]|[Keyword]|[Referrer] for t
' this will return all of
the referrers for the keyword hospital for the date of 2011-03-05.
YMMV
-Pete
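A minimal sketch of the prefix scan described above, using an in-memory sorted list in place of an HBase table. The YYYY-MM-DD date format, the sample rows, and the `prefix_scan` helper are assumptions for illustration, not Pete's actual code or the HBase client API.

```python
from bisect import bisect_left

# Hypothetical pre-aggregated rows: "date|keyword|referrer" -> hit count.
rows = {
    "2011-03-05|hospital|bing.com": 3,
    "2011-03-05|hospital|google.com": 12,
    "2011-03-05|hotel|google.com": 7,
    "2011-03-06|hospital|google.com": 4,
}
sorted_keys = sorted(rows)  # stands in for HBase's sorted row keys


def prefix_scan(prefix):
    """Return (key, count) pairs whose key starts with prefix, in key order."""
    start = bisect_left(sorted_keys, prefix)  # jump to the first candidate key
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing further can match
        matches.append((key, rows[key]))
    return matches


# All referrers for the keyword "hospital" on 2011-03-05.
referrers = prefix_scan("2011-03-05|hospital|")
```

Because the date and keyword lead the key, the matching rows sit next to each other and the scan stops as soon as the prefix no longer matches.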
From: Miguel Costa [mailto:miguel-co...@telecom.pt]
Sent: Monday, April 04, 2011 9:12 AM
To: user@hbase.apache.org
Subject: HBase design schema
Hi,
I need some help with a schema design
Take a look at OpenTSDB.
I think you will be impressed with the speed.
Regarding the exponential explosion. Yes. That is a risk in theory. But
what happens in practice is that you only create the alternative forms of
the file where the simpler key forms are unacceptable due to volume of data.
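As a rough illustration of the key layout discussed in this thread (an identifier placed in front of the time), here is a sketch that packs a site id and a timestamp into a byte key. The 4-byte field widths and the `make_key` and `parse_key` names are assumptions made for this example; this is not OpenTSDB's actual row format.

```python
import struct


def make_key(site_id: int, epoch_seconds: int) -> bytes:
    """Row key = 4-byte site id followed by a 4-byte timestamp, big-endian."""
    return struct.pack(">II", site_id, epoch_seconds)


def parse_key(key: bytes):
    """Inverse of make_key: returns (site_id, epoch_seconds)."""
    return struct.unpack(">II", key)


# Three hypothetical sites writing at the same moments. With the id first,
# their rows fall into three separate key ranges instead of all landing at
# the "latest time" end of the table.
t = 1301931000
keys = sorted(make_key(site, t + dt) for site in (1, 2, 3) for dt in (0, 60))
decoded = [parse_key(k) for k in keys]
```

Big-endian packing keeps byte order and numeric order the same, so a scan over one site's prefix still returns that site's rows in time order.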
: Miguel Costa
Subject: Re: HBase design schema
Miguel,
One option is to use the simplest design and use the key you have. Scanning
for a particular period of time will give you all the data in that time
period which you can reduce in any way that you like.
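A small sketch of that "scan, then reduce" approach for the top-referrer question, with a plain Python list standing in for the table. The sample rows, column names, and the `top_referrer_by_keyword` helper are hypothetical; a real HBase scan would read only the contiguous Site+Time key range instead of filtering every row.

```python
from collections import Counter, defaultdict

# Hypothetical rows keyed by (site, time); the other dimensions are columns.
rows = [
    (("site-a", "2011-04-01T10:00"), {"referrer": "google.com", "keyword": "hospital"}),
    (("site-a", "2011-04-01T11:00"), {"referrer": "google.com", "keyword": "hospital"}),
    (("site-a", "2011-04-02T09:00"), {"referrer": "bing.com", "keyword": "hospital"}),
    (("site-a", "2011-04-02T09:30"), {"referrer": "bing.com", "keyword": "hotel"}),
    (("site-a", "2011-04-09T09:00"), {"referrer": "bing.com", "keyword": "hospital"}),
    (("site-b", "2011-04-01T10:00"), {"referrer": "yahoo.com", "keyword": "hospital"}),
]


def top_referrer_by_keyword(rows, site, start, end):
    """Count referrers per keyword for one site in [start, end), keep the top one."""
    counts = defaultdict(Counter)
    for (row_site, ts), cols in rows:
        if row_site == site and start <= ts < end:  # the "scan" over the period
            counts[cols["keyword"]][cols["referrer"]] += 1
    # The "reduce" step: most frequent referrer for each keyword.
    return {kw: c.most_common(1)[0][0] for kw, c in counts.items()}


result = top_referrer_by_keyword(rows, "site-a", "2011-04-01", "2011-04-08")
```

The reduction runs on the client here; the cost grows with the amount of data in the scanned period, which is the inefficiency the next paragraph refers to.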
If that becomes too inefficient, a common trick is to build a secondary file
that cont
Hi,
I need some help with a schema design on HBase.
I have 5 dimensions (Time, Site, Referrer, Keyword, Country).
My row key is Site+Time.
Now I want to answer questions like: what is the top Referrer by Keyword
for a site over a period of time?
Basically I want to cross all the dimension