But then, when you read, you have to hit every region of that customer,
instead of pinpointing just the one that contains, for example, the hour
you want.
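
To make the difference concrete, here is a rough sketch in plain Java (no
HBase client calls). The fixed-width encoding (4-byte customerId, 1-byte
bucket, 8-byte timestamp) and the class and method names are assumptions for
illustration only, not our actual code. With the timestamp in the rowkey, a
one-hour query becomes one narrow [start, stop) row range per bucket. Without
it, the only row range you can give is the whole customer prefix, and as far
as I understand the time range can then only narrow things down inside it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class RowKeyRanges {
    // Assumed layout: <customerId:4 bytes><bucket:1 byte><timestampInMs:8 bytes>.
    // The trailing <uniqueId> is left out because range bounds do not need it.
    static byte[] keyPrefix(int customerId, int bucket, long timestampMs) {
        return ByteBuffer.allocate(13)
                .putInt(customerId)
                .put((byte) bucket)
                .putLong(timestampMs)
                .array();
    }

    // Timestamp in the rowkey: one bounded [startRow, stopRow) pair per bucket.
    static List<byte[][]> rangesForWindow(int customerId, int numBuckets,
                                          long fromMs, long toMs) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int bucket = 1; bucket <= numBuckets; bucket++) {
            ranges.add(new byte[][] {
                keyPrefix(customerId, bucket, fromMs),
                keyPrefix(customerId, bucket, toMs)
            });
        }
        return ranges;
    }

    // No timestamp in the rowkey: the range can only be the customer prefix,
    // so it spans every region that holds this customer's rows.
    static byte[][] wholeCustomerRange(int customerId) {
        return new byte[][] {
            ByteBuffer.allocate(4).putInt(customerId).array(),
            ByteBuffer.allocate(4).putInt(customerId + 1).array()
        };
    }

    public static void main(String[] args) {
        long from = 1372636800000L;      // 2013-07-01 00:00 UTC
        long to = from + 3_600_000L;     // one hour later
        List<byte[][]> ranges = rangesForWindow(10001, 16, from, to);
        System.out.println("bounded ranges to scan: " + ranges.size());
        System.out.println("prefix-only range key length: "
                + wholeCustomerRange(10001)[0].length);
    }
}
```

Each pair from rangesForWindow would be the start and stop row of one scan,
so the read touches only the regions covering that hour in each bucket.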

On Friday, November 15, 2013, Ted Yu wrote:

> bq. you must have your customerId, timestamp in the rowkey since you query
> on it
>
> Have you looked at this API in Scan ?
>
>   public Scan setTimeRange(long minStamp, long maxStamp)
>
>
> Cheers
>
>
> On Fri, Nov 15, 2013 at 1:28 PM, Asaf Mesika <[email protected]>
> wrote:
>
> > The problem is that I do know my rowkey design, and it follows people's
> > best practice, but it generates a really bad use case which I can't seem
> > to figure out how to solve yet.
> >
> > The rowkey as I said earlier is:
> > <customerId><bucket><timestampInMs><uniqueId>
> > So when, for example, you have 1000 customers, and bucket ranges from 1 to
> > 16, you eventually end up with:
> > * 30k regions - What happens, as I presume: you start with one region
> > hosting ALL customers, which is just one. As you pour in more customers
> > and more data, the region splitting kicks in. So, after a while, you get
> > to a situation in which most regions host a specific customerId, bucket
> > and time duration. For example: customer #10001, bucket 6,
> > 01/07/2013 00:00 - 02/07/2013 17:00.
> > * Empty regions - the first really bad consequence of what I described
> > above is that when the time duration is over, no data will ever be written
> > to this region. And worse - when the TTL you set (let's say 1 month) is
> > over and it's 03/08/2013, this region becomes empty!
> >
> > The thing is that you must have your customerId, timestamp in the rowkey
> > since you query on it, but when you do, you will essentially get regions
> > which will not get any more writes to them, and after TTL become zombie
> > regions :)
> >
> > The second bad part of this rowkey design is that some customers will have
> > significantly less traffic than other customers, thus in essence their
> > regions will get written at a very slow rate compared with the
> > high-traffic customers. When this happens on the same RS - bam: the slow
> > region's Puts are causing the WAL queue to get bigger over time, since
> > that region never gets to Max Region Size (256MB in our case), thus never
> > gets flushed, thus stays in the 1st WAL file. Until when? Until we hit the
> > max number of log files permitted (32), and then regions are forcibly
> > flushed. When this happens, we get about 100 regions with 3k-3mb store
> > files. You can imagine what happens next.
> >
> > The weirdest thing here is that this rowkey design is very common -
> > nothing fancy here, so in essence this phenomenon should have happened to
> > a lot of people - but for some reason, I don't see that much written
> > about it.
> >
> > Thanks!
> >
> > Asaf
> >
> >
> >
> > On Fri, Nov 15, 2013 at 3:51 AM, Jia Wang <[email protected]> wrote:
> >
> > > Then the case is simple, as I said: "check your row key design, you can
> > > find the start and end row key for each region, from which you can know
> > > why your request with a specific row key doesn't hit a specified region"
> > >
> > > Cheers
> > > Ramon
> > >
> > >
> > > On Thu, Nov 14, 2013 at 8:47 PM, Asaf Mesika <[email protected]>
> > > wrote:
> > >
> > > > It's from the same table.
> > > > The thing is that some <customerId> simply have less data saved in
> > > > HBase, while others have x50 (max) data.
> > > > I'm trying to check how people designed their rowkey around it, or
> > > > had another out-of-the-box solution for it.
> > > >
> > > >
> > > >
> > > > On Thu, Nov 14, 2013 at 12:06 PM, Jia Wang <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Are the regions from the same table? If they are, check your row key
> > > > > design: you can find the start and end row key for each region, from
> > > > > which you can know why your request with a specific row key doesn't
> > > > > hit a specified region.
> > > > >
> > > > > If the regions are from different tables, you may consider combining
> > > > > some cold regions for some tables.
> > > > >
> > > > > Thanks
> > > > > Ramon
> > > > >
> > > > >
> > > > > On Thu, Nov 14, 2013 at 4:59 PM, Asaf Mesika <
