The trade-off we make is to improve write performance, knowing it will hurt read performance. In our case we write a lot of rows that might never be read (depending on the specific deep-dive queries that get run), so it's an acceptable trade-off. Our layout is also similar to the one described by Mike, so a read doesn't have to go to every region in the system, only to the few that might hold the data we need. Bloom filters also make random reads more efficient.
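
For anyone who wants to experiment with that layout, here is a rough Python sketch of the key scheme Mike describes in the quoted message below. It is not our production code. The thread doesn't name a hash function, so zlib.crc32 stands in for hash32; the 4-byte salt, the 8-byte big-endian timestamp, and the helper names (shard_for, salt_for, row_key, scan_prefixes) are all assumptions made for illustration.

```python
import struct
import zlib

NUM_SHARDS = 16  # fixed shard count, as in Mike's example (0..15)


def hash32(data: bytes) -> int:
    # Stand-in 32-bit hash (assumption): the thread does not say which one is used.
    return zlib.crc32(data) & 0xFFFFFFFF


def shard_for(unique_id: bytes) -> int:
    # <shard> = hash32(uniqueId) % 16; crc32 is unsigned here, so no abs() is needed.
    return hash32(unique_id) % NUM_SHARDS


def salt_for(shard: int, customer_id: bytes) -> bytes:
    # <salt> = hash32(<shard><customerId>), encoded as 4 big-endian bytes (assumed width).
    return struct.pack(">I", hash32(bytes([shard]) + customer_id))


def row_key(customer_id: bytes, timestamp_ms: int, unique_id: bytes) -> bytes:
    # <salt><customerId><timestampInMs><uniqueId>
    salt = salt_for(shard_for(unique_id), customer_id)
    return salt + customer_id + struct.pack(">Q", timestamp_ms) + unique_id


def scan_prefixes(customer_id: bytes) -> list:
    # Reading one customer means one prefix scan per shard, not one per region.
    return [salt_for(s, customer_id) + customer_id for s in range(NUM_SHARDS)]


if __name__ == "__main__":
    key = row_key(b"customer-42", 1384900000000, b"event-0001")
    print(key.hex())
    print(len(scan_prefixes(b"customer-42")), "prefixes to scan for this customer")
```

The point of scan_prefixes is the read side: a query for one customer fans out to at most NUM_SHARDS prefix scans, which is why we only touch the few regions that might have the data. As Mike notes, this only stays deterministic while the shard count is fixed.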

Online merge is only available in the 0.96.x code, but an offline merge exists for 0.94.x (that may not be a viable option for you). From the command line:

    hbase org.apache.hadoop.hbase.util.Merge "table" "region1" "region2"

However, if you have a specific weekly time that you can use for offline maintenance, writing a utility that splits the heavily used (hot) regions and merges the empty ones would allow you to balance your regions more appropriately across your cluster.

--Tom

On Wed, Nov 20, 2013 at 8:43 AM, Otis Gospodnetic <[email protected]> wrote:

> We use https://github.com/sematext/HBaseWD and I just learned
> Amazon.com people are using it and are happy with it, so it may work
> for you, too.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Wed, Nov 20, 2013 at 1:00 AM, Asaf Mesika <[email protected]> wrote:
> > Thanks for clearing that up.
> > I'm using your message to ping anyone who can assist, as it appears this
> > use case should happen to a lot of people?
> >
> > Thanks!
> >
> > On Wednesday, November 20, 2013, Himanshu Vashishtha wrote:
> >
> >> Re: "The 32 limit makes HBase go into
> >> stress mode, and dump all the regions contained in those 32 WAL
> >> files."
> >>
> >> Pardon, I haven't read all your data points/details thoroughly, but the
> >> above statement is not true. Rather, it looks at the oldest WAL file, and
> >> flushes those regions which would free that WAL file.
> >>
> >> But I agree that in general with this kind of workload, we should handle
> >> WAL files more intelligently and free up those WAL files which don't have
> >> any dependency (that is, all their entries are already flushed) when
> >> archiving. We do that in trunk but not in any released version, though.
> >>
> >>
> >>
> >> On Sat, Nov 16, 2013 at 11:16 AM, Asaf Mesika <[email protected]> wrote:
> >>
> >> > First, I forgot to mention that <customerId> in our case is
> >> > MD5(<customerId>).
> >> > In our case, we have so much data flowing in that we end up having a
> >> > region per <customerId><bucket> pretty quickly, and even that is split
> >> > into different regions by specific date duration (timestamp).
> >> >
> >> > We're not witnessing a hotspot issue. I built some scripts in java and
> >> > awk, and saw that 66% of our customers use more than 1 RS.
> >> >
> >> > We have two main serious issues: primary and secondary.
> >> >
> >> > Our primary issue is the slow-region vs fast-region. First, let's be
> >> > reminded that a region represents, as I detailed before, a specific
> >> > <customerId><bucket>. Some customers get 50x more data than other
> >> > customers in a specific time frame (2hrs - 1 day). So in one RS, we
> >> > have regions getting 10 write requests per hour, vs 50k write requests
> >> > per hour. So the region mapped to the slow-filling customer id doesn't
> >> > get to the 256MB flush limit and hence isn't flushed, while the regions
> >> > mapped to the fast-filling customer id are flushing very quickly since
> >> > they are filling very quickly.
> >> > Let's say the 1st WAL file contains the put of a slow-filling
> >> > customerId. The fast-filling customerId fills up the rest of that file.
> >> > After 20-30 seconds, the file gets rolled, and another file fills up
> >> > with the fast-filling customerId. After a while, we get to 32 WAL files.
> >> > The 1st file wasn't deleted since its region wasn't flushed. The 32
> >> > limit makes HBase go into stress mode, and dump all the regions
> >> > contained in those 32 WAL files.
> >> > In our case, we saw that it flushes 111 regions. Lots of the store
> >> > files are 3k-3mb sized. So our compaction queue starts filling up with
> >> > those store files that need to be compacted.
> >> > At the end of the road, the RS dies.
> >> >
> >> > Our secondary issue is that of empty regions - we get to a situation
> >> > where a region is mapped to a specific <customerId>, <bucket>, and date
> >> > range (1/7 - 3/7). So when we are in August (our TTL is set to 30
> >> > days), those regions become empty and will never get filled again.
> >> > We assume this somehow wreaks havoc in the load balancer, and also MSLAB
> >> > probably steals 1-2 GB of memory for those empty regions.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> >
> >> > On Sat, Nov 16, 2013 at 7:25 PM, Mike Axiak <[email protected]> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > One new key pattern that we're starting to use is a salt based on a
> >> > > shard. For example, let's take your key:
> >> > >
> >> > > <customerId><bucket><timestampInMs><uniqueId>
> >> > >
> >> > > Consider a shard between 0 and 15 inclusive. We determine this with:
> >> > >
> >> > > <shard> = abs(hash32(uniqueId) % 16)
> >> > >
> >> > > We can then define a salt to be based on customerId and the shard:
> >> > >
> >> > > <salt> = hash32(<shard><customerId>)
> >> > >
> >> > > So then the new key becomes:
> >> > >
> >> > > <salt><customerId><timestampInMs><uniqueId>
> >> > >
> >> > > This will distribute the data for a given customer across the N shards
> >> > > that you pick, while having a deterministic function for a given row
> >> > > key (so long as the # of shards you pick is fixed, otherwise you can
> >> > > migrate the data). Placing the bucket after the customerId doesn't
> >> > > help distribute the single customer's data at all. Furthermore, by
> >> > > using a separate hash (instead of just <shard><customerId>), you're
> >> > > guaranteeing that new data will appear in a somewhat random location
> >> > > (i.e., solving the problem of adding a bunch of new data for a new
> >> > > customer).
> >> > >
> >> > > I have a key simulation script in python that I can start tweaking and
> >> > > share with people if they'd like.
> >> > >
> >> > > Hope this helps,
> >> > > Mike
> >> > >
> >> > >
> >> > > On Sat, Nov 16, 2013 at 1:16 AM, Ted Yu <[email protected]> wrote:
> >> > >
> >> > > > bq. all regions of that customer
> >> > > >
> >> >
