Can you please elaborate?

On Wednesday, November 20, 2013, Otis Gospodnetic wrote:
> We use https://github.com/sematext/HBaseWD and I just learned
> Amazon.com people are using it and are happy with it, so it may work
> for you, too.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Wed, Nov 20, 2013 at 1:00 AM, Asaf Mesika <[email protected]> wrote:
> > Thanks for clearing that out.
> > I'm using your message to ping anyone who can assist, as it appears this
> > use case should apply to a lot of people.
> >
> > Thanks!
> >
> > On Wednesday, November 20, 2013, Himanshu Vashishtha wrote:
> >
> >> Re: "The 32 limit makes HBase go into stress mode and dump all regions
> >> contained in those 32 WAL files."
> >>
> >> Pardon, I haven't read all your data points/details thoroughly, but the
> >> above statement is not true. Rather, it looks at the oldest WAL file and
> >> flushes those regions which would free that WAL file.
> >>
> >> But I agree that in general, with this kind of workload, we should
> >> handle WAL files more intelligently and free up those WAL files which
> >> don't have any dependency (that is, all their entries are already
> >> flushed) when archiving. We do that in trunk but not in any released
> >> version, though.
> >>
> >>
> >> On Sat, Nov 16, 2013 at 11:16 AM, Asaf Mesika <[email protected]> wrote:
> >>
> >> > First, I forgot to mention that <customerId> in our case is
> >> > MD5(<customerId>).
> >> > In our case, we have so much data flowing in that we end up with a
> >> > region per <customerId><bucket> pretty quickly, and even that is split
> >> > into different regions by specific date ranges (timestamp).
> >> >
> >> > We're not witnessing a hotspot issue. I built some scripts in Java and
> >> > awk, and saw that 66% of our customers use more than 1 RS.
> >> >
> >> > We have two main serious issues, a primary and a secondary one.
> >> >
> >> > Our primary issue is slow regions vs. fast regions. First, recall
> >> > that, as I detailed before, a region represents a specific
> >> > <customerId><bucket>. Some customers get 50x more data than other
> >> > customers in a given time frame (2 hrs - 1 day). So in one RS, we have
> >> > regions getting 10 write requests per hour vs. 50k write requests per
> >> > hour. The region mapped to the slow-filling customerId doesn't reach
> >> > the 256 MB flush limit and hence isn't flushed, while the regions
> >> > mapped to the fast-filling customerId flush very quickly since they
> >> > fill very quickly.
> >> > Let's say the 1st WAL file contains a put of a slow-filling
> >> > customerId, and the fast-filling customerId fills up the rest of that
> >> > file. After 20-30 seconds the file gets rolled, and another file fills
> >> > up with the fast-filling customerId. After a while, we get to 32 WAL
> >> > files. The 1st file wasn't deleted since its region wasn't flushed.
> >> > The 32 limit makes HBase go into stress mode and dump all regions
> >> > contained in those 32 WAL files.
> >> > In our case, we saw that it flushes 111 regions. Lots of the store
> >> > files are 3 KB - 3 MB in size, so our compaction queue starts filling
> >> > up with store files that need to be compacted.
> >> > At the end of the road, the RS dies.
> >> >
> >> > Our secondary issue is empty regions - we get to a situation where a
> >> > region is mapped to a specific <customerId>, <bucket>, and date range
> >> > (1/7 - 3/7). Then, when we are in August (we have TTL set to 30 days),
> >> > those regions become empty and will never get filled again.
> >> > We assume this somehow wreaks havoc on the load balancer, and MSLAB
> >> > also probably steals 1-2 GB of memory for those empty regions.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > On Sat, Nov 16, 2013 at 7:25 PM, Mike Axiak <[email protected]> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > One new key pattern that we're starting to use is a salt based on a
> >> > > shard. For example, let's take your key:
> >> >
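Mike's concrete key example is cut off in the archive above. As a rough
illustration only of what a shard-based salt generally looks like (the shard
count, class name, and method below are hypothetical and not taken from his
message), the idea is to prefix each row key with a small deterministic hash
so one customer's writes spread over a fixed number of regions while reads
can still enumerate the prefixes:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch of a shard-based salt; the actual key example in
    // the thread is truncated, so none of these names or numbers come from it.
    class ShardSaltedKey {

        private static final int NUM_SHARDS = 16; // assumed shard count

        // Row key = [1-byte shard salt][customerId bytes][8-byte timestamp]
        static byte[] buildKey(String customerId, long timestamp) {
            byte shard = (byte) Math.floorMod(customerId.hashCode(), NUM_SHARDS);
            byte[] cust = customerId.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + cust.length + Long.BYTES)
                    .put(shard)          // salt spreads rows over NUM_SHARDS regions
                    .put(cust)           // original customer id
                    .putLong(timestamp)  // keeps per-customer time ordering
                    .array();
        }
    }

A read for one customer then issues one scan per salt prefix and merges the
results, which is the kind of bookkeeping the HBaseWD library Otis mentions is
meant to automate.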

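For reference, here is a minimal conceptual sketch of the WAL-pressure
behavior Himanshu describes upthread. It is not actual HBase source; the
class names are invented and the 32-file limit (cf. hbase.regionserver.maxlogs)
is used illustratively. The point is that when the WAL count exceeds the
limit, only the regions still holding unflushed edits in the oldest WAL are
flushed, so that one file can be archived:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    // Conceptual sketch only, not HBase source: on WAL-count pressure, flush
    // just the regions that pin the oldest WAL so it can be archived.
    class WalPressureSketch {

        static class Region {
            final String name;
            Region(String name) { this.name = name; }
            void flush() { System.out.println("flushing " + name); }
        }

        static class WalFile {
            // regions with edits in this file that are not yet flushed
            final Set<Region> pinningRegions = new HashSet<>();
        }

        private final Deque<WalFile> wals = new ArrayDeque<>(); // oldest first
        private final int maxWals = 32; // cf. hbase.regionserver.maxlogs

        void onWalRoll(WalFile newFile) {
            wals.addLast(newFile);
            while (wals.size() > maxWals) {
                WalFile oldest = wals.peekFirst();
                // Flush only the regions pinning the oldest file, e.g. a
                // slow-filling customer region whose memstore never reached
                // the 256 MB flush threshold on its own.
                for (Region r : oldest.pinningRegions) {
                    r.flush();
                }
                oldest.pinningRegions.clear();
                wals.removeFirst(); // no longer needed; can be archived
            }
        }
    }

The failure mode Asaf reports shows up when many regions pin old WALs at
once: the forced flushes produce lots of tiny store files (3 KB - 3 MB) that
then pile up in the compaction queue.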