Blog Post: Using HBase Quotas to Share Resources at Scale

2024-10-02 Thread Bryan Beaudreault
Hey all, over the past year we at HubSpot have invested heavily in the Quotas feature of HBase. We've built it into our internal tooling, and also pushed many contributions to improve it for all HBase users. These features are all available in 2.6.0+. Ray Mattingly drove most of these improvements

Re: Implementing continuous backup and Point in time recovery in HBase?

2024-09-30 Thread Bryan Beaudreault
Ankit, thanks for starting this discussion. It'd be great to integrate streaming of WAL edits to a backup destination. We've done this for years internally at my company. It's critical to achieving only a few minutes of RPO, but also complicated for us to maintain. Having it in hbase would benefit

Re: [ANNOUNCE] New HBase committer Ray Mattingly

2024-09-10 Thread Bryan Beaudreault
Congrats Ray! Keep up the good work On Tue, Sep 10, 2024 at 9:12 AM Nick Dimiduk wrote: > Congratulations Ray and thanks for all the contributions! > > On Tue, Sep 10, 2024 at 3:57 AM Viraj Jasani wrote: > > > On behalf of the Apache HBase PMC, I am pleased to announce that > > Ray Mattingly ha

Re: How to handle Remote Exceptions when shading HBase client

2024-09-09 Thread Bryan Beaudreault
We used to do this years ago at my company. You have to find the various ProtobufUtil and RemoteException related classes, wherever the remote classname string is parsed from a protobuf into a Class. You need to push some custom code to rewrite the strings there. I think there were two or three spo

Re: [DISCUSS] Move our official slack channel to the one in the-asf.slack.com

2024-07-07 Thread Bryan Beaudreault
+1 sounds good to me On Sun, Jul 7, 2024 at 11:07 AM Duo Zhang wrote: > As I mentioned in another thread, now slack will hide the comments > before 90 days in the current apache-hbase.slack.com, which is really > not good for finding useful discussions. > > According to the documentation here >

Re: [ANNOUNCE] New HBase committer Andor Molnár

2024-05-29 Thread Bryan Beaudreault
Congrats and welcome! On Wed, May 29, 2024 at 5:06 PM Viraj Jasani wrote: > Congratulations and Welcome, Andor! Well deserved!! > > > On Wed, May 29, 2024 at 7:36 AM Duo Zhang wrote: > > > On behalf of the Apache HBase PMC, I am pleased to announce that > > Andor Molnár(andor) has accepted the

Re: [ANNOUNCE] Apache HBase 2.6.0 is now available for download

2024-05-20 Thread Bryan Beaudreault
Haha yes, I noticed that…after sending. Hopefully the title and various other mentions of 2.6.0 can suffice this time :) On Mon, May 20, 2024 at 10:08 PM 张铎(Duo Zhang) wrote: > Congratulations! > > But it seems you missed the replacement for the first '_version_' > pl

[ANNOUNCE] Apache HBase 2.6.0 is now available for download

2024-05-20 Thread Bryan Beaudreault
The HBase team is happy to announce the immediate availability of HBase _version_. Apache HBase™ is an open-source, distributed, versioned, non-relational database. Apache HBase gives you low latency random access to billions of rows with millions of columns atop non-specialized hardware. To learn

Re: Build failure on Apple (M1/M3) for hbase-common package for io.opentelemetry:opentelemetry-context, while it works on Intel Mac

2024-04-30 Thread Bryan Beaudreault
I also routinely build all branches of hbase on an apple M3, using hadoop3, and Java 11+. I believe I’ve also built with java8 at some point, but don’t quote me on that because we largely don’t use java8 at my company. On Tue, Apr 30, 2024 at 4:48 PM Wei-Chiu Chuang wrote: > I am on Apple M3, c

Re: Re: Can HBase 2.5.8 working with jdk17?

2024-04-10 Thread Bryan Beaudreault
wrote: > Hi. > I am currently unable to start REGION using JDK17 directly. What > adjustments do I need to make to use JDK17/21? > Tks. > > On 2024-04-09 18:52:34, "Bryan Beaudreault" wrote: > >We ran

Re: Can HBase 2.5.8 working with jdk17?

2024-04-09 Thread Bryan Beaudreault
We ran hbase under jdk17 for a few months. The only issue we saw was https://issues.apache.org/jira/browse/HBASE-28206 which was fixed in 2.5.7. More recently we’ve upgraded again to jdk21 to gain access to generational zgc. That also has been working fine without any additional patches. We’re ru

Re: Re: INDEX_BLOCK_ENCODING=> PREFIX_TREE cannot be used properly

2024-03-26 Thread Bryan Beaudreault
INDEX_BLOCK_ENCODING is a new feature, but just the configuration exists. No actual encodings have been committed. Development on the PR stalled. See https://github.com/apache/hbase/pull/4782 It would be great if someone picked up this work again. On Tue, Mar 26, 2024 at 6:05 AM lisoda wrote: >

Re: Tentative Release Date | Hbase 2.6.0

2024-01-13 Thread Bryan Beaudreault
Hello, I'm unsure where the original email was sent, but I did not see it on our user or dev lists. Sorry that it was missed. The release was delayed due to the holidays, but I hope to have the first release candidate for 2.6.0 generated this coming week. I'm now finishing porting some final patc

Re: [DISCUSS] End support for hadoop 2.10?

2023-12-06 Thread Bryan Beaudreault
ase 2.6 should > be > > able to support 3.2.x and 3.3.x. At the same time, IMO 3.2.x is also an > > inactive release version, we can discuss if we should just change our > base > > of hadoop to 3.3.6 maybe starting from HBase 3.0+ > > > > -Stephen > > >

Re: [ANNOUNCE] Please welcome Bryan Beaudreault to the HBase PMC

2023-10-17 Thread Bryan Beaudreault
Thanks all! On Tue, Oct 17, 2023 at 2:44 PM Jan Hentschel wrote: > Welcome Bryan! Well deserved. > > From: Duo Zhang > Date: Tuesday, October 17, 2023 at 3:32 AM > To: HBase Dev List , hbase-user < > user@hbase.apache.org>, user-zh > Subject: [ANNOUNCE] Please welco

Re: [ANNOUNCE] New HBase committer Tianhang Tang(唐天航)

2023-03-16 Thread Bryan Beaudreault
Congratulations and welcome! On Thu, Mar 16, 2023 at 12:02 AM 张铎(Duo Zhang) wrote: > On behalf of the Apache HBase PMC, I am pleased to announce that Tianhang > Tang(thangTang) > has accepted the PMC's invitation to become a committer on the project. We > appreciate all > of Tianhang's generous

Re: Why hbase stop all HMaster service first when rolling restart in rolling-restart.sh

2023-03-09 Thread Bryan Beaudreault
Taking a look at the git blame for the script, some of the parts you reference are over 13 years old. So it may just be that they deserve some updating. Anyway, you are not missing anything and your approach is both safe and more graceful. On Thu, Mar 9, 2023 at 8:47 PM Bryan Beaudreault wrote

Re: Why hbase stop all HMaster service first when rolling restart in rolling-restart.sh

2023-03-09 Thread Bryan Beaudreault
I can’t speak to why the script is the way it is. But I will say that my company has been running hbase at massive scale with high reliability standards for years. We’ve never used any of the built in shell scripts. We have our own automation, and our HMaster rolling restart is more like what you d

Re: ReadOnlyZKClient question

2023-02-08 Thread Bryan Beaudreault
I sort of wonder if you have a log4j.properties and log4j2.properties on the classpath. I think I remember having an issue when both existed, where I'd see a lot more logs than I expect. Check for a log4j.properties and remove it. On Wed, Feb 8, 2023 at 4:51 AM 张铎(Duo Zhang) wrote: > These are a
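
The check suggested above can be sketched as a small script. This is only an illustration, not part of HBase: the function names are hypothetical, and it assumes the config files sit at the top level of a classpath directory or jar.

```python
import os
import zipfile

# The two config file names mentioned in the message above.
LOG4J_CONFIGS = ("log4j.properties", "log4j2.properties")


def find_log4j_configs(classpath_entries):
    """Map each log4j config file name to the classpath entries that contain it.

    Only the top level of each directory or jar is inspected (an assumption
    made to keep the sketch short).
    """
    found = {name: [] for name in LOG4J_CONFIGS}
    for entry in classpath_entries:
        if os.path.isdir(entry):
            names = set(os.listdir(entry))
        elif zipfile.is_zipfile(entry):
            with zipfile.ZipFile(entry) as jar:
                names = set(jar.namelist())
        else:
            continue  # missing path or not a directory/jar
        for name in LOG4J_CONFIGS:
            if name in names:
                found[name].append(entry)
    return found


def has_conflict(found):
    """True when both a log4j 1.x and a log4j2 config were found."""
    return all(found[name] for name in LOG4J_CONFIGS)
```

One way to use it is to split the classpath string on `os.pathsep`, pass the pieces to `find_log4j_configs`, and remove the stray `log4j.properties` if `has_conflict` returns true.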

Re: [ANNOUNCE] Please welcome Tak Lon (Stephen) Wu to the HBase PMC

2023-01-30 Thread Bryan Beaudreault
Congrats! On Mon, Jan 30, 2023 at 4:10 AM Balazs Meszaros wrote: > Congratulations! > > On Mon, Jan 30, 2023 at 9:19 AM Jan Hentschel > wrote: > > > Congratulations and welcome! > > > > From: Duo Zhang > > Date: Monday, January 30, 2023 at 3:50 AM > > To: HBase Dev List , hbase-user < > > user

Re: [DISCUSS] Allow namespace admins to clone snapshots created by them

2023-01-03 Thread Bryan Beaudreault
I think development is done on TLS. We are just waiting on requested testing. Andor was working on that, but I believe he had some stuff come up at his work. I also want to get backups in place, but there is 1 backwards compatibility issue to work through. Hoping to have that squared away soon. O

Re: [ANNOUNCE] New HBase Committer Liangjun He

2022-12-05 Thread Bryan Beaudreault
Congrats! On Sun, Dec 4, 2022 at 2:15 PM Wei-Chiu Chuang wrote: > Congrats! > > On Sun, Dec 4, 2022 at 5:33 AM 宾莉金(binlijin) wrote: > > > Congratulations! > > > > On Sat, Dec 3, 2022 at 22:28 张铎(Duo Zhang) wrote: > > > > > Congratulations! > > > > > > On Sat, Dec 3, 2022 at 21:51 Yu Li wrote: > > > > > > > > Hi All, >

Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)

2022-12-01 Thread Bryan Beaudreault
Should we have them sent to private@? Just thinking in terms of reducing spam to users who put their email and full name on a public list. One thought I had about bug tracking is whether we could use some sort of github -> jira sync. I've seen them used before, where it automatically syncs issues

Re: HBase Metrics reset

2022-11-28 Thread Bryan Beaudreault
Not currently. What's the use-case? On Mon, Nov 28, 2022 at 2:47 AM Andrey Khozov wrote: > Hello! > > Is there any way (API or shell command) to reset read/write statistics > for regions in Region Servers? > > Thanks, > Andrey > > >

Re: [ANNOUNCE] Please welcome Xiaolin Ha(哈晓琳) to the HBase PMC

2022-04-11 Thread Bryan Beaudreault
Congratulations! On Mon, Apr 11, 2022 at 10:06 AM Huaxiang Sun wrote: > Congratulations, Xiaolin! > > Sent from my iPhone > > > On Apr 11, 2022, at 2:30 AM, Jan Hentschel < > jan.hentsc...@ultratendency.com> wrote: > > > > Congratulations and welcome! > > > > From: 张铎(Duo Zhang) > > Date: Satu

Re: [ANNOUNCE] New HBase committer Bryan Beaudreault

2022-04-11 Thread Bryan Beaudreault
Nick > > On Sat, Apr 9, 2022 at 04:45 张铎(Duo Zhang) wrote: > > > On behalf of the Apache HBase PMC, I am pleased to announce that Bryan > > Beaudreault(bbeaudreault) has accepted the PMC's invitation to become a > > committer on the project. We appreciate all of Bryan

Re: RegionTooBusyException: StoreTooBusy

2022-03-23 Thread Bryan Beaudreault
Hello, Unfortunately I don’t have good guidance on what to tune this to. What I can say though is that this feature will be disabled by default starting with version 2.5.0. Part of the reason for that is we determined it is too aggressive but didn’t yet have good guidance on a better default. So

Re: upgrade hadoop , keep hbase

2021-12-20 Thread Bryan Beaudreault
I have to agree anecdotally that when I first upgraded to 3.3.1 I got a bunch of NoSuchMethodExceptions (or similar). This was a while ago so I don't have specifics. It was an easy fix -- recompile hbase with -Dhadoop.version=3.3.1. On Mon, Dec 20, 2021 at 9:46 PM Wei-Chiu Chuang wrote: > tl; dr

Re: [DISCUSS] Removing problematic terms from our project

2021-09-29 Thread Bryan Beaudreault
and voting, for necessary > deprecation-release-removal-release cycles where terminology changes impact > one or more of our compatibility guidelines. > > What has been missing since this thread closed with this conclusion? > > Actual patches. > > It's quite easy to

Re: [DISCUSS] Removing problematic terms from our project

2021-09-29 Thread Bryan Beaudreault
Sorry to revive a very old thread, but I just stumbled across this and don't see a clear resolution. I wonder if we should create a JIRA from Andrew's summary and treat that as an umbrella encompassing the original 3 JIRAs? I'm also cognizant of the fact that there are rumblings of doing an initial

hbase.pb package breaks protobuf compatibility between pre-1.3 and post

2021-08-20 Thread Bryan Beaudreault
It's been decided in a thread on the dev list that protobuf should be considered InterfaceAudience.PRIVATE. See here: https://lists.apache.org/thread.html/r9e6eb11106727d245f6eb2a5023823901637971d6ed0f0aedaf8d149%40%3Cdev.hbase.apache.org%3E So with that being said, this is mostly just to put out

Re: Upgrading cdh5.16.2 to apache hbase 2.4 using replication

2021-05-19 Thread Bryan Beaudreault
the data > so cannot justify the exorbitant cost per node that cloudera are asking for > later versions. > > -Original Message- > From: Bryan Beaudreault > Sent: Wednesday, May 19, 2021 2:49 PM > To: user@hbase.apache.org > Subject: Upgrading cdh5.16.2 to apache hbase

Upgrading cdh5.16.2 to apache hbase 2.4 using replication

2021-05-19 Thread Bryan Beaudreault
We are running about 40 HBase clusters, with over 5000 regionservers total. These are all running cdh5.16.2. We also have thousands of clients (from APIs to kafka workers to hadoop jobs, etc) hitting these various clusters, also running cdh5.16.2. We are starting to plan an upgrade to hbase 2.x an

Re: HotSpot detection/mitigation worker?

2021-05-18 Thread Bryan Beaudreault
re of this in a > seamless manner for our clients. > > Example: Actual row key --> *0QUPHSBTLGM*, and client requested a 3 digit > prefix based on table region range (000 - 999), would translate to > *115-0QUPHSBTLGM* with murmurhash > > --- > Mallikarjun > > > On Tue, May 1
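
The prefixing scheme in the quoted example can be sketched roughly as follows. This is an assumption-laden illustration: `salted_row_key` is a hypothetical name, and md5 from the standard library stands in for murmurhash, so the prefix it produces will not match the `115` shown in the thread.

```python
import hashlib


def salted_row_key(row_key: str, buckets: int = 1000) -> str:
    """Prefix row_key with a zero-padded bucket number derived from its hash."""
    # Stand-in hash: the thread mentions murmurhash, which is not in the
    # Python standard library, so md5 is used here purely for illustration.
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    # Pad to the width of the largest bucket, e.g. 3 digits for 000 - 999.
    width = len(str(buckets - 1))
    return f"{bucket:0{width}d}-{row_key}"
```

Because the prefix is a pure function of the original key, a reader can recompute it for point lookups; range scans over the original key order would have to fan out across all prefixes.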

Re: hbase slack

2021-05-17 Thread Bryan Beaudreault
That's weird! Can you invite bbeaudrea...@gmail.com? On Mon, May 17, 2021 at 9:39 PM 张铎(Duo Zhang) wrote: > Could you please give us an email address? > > bbeaudrea...@hubspot.com.invalid > > Is this the expected one? There is an 'invalid' in it... > > Bryan

hbase slack

2021-05-17 Thread Bryan Beaudreault
Is there an existing user group slack of hbase users? If so can I have an invite? Thanks!

HotSpot detection/mitigation worker?

2021-05-17 Thread Bryan Beaudreault
Hey all, We run a bunch of big hbase clusters that get used by hundreds of product teams for a variety of real-time workloads. We are a B2B company, so most data has a customerId somewhere in the rowkey. As the team that owns the hbase infrastructure, we try to help product teams properly design s

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-18 Thread Bryan Beaudreault
Is the backported patch available anywhere? Not seeing it on the referenced JIRA. If it ends up not getting officially backported to branch-1 due to 2.0 around the corner, some of us who build our own deploy may want to integrate into our builds. Thanks! These numbers look great On Fri, Nov 18, 20

Favored nodes

2016-11-15 Thread Bryan Beaudreault
Hello, I'm wondering if the favored nodes implementation in HBase 1.2-cdh5.9 is ready or documented anywhere. I can't find anything official, but there are plenty of class files that look to work with it. I know Yahoo has recently renewed development on a V2 version, but does the V1 work? Thanks

Re: setMaxResultSize on Gets

2016-08-23 Thread Bryan Beaudreault
Great, thanks stack! On Tue, Aug 23, 2016 at 12:54 AM Stack wrote: > On Mon, Aug 22, 2016 at 3:02 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > In HBase 1.2.x and higher you can call setMaxResultSize on a Scan to > limit > > the impact of

setMaxResultSize on Gets

2016-08-22 Thread Bryan Beaudreault
In HBase 1.2.x and higher you can call setMaxResultSize on a Scan to limit the impact of scans that are too aggressive, by bailing out at a certain size response. The client side will nicely splice together all of the isPartial responses to create a full one as well, pushing the danger to the clien

Re: Hbase regionserver.MultiVersionConcurrencyControl Warning

2016-08-05 Thread Bryan Beaudreault
I'm also interested in an answer here. We see this from time to time in our production HBase clusters (non-opentsdb). It seems to be related to contention under heavy reads or heavy writes. But it's not clear what the impact is here. On Fri, Aug 5, 2016 at 5:14 AM Sterfield wrote: > Hi, > > I'm

Re: Scan.setMaxResultSize and Result.isPartial

2016-06-18 Thread Bryan Beaudreault
and give a single Result object, because it > will cause OOM. > > Hope this helps. > Enis > > On Fri, Jun 17, 2016 at 4:15 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > Hello, > > > > We are running 1.2.0-cdh5.7.0 on our ser

Scan.setMaxResultSize and Result.isPartial

2016-06-17 Thread Bryan Beaudreault
Hello, We are running 1.2.0-cdh5.7.0 on our server side, and 1.0.0-cdh5.4.5 on the client side. We're in the process of upgrading the client, but aren't there yet. I'm trying to figure out the relationship of Result.isPartial and the user, when setMaxResultSize is used. I've done a little reading

Re: Enabling stripe compaction without disabling table

2016-06-07 Thread Bryan Beaudreault
online alter > table is not different at all. The "hbase.online.schema.update.enable" > property was fixing some possible race conditions that were fixed long time > ago. > > We should update the documentation. Mind creating a small patch? > Enis > > On Mon, Ju

Re: Enabling stripe compaction without disabling table

2016-06-06 Thread Bryan Beaudreault
n and test it before using in production. > > FYI > > On Mon, Jun 6, 2016 at 12:19 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > Hello, > > > > We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order > > t

Enabling stripe compaction without disabling table

2016-06-06 Thread Bryan Beaudreault
Hello, We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order to enable stripe compactions on a table we need to first disable the table. We basically can't disable tables in production. Is it possible to do this without disabling the table? If not, are there any plans to make

dfs.block.size recommendations for HBase

2016-06-01 Thread Bryan Beaudreault
Hello, There is very little information that I can find online with regards to recommended dfs.block.size setting for HBase. Often it conflates with the HBase blocksize, which we know should be smaller. Any chance we can get some recommendations for dfs.block.size? The default shipped with HDFS

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
ake a difference. > > Hang on... will be back in a sec... just sending this in meantime... > > St.Ack > > On Mon, May 23, 2016 at 12:20 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com> wrote: > > > For reference, the Scan backing the job is pretty basic: >

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
rwise it is using the out-of-the-box TableInputFormat. On Mon, May 23, 2016 at 3:13 PM Bryan Beaudreault wrote: > I've forced the issue to happen again. netstat takes a while to run on > this host while it's happening, but I do not see an abnormal amount of > CLOSE_WAIT

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
s I know uses the HBase RPC and does not hit HDFS directly at all. Is it possible that a long running scan (one with many, many next() calls) could keep some references to HDFS open for the duration of the overall scan? On Mon, May 23, 2016 at 2:19 PM Bryan Beaudreault wrote: > We run MR agains

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
igs between CDH5.3.8 and 5.7.0 are identical for us. On Mon, May 23, 2016 at 2:03 PM Stack wrote: > On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > Hey everyone, > > > > We are noticing a file descriptor leak th

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
are hitting does not appear to affect 0.98/CDH5.3.8. We also never saw it when we were on 0.94. This seems new in either 1.0+ or 1.2+. On Mon, May 23, 2016 at 12:59 PM Ted Yu wrote: > Have you taken a look at HBASE-9393 ? > > On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <

File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Bryan Beaudreault
Hey everyone, We are noticing a file descriptor leak that is only affecting nodes in our cluster running 5.7.0, not those still running 5.3.8. I ran an lsof against an affected regionserver, and noticed that there were 10k+ unix sockets that are just called "socket", as well as another 10k+ of the

Re: Slow sync cost

2016-04-27 Thread Bryan Beaudreault
er with BufferedMutator or > separately with just direct Put's? > > Thanks. > > ---- > Saad > > > > > On Wed, Apr 27, 2016 at 2:22 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > Hey Ted, > > > > Actually

Re: Slow sync cost

2016-04-27 Thread Bryan Beaudreault
> Does the above mean that hbase servers are still using ParallelGC ? > > Thanks > > On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > We have 6 production clusters and all of them are tuned differently, so > I'

Re: Slow sync cost

2016-04-27 Thread Bryan Beaudreault
hat is interesting. Would it be possible for you to share what GC settings > you ended up on that gave you the most predictable performance? > > Thanks. > > > Saad > > > On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault < > bbeaudrea...@hubspot.com> wrot

Re: Slow sync cost

2016-04-26 Thread Bryan Beaudreault
We were seeing this for a while with our CDH5 HBase clusters too. We eventually correlated it very closely to GC pauses. Through heavily tuning our GC we were able to drastically reduce the logs, by keeping most GC's under 100ms. On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti wrote: > From what I ca

Testing HBase packaging changes

2016-03-14 Thread Bryan Beaudreault
Hello, We've been using CDH community for a few years now, but have reached the point where we want to be able to do some development and backporting upstream. We'd like to keep the base CDH packaging for consistency, but have come up with some automation around building CDH from source RPM, with

(CDH5.5) AssertionError: Key followed by smaller key

2016-03-02 Thread Bryan Beaudreault
Recently we upgraded to CDH5.3.8, and needed to use CDAP Readless Increments (https://github.com/caskdata/cdap) to overcome the recently-fixed performance regression around Increment. We are now looking to upgrade to CDH5.5.x, and have attempted to upgrade 1 slave in our 5.3.8 cluster to 5.5.0. U

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
t; > ... and other is doing: > > at > > org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3593) > > Not many increments going on. > > > https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286 > is two increments too in same places. Is it stuck? >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
ey); } On Tue, Dec 1, 2015 at 12:08 AM Stack wrote: > Looking at that stack trace, nothing showing as blocked or slowed by > another operation. You have others I could look at Bryan? > St.Ack > > On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
000ms range. It seems we may be blocking on FSHLog#syncer. https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359 On Mon, Nov 30, 2015 at 11:26 PM Stack wrote: > Still slow increments though? > > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Those log lines have settled down, they may have been related to a cluster-wide forced restart at the time. On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault wrote: > We've been doing more debugging of this and have set up the read vs write > handlers to try to at least segment t

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
at 6:11 PM Bryan Beaudreault wrote: > Sorry the second link should be > https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579 > > On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault < > bbeaudrea...@hubspot.com> wrote: > >> https://gist.gith

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Sorry the second link should be https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579 On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault wrote: > https://gist.github.com/bbeaudreault/2994a748da83d9f75085 > > An active handler: > https://gist.github.com/

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
t 2:31 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > The rollback seems to have mostly solved the issue for one of our > clusters, > > but another one is still seeing long increment times: > > > > "slowIncrementCount": 52

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
ion for increments between CDH4.7.1 and CDH5.3.8? On Mon, Nov 30, 2015 at 4:13 PM Stack wrote: > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com> wrote: > > > Should this be added as a known issue in the CDH or hbase documentation? > It >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Should this be added as a known issue in the CDH or hbase documentation? It was a severe performance hit for us, all of our regionservers were sitting at a few thousand queued requests. On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault wrote: > Yea, they are all over the place and called f

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Stack wrote: > Rollback is untested. No fix in 5.5. I was going to work on this now. Where > are your counters Bryan? In their own column family or scattered about in a > row with other Cell types? > St.Ack > > On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault < > bbeaudr

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Is there any update to this? We just upgraded all of our production clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the known issues, did not know about this. Now we are seeing performance issues across all clusters, as we make heavy use of increments. Can we roll forward to CDH5

Re: Automating major compactions

2015-07-08 Thread Bryan Beaudreault
Our automation uses a combination of the following to determine what to compact: - Which regions have bad locality (% of blocks are local vs remote, using HDFS getBlockLocations APIs) - Which regions have the most number of HFiles (most files per region/cf directory) - Which regions have gone the
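
A rough sketch of how the first two signals above might be combined into a candidate list. Everything here is hypothetical (the `RegionStats` type, the thresholds, the ordering) and is not HubSpot's actual automation; the third criterion is cut off in the excerpt, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    name: str
    locality: float   # fraction of blocks that are local, 0.0 to 1.0
    hfile_count: int  # HFiles across the region's column family directories


def pick_compaction_candidates(regions, min_locality=0.9, max_hfiles=10, limit=5):
    """Return up to `limit` region names, worst locality first.

    The thresholds are placeholder assumptions; tune them per cluster.
    """
    def needs_compaction(r):
        return r.locality < min_locality or r.hfile_count > max_hfiles

    candidates = [r for r in regions if needs_compaction(r)]
    # Lowest locality first; ties broken by the larger HFile count.
    candidates.sort(key=lambda r: (r.locality, -r.hfile_count))
    return [r.name for r in candidates[:limit]]
```

Capping the result with `limit` keeps each run bounded, which matters if the compactions themselves are triggered one batch at a time.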

Re: Stochastic Balancer by tables

2015-06-18 Thread Bryan Beaudreault
Just had to say, https://issues.apache.org/jira/browse/HBASE-13103 looks *AWESOME* On Thu, Jun 18, 2015 at 5:00 PM Mikhail Antonov wrote: > Yeah, I could see 2 reasons for remaining few regions to take > unproportionally long time - 1) those regions are unproportionally > large (you should be ab

Re: How to make the client fast fail

2015-06-16 Thread Bryan Beaudreault
I agree that more documentation would be better. However, > Yet, there are some applications which require a faster time out than > others. So, you tune some of the timers to have a fast fail, and you end up > causing unintended problems for others. > > The simplest solution is to use threads in

Re: How are version conflicts handled in HBase?

2015-06-05 Thread Bryan Beaudreault
I wouldn't say it is recommended, but it is certainly possible to override the version timestamp at write time. You might be able to use this to provide the uniqueness you need (i.e. instead of using epoch timestamp, use one based more recently and add digits for uniqueness at the end). We've do
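
The idea of a more recent base plus trailing uniqueness digits can be sketched as below. The custom epoch, the three-digit slot count, and the function name are all assumptions made for illustration; the result is meant to be passed as the explicit cell timestamp at write time.

```python
import itertools
import time

CUSTOM_EPOCH_MS = 1_420_070_400_000  # 2015-01-01T00:00:00Z, an arbitrary recent base
SLOTS_PER_MS = 1000                  # three trailing digits for uniqueness
MAX_LONG = 2**63 - 1                 # HBase timestamps are signed 64-bit longs

_sequence = itertools.count()


def unique_version(now_ms=None):
    """Return a version number: milliseconds since the custom epoch, then a 3-digit slot."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Slots repeat after SLOTS_PER_MS calls, so uniqueness only holds while a
    # single writer issues fewer than that many versions per millisecond.
    slot = next(_sequence) % SLOTS_PER_MS
    version = (now_ms - CUSTOM_EPOCH_MS) * SLOTS_PER_MS + slot
    if not 0 <= version <= MAX_LONG:
        raise ValueError("version does not fit in a non-negative signed long")
    return version
```

Basing the value on a recent epoch leaves room for the extra digits without overflowing a long. Values like this are no longer plain epoch milliseconds, so anything that interprets cell timestamps as wall-clock time (TTLs, time-range scans) would need to account for that.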

Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread Bryan Beaudreault
ion total 19420524K, used 7807614K > [0x0001f4f9, 0x0006964eb000, 0x0007fae0) > concurrent-mark-sweep perm gen total 51000K, used 30498K > [0x0007fae0, 0x0007fdfce000, 0x0008) > > Thanks, > Rahul > > > On Fri, May 29, 2015 at 8:

Re: Response Too Slow in RegionServer Logs

2015-05-29 Thread Bryan Beaudreault
> > 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We > slept > 14912ms instead of 3000ms, this is likely due to a long garbage collecting > pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired I would check your gc logs for long gc
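
Before digging through GC logs, it can help to pull every Sleeper warning of this shape out of the regionserver log. The sketch below is based only on the wording of the warning quoted above; the function name and the overshoot threshold are assumptions.

```python
import re

# Matches the wording of the Sleeper warning quoted above.
SLEEPER_RE = re.compile(r"We\s+slept\s+(\d+)ms\s+instead\s+of\s+(\d+)ms")


def long_pauses(lines, min_overshoot_ms=1000):
    """Yield (slept_ms, expected_ms) for warnings that overshot by at least min_overshoot_ms."""
    for line in lines:
        match = SLEEPER_RE.search(line)
        if match:
            slept, expected = int(match.group(1)), int(match.group(2))
            if slept - expected >= min_overshoot_ms:
                yield slept, expected
```

The timestamps of the matching lines can then be lined up against the GC log to see whether each overshoot coincides with a long collection.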

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

2015-05-22 Thread Bryan Beaudreault
ugh if I > remember correctly, also you need to define hbase.master.hostname if you > are using HBase 1.1 > > cheers, > esteban. > > -- > Cloudera, Inc. > > > On Fri, May 22, 2015 at 12:55 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com> wrote: > >

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

2015-05-22 Thread Bryan Beaudreault
d HBASE-12954 so it depends on which version of > > 1.x > > > you are using. > > > > > > Regarding your account issue, I have created an INFRA JIRA on your > behalf > > > to look into your account problem. > > > > > > thanks, >

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

2015-05-22 Thread Bryan Beaudreault
On Fri, May 22, 2015 at 3:34 PM, Ted Yu wrote: > bq. hbase-1.1.0.1 > > To my knowledge, latest release was 1.1.0. The release before that was > 1.0.1 > > Can you clarify ? > > Thanks > > On Fri, May 22, 2015 at 12:23 PM, Bryan Beaudreault < > bbeaudrea...@hub

Re: DNS mismatch between master and regionserver causes doubly registered regionservers

2015-05-22 Thread Bryan Beaudreault
> -- > Cloudera, Inc. > > > On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault < > bbeaudrea...@hubspot.com> wrote: > > > In our system each server has 2 dns associated with it, one always points > > to a private address and the other to public or private d

DNS mismatch between master and regionserver causes doubly registered regionservers

2015-05-22 Thread Bryan Beaudreault
In our system each server has 2 dns associated with it, one always points to a private address and the other to public or private depending on the context. This issue did not show up in 0.94.x, but is showing up on my new 1.x cluster. Basically it goes like this: 1. Regionserver starts up, gets

Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)

2015-05-14 Thread Bryan Beaudreault
ry stuff. > > Note: my project is not related to migration from 0.94 to 1.0. But, i am > > supporting the argument for moving MR code in client or a separate > > artifact. > > > > On Thu, May 14, 2015 at 9:43 AM, Bryan Beaudreault < > > bbeaudrea...@hubspot.

Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)

2015-05-14 Thread Bryan Beaudreault
is an ok compromise where necessary. I can put JIRAs in for these if it makes sense On Tue, May 5, 2015 at 10:48 PM, Bryan Beaudreault wrote: > Thanks for the response guys! > > You've done a review of HTI in 1.0 vs 0.94 to make sure we've not >> mistakenly dropped

Re: How to know the root reason to cause RegionServer OOM?

2015-05-13 Thread Bryan Beaudreault
After moving to the G1GC we were plagued with random OOMs from time to time. We always thought it was due to people requesting a big row or group of rows, but upon investigation noticed that the heap dumps were many GBs less than the max heap at time of OOM. If you have this symptom, you may be r

Re: How to Restore the block locality of a RegionServer ?

2015-05-09 Thread Bryan Beaudreault
Major compactions will restore locality to the cluster. On Sat, May 9, 2015 at 3:36 PM, Michael Segel wrote: > First, understand why you had to create an ‘auto restart’ script. > > Taking down HBase completely (probably including zookeeper) and do a full > restart would probably fix the issue of

Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)

2015-05-05 Thread Bryan Beaudreault
n guys. I'm expecting this will be a drawn out process considering our scope, but will be happy to keep updates here as I proceed. On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez wrote: > Just to a little bit to what StAck said: > > -- > Cloudera, Inc. > > >

Upgrading from 0.94 (CDH4) to 1.0 (CDH5)

2015-05-05 Thread Bryan Beaudreault
Hello, I'm about to start tackling our upgrade path for 0.94 to 1.0+. We have 6 production hbase clusters, 2 hadoop clusters, and hundreds of APIs/daemons/crons/etc hitting all of these things. Many of these clients hit multiple clusters in the same process. Daunting to say the least. We can't

Re: Strange issue when DataNode goes down

2015-03-23 Thread Bryan Beaudreault
ading this HDFS-3703, trying to get more context, but looks to me > so far that this is one of those things where you should decide what are > your own acceptable tradeoffs. > > On Mon, Mar 23, 2015 at 4:40 PM Bryan Beaudreault < > bbeaudrea...@hubspot.com> > wrote: >

Re: Strange issue when DataNode goes down

2015-03-23 Thread Bryan Beaudreault
light on what > > Nicolas is talking about - > https://issues.apache.org/jira/browse/HDFS-3703 > > > > On Mon, Mar 23, 2015 at 3:53 PM Bryan Beaudreault < > > bbeaudrea...@hubspot.com> > > wrote: > > > > > So it is safe to set hbase.lease.reco

Re: Strange issue when DataNode goes down

2015-03-23 Thread Bryan Beaudreault
So it is safe to set hbase.lease.recovery.timeout lower if you also set heartbeat.recheck.interval lower (lowering that 10.5 min dead node timer)? Or is it recommended to not touch either of those? Reading the above with interest, thanks for digging in here guys. On Mon, Mar 23, 2015 at 10:13 AM
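For context on the "10.5 min dead node timer" mentioned above: as far as I know, the NameNode marks a DataNode dead after 2 x the recheck interval plus 10 x the heartbeat interval. Below is a minimal sketch of that arithmetic, assuming the stock defaults of 5 minutes for the recheck interval and 3 seconds for the heartbeat interval. The function name is made up for illustration; check the defaults against your own hdfs-site.xml.

```python
def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Time after which the NameNode treats a silent DataNode as dead.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    Defaults are the assumed stock values (5 minutes and 3 seconds).
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


# With the assumed defaults this is 630000 ms, i.e. 10.5 minutes.
print(dead_node_timeout_ms() / 60_000)

# Lowering the recheck interval to 45 seconds shrinks the timer to 2 minutes.
print(dead_node_timeout_ms(recheck_interval_ms=45_000) / 60_000)
```

The point of the sketch is that the recheck interval dominates the timer, so any lease recovery timeout should be weighed against the value this yields.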

Re: Poll: HBase usage by HBase version

2015-03-19 Thread Bryan Beaudreault
Super useful, thanks Dave! On Wed, Mar 18, 2015 at 11:25 PM, Dave Latham wrote: > If you haven't already seen it - take a look at the bridge at > https://issues.apache.org/jira/browse/HBASE-12814 > We're using it to go through the process now. > > Dave > > On Wed,

Re: Poll: HBase usage by HBase version

2015-03-18 Thread Bryan Beaudreault
My only complaint about this poll is the labels: "0.94.x - I like stable releases". It's not really about the stable releases for me, it's more about the extreme difficulty of overcoming "the singularity" from 0.94 -> 0.96+ with no downtime in a reasonably complex production system. Hortonworks'

Re: Where is HBase failed servers list stored

2015-03-05 Thread Bryan Beaudreault
You should run with a backup master in a production cluster. The failover process works very well and will cause no downtime. I've done it literally hundreds of times across our multiple production hbase clusters. Even if you don't have a backup master, you should still be fine with restarting t

Re: What companies are using HBase to serve a customer-facing product?

2014-12-06 Thread Bryan Beaudreault
At HubSpot we have 5 customer-facing production clusters, 30-60TB+ each. Our Data Ops team has ranged from 2-3 (including me), but we support much more than just HBase. We have an in-house built nightly backup system and persist all HLogs on an ongoing basis, so in 2-3 hours we can recover to within

Re: Hbase Unusable after auto split to 1024 regions

2014-11-06 Thread Bryan Beaudreault
us-west-2.compute.internal.log:2014-11-06 > >> 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler > 46 > >> on 60020): (responseTooSlow): > >> > {"processingtimems":14620,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4fc3bb1f > ), > >> rpc version=1, client version=29, > methodsFingerPrint=-540141542","client":" > >> 10.230.130.102:54068 > >> > ","starttimems":1415246057021,"queuetimems":27565,"class":"HRegionServer","responsesize":0,"method":"multi"} > >> > hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06 > >> 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler > 35 > >> on 60020): (responseTooSlow): > >> > {"processingtimems":13431,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b321922 > ), > >> rpc version=1, client version=29, > methodsFingerPrint=-540141542","client":" > >> 10.227.42.252:60493 > >> > ","starttimems":1415246058210,"queuetimems":1134,"class":"HRegionServer","responsesize":0,"method":"multi"} > >> On Nov 6, 2014, at 12:38 PM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > >> > wrote: > >> > >>> blockingStoreFiles > >> > >> > >
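The responseTooSlow warnings quoted above carry a JSON payload, which makes it easy to compare time spent queued with time spent processing. Here is a rough sketch of pulling those fields out of a single log line. The helper name is hypothetical, and the sample line is the first warning from the quote with the mail quoting markers removed.

```python
import json

MARKER = "(responseTooSlow):"


def parse_response_too_slow(line):
    """Return the JSON payload of a responseTooSlow line as a dict, or None."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return None
    start, end = payload.find("{"), payload.rfind("}")
    if start == -1 or end == -1:
        return None
    return json.loads(payload[start:end + 1])


sample = (
    "2014-11-06 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer "
    "(IPC Server handler 46 on 60020): (responseTooSlow): "
    '{"processingtimems":14620,'
    '"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4fc3bb1f), '
    'rpc version=1, client version=29, methodsFingerPrint=-540141542",'
    '"client":"10.230.130.102:54068","starttimems":1415246057021,'
    '"queuetimems":27565,"class":"HRegionServer","responsesize":0,'
    '"method":"multi"}'
)

record = parse_response_too_slow(sample)
# Queue time well above processing time suggests handlers are saturated.
print(record["queuetimems"], record["processingtimems"])
```

In this line the call waited 27565 ms in the queue and then took 14620 ms to process, so the wait before a handler picked it up was the larger share.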

Re: Hbase Unusable after auto split to 1024 regions

2014-11-06 Thread Bryan Beaudreault
f retry > > Also the weirdest behavior I have noticed about this lag/outage is that > the master HBase daemon is eating all the CPU whereas before it barely had > more than a 1.0 load. Is it possible the master is in some way broken and > slowing everything down? > > -Pere >

Re: Hbase Unusable after auto split to 1024 regions

2014-11-06 Thread Bryan Beaudreault
ml > > > > > Do I need to set this as well? > > > hbase.regionserver.logroll.period > 360 > hbase-default.xml > > > Thanks, > Pere > > > On Nov 6, 2014, at 11:23 AM, Bryan Beaudreault > wrote: > > > The default periodic flush is 1 hour. If y

Re: Hbase Unusable after auto split to 1024 regions

2014-11-06 Thread Bryan Beaudreault
The default periodic flush is 1 hour. If you have a lot of regions and your write distribution is not strictly uniform this can cause a lot of small flushes, as you are seeing. I tuned this up to 12 hours in my cluster, and may tune it up further. It made a big impact on the number of minor compa
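To make the tuning above concrete: I'm assuming the periodic flush is the one controlled by `hbase.regionserver.optionalcacheflushinterval`, which takes milliseconds and defaults to 3600000 (1 hour). The message itself does not name the property, so treat that as my guess. A small sketch that renders the hbase-site.xml entry for a given number of hours (the helper name is illustrative only):

```python
import xml.etree.ElementTree as ET

# Assumed property name; the original message does not spell it out.
FLUSH_PROPERTY = "hbase.regionserver.optionalcacheflushinterval"


def flush_interval_property(hours):
    """Build an hbase-site.xml <property> element for the periodic flush."""
    interval_ms = int(hours * 60 * 60 * 1000)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = FLUSH_PROPERTY
    ET.SubElement(prop, "value").text = str(interval_ms)
    return ET.tostring(prop, encoding="unicode")


# 12 hours, as described above, comes out to 43200000 ms.
print(flush_interval_property(12))
```

The trade-off is that memstores for rarely written regions sit unflushed for longer, which also keeps their WAL entries around for longer.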

Re: OOM when fetching all versions of single row

2014-11-03 Thread Bryan Beaudreault
There are many blog posts and articles about people turning for > 16GB heaps since java7 and the G1 collector became mainstream. We run with 25GB heap ourselves with very short GC pauses using a mostly untuned G1 collector. Just one example is the excellent blog post by Intel, https://software.in
