Hey all, over the past year we at HubSpot have invested heavily in the
Quotas feature of HBase. We've built it into our internal tooling, and also
pushed many contributions to improve it for all HBase users. These features
are all available in 2.6.0+.
Ray Mattingly drove most of these improvements
Ankit, thanks for starting this discussion. It'd be great to integrate
streaming of WAL edits to a backup destination. We've done this for years
internally at my company. It's critical to achieving an RPO of only a few
minutes, but it's also complicated for us to maintain. Having it in hbase would
benefit
Congrats Ray! Keep up the good work
On Tue, Sep 10, 2024 at 9:12 AM Nick Dimiduk wrote:
> Congratulations Ray and thanks for all the contributions!
>
> On Tue, Sep 10, 2024 at 3:57 AM Viraj Jasani wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that
> > Ray Mattingly ha
We used to do this years ago at my company. You have to find the various
ProtobufUtil and RemoteException related classes, wherever the remote
classname string is parsed from a protobuf into a Class. You need to push
some custom code to rewrite the strings there. I think there were two or
three spo
+1 sounds good to me
On Sun, Jul 7, 2024 at 11:07 AM Duo Zhang wrote:
> As I mentioned in another thread, now slack will hide the comments
> before 90 days in the current apache-hbase.slack.com, which is really
> not good for finding useful discussions.
>
> According to the documentation here
>
Congrats and welcome!
On Wed, May 29, 2024 at 5:06 PM Viraj Jasani wrote:
> Congratulations and Welcome, Andor! Well deserved!!
>
>
> On Wed, May 29, 2024 at 7:36 AM Duo Zhang wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that
> > Andor Molnár(andor) has accepted the
Haha yes, I noticed that…after sending. Hopefully the title and various
other mentions of 2.6.0 can suffice this time :)
On Mon, May 20, 2024 at 10:08 PM 张铎(Duo Zhang)
wrote:
> Congratulations!
>
> But it seems you missed the replacement for the first '_version_'
> pl
The HBase team is happy to announce the immediate availability of HBase
_version_.
Apache HBase™ is an open-source, distributed, versioned, non-relational
database.
Apache HBase gives you low latency random access to billions of rows with
millions of columns atop non-specialized hardware. To learn
I also routinely build all branches of hbase on an apple M3, using hadoop3,
and Java 11+. I believe I’ve also built with java8 at some point, but don’t
quote me on that because we largely don’t use java8 at my company.
On Tue, Apr 30, 2024 at 4:48 PM Wei-Chiu Chuang wrote:
> I am on Apple M3, c
wrote:
> Hi.
> I am currently unable to start REGION using JDK17 directly. What
> adjustments do I need to make to use JDK17/21?
> Tks.
>
> On 2024-04-09 18:52:34, "Bryan Beaudreault" wrote:
> >We ran
We ran hbase under jdk17 for a few months. The only issue we saw was
https://issues.apache.org/jira/browse/HBASE-28206 which was fixed in 2.5.7.
More recently we’ve upgraded again to jdk21 to gain access to generational
zgc. That also has been working fine without any additional patches.
We’re ru
INDEX_BLOCK_ENCODING is a new feature, but only the configuration exists.
No actual encodings have been committed, and development on the PR stalled. See
https://github.com/apache/hbase/pull/4782
It would be great if someone picked up this work again.
On Tue, Mar 26, 2024 at 6:05 AM lisoda wrote:
>
Hello,
I'm unsure where the original email was sent, but I did not see it on our
user or dev lists. Sorry that it was missed.
The release was delayed due to the holidays, but I hope to have the first
release candidate for 2.6.0 generated this coming week. I'm now finishing
porting some final patc
ase 2.6 should
> be
> > able to support 3.2.x and 3.3.x. At the same time, IMO 3.2.x is also an
> > inactive release version, we can discuss if we should just change our
> base
> > of hadoop to 3.3.6 maybe starting from HBase 3.0+
> >
> > -Stephen
> >
>
Thanks all!
On Tue, Oct 17, 2023 at 2:44 PM Jan Hentschel
wrote:
> Welcome Bryan! Well deserved.
>
> From: Duo Zhang
> Date: Tuesday, October 17, 2023 at 3:32 AM
> To: HBase Dev List , hbase-user <
> user@hbase.apache.org>, user-zh
> Subject: [ANNOUNCE] Please welco
Congratulations and welcome!
On Thu, Mar 16, 2023 at 12:02 AM 张铎(Duo Zhang)
wrote:
> On behalf of the Apache HBase PMC, I am pleased to announce that Tianhang
> Tang(thangTang)
> has accepted the PMC's invitation to become a committer on the project. We
> appreciate all
> of Tianhang's generous
Taking a look at the git blame for the script, some of the parts you
reference are over 13 years old. So it may just be that they deserve some
updating. Anyway, you are not missing anything and your approach is both
safe and more graceful.
On Thu, Mar 9, 2023 at 8:47 PM Bryan Beaudreault
wrote
I can’t speak to why the script is the way it is. But I will say that my
company has been running hbase at massive scale with high reliability
standards for years. We’ve never used any of the built in shell scripts. We
have our own automation, and our HMaster rolling restart is more like what
you d
I sort of wonder if you have a log4j.properties and log4j2.properties on
the classpath. I think I remember having an issue when both existed, where
I'd see a lot more logs than I expected. Check for a log4j.properties and
remove it.
On Wed, Feb 8, 2023 at 4:51 AM 张铎(Duo Zhang) wrote:
> These are a
Congrats!
On Mon, Jan 30, 2023 at 4:10 AM Balazs Meszaros
wrote:
> Congratulations!
>
> On Mon, Jan 30, 2023 at 9:19 AM Jan Hentschel
> wrote:
>
> > Congratulations and welcome!
> >
> > From: Duo Zhang
> > Date: Monday, January 30, 2023 at 3:50 AM
> > To: HBase Dev List , hbase-user <
> > user
I think development is done on TLS. We are just waiting on requested
testing. Andor was working on that, but I believe he had some stuff come up
at his work.
I also want to get backups in place, but there is one backwards-compatibility
issue to work through. Hoping to have that squared away soon.
O
Congrats!
On Sun, Dec 4, 2022 at 2:15 PM Wei-Chiu Chuang wrote:
> Congrats!
>
> On Sun, Dec 4, 2022 at 5:33 AM 宾莉金(binlijin) wrote:
>
> > Congratulations!
> >
> > On Sat, Dec 3, 2022 at 22:28, 张铎(Duo Zhang) wrote:
> >
> > > Congratulations!
> > >
> > > > On Sat, Dec 3, 2022 at 21:51, Yu Li wrote:
> > > >
> > > > Hi All,
>
Should we have them sent to private@? Just thinking in terms of reducing
spam to users who put their email and full name on a public list.
One thought I had about bug tracking is whether we could use some sort of
github -> jira sync. I've seen them used before, where it automatically
syncs issues
Not currently. What's the use-case?
On Mon, Nov 28, 2022 at 2:47 AM Andrey Khozov wrote:
> Hello!
>
> Is there any way (API or shell command) to reset read/write statistics
> for regions in Region Servers?
>
> Thanks,
> Andrey
>
>
>
Congratulations!
On Mon, Apr 11, 2022 at 10:06 AM Huaxiang Sun wrote:
> Congratulations, Xiaolin!
>
> Sent from my iPhone
>
> > On Apr 11, 2022, at 2:30 AM, Jan Hentschel <
> jan.hentsc...@ultratendency.com> wrote:
> >
> > Congratulations and welcome!
> >
> > From: 张铎(Duo Zhang)
> > Date: Satu
Nick
>
> On Sat, Apr 9, 2022 at 04:45 张铎(Duo Zhang) wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
> > Beaudreault(bbeaudreault) has accepted the PMC's invitation to become a
> > committer on the project. We appreciate all of Bryan
Hello,
Unfortunately I don’t have good guidance on what to tune this to. What I
can say though is that this feature will be disabled by default starting
with version 2.5.0. Part of the reason for that is we determined it is too
aggressive but didn’t yet have good guidance on a better default.
So
I have to agree anecdotally that when I first upgraded to 3.3.1 I got a
bunch of NoSuchMethodExceptions (or similar). This was a while ago so I
don't have specifics. It was an easy fix -- recompile hbase with
-Dhadoop.version=3.3.1.
On Mon, Dec 20, 2021 at 9:46 PM Wei-Chiu Chuang
wrote:
> tl; dr
and voting, for necessary
> deprecation-release-removal-release cycles where terminology changes impact
> one or more of our compatibility guidelines.
>
> What has been missing since this thread closed with this conclusion?
>
> Actual patches.
>
> It's quite easy to
Sorry to revive a very old thread, but I just stumbled across this and
don't see a clear resolution. I wonder if we should create a JIRA from
Andrew's summary and treat that as an umbrella encompassing the original 3
JIRAs? I'm also cognizant of the fact that there are rumblings of doing an
initial
It's been decided in a thread on the dev list that protobuf should be
considered InterfaceAudience.PRIVATE. See here:
https://lists.apache.org/thread.html/r9e6eb11106727d245f6eb2a5023823901637971d6ed0f0aedaf8d149%40%3Cdev.hbase.apache.org%3E
So with that being said, this is mostly just to put out
the data
> so cannot justify the exorbitant cost per node that cloudera are asking for
> later versions.
>
> -Original Message-
> From: Bryan Beaudreault
> Sent: Wednesday, May 19, 2021 2:49 PM
> To: user@hbase.apache.org
> Subject: Upgrading cdh5.16.2 to apache hbase
We are running about 40 HBase clusters, with over 5000 regionservers total.
These are all running cdh5.16.2. We also have thousands of clients (from
APIs to kafka workers to hadoop jobs, etc) hitting these various clusters,
also running cdh5.16.2.
We are starting to plan an upgrade to hbase 2.x an
re of this in a
> seamless manner for our clients.
>
> Example: Actual row key --> *0QUPHSBTLGM*, and client requested a 3 digit
> prefix based on table region range (000 - 999), would translate to
> *115-0QUPHSBTLGM* with murmurhash
>
> ---
> Mallikarjun
>
>
> On Tue, May 1
That's weird! Can you invite bbeaudrea...@gmail.com?
On Mon, May 17, 2021 at 9:39 PM 张铎(Duo Zhang) wrote:
> Could you please give us an email address?
>
> bbeaudrea...@hubspot.com.invalid
>
> Is this the expected one? There is an 'invalid' in it...
>
> Bryan
Is there an existing user group slack of hbase users? If so can I have an
invite?
Thanks!
Hey all,
We run a bunch of big hbase clusters that get used by hundreds of product
teams for a variety of real-time workloads. We are a B2B company, so most
data has a customerId somewhere in the rowkey. As the team that owns the
hbase infrastructure, we try to help product teams properly design s
Is the backported patch available anywhere? Not seeing it on the referenced
JIRA. If it ends up not getting officially backported to branch-1 due to
2.0 around the corner, some of us who build our own deploy may want to
integrate into our builds. Thanks! These numbers look great
On Fri, Nov 18, 20
Hello,
I'm wondering if the favored nodes implementation in HBase 1.2-cdh5.9 is
ready or documented anywhere. I can't find anything official, but there are
plenty of class files that look to work with it. I know Yahoo has recently
renewed development on a V2 version, but does the V1 work?
Thanks
Great, thanks stack!
On Tue, Aug 23, 2016 at 12:54 AM Stack wrote:
> On Mon, Aug 22, 2016 at 3:02 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > In HBase 1.2.x and higher you can call setMaxResultSize on a Scan to
> limit
> > the impact of
In HBase 1.2.x and higher you can call setMaxResultSize on a Scan to limit
the impact of scans that are too aggressive, by bailing out at a certain
size response. The client side will nicely splice together all of the
isPartial responses to create a full one as well, pushing the danger to the
clien
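The client-side splicing described in the snippet above can be sketched roughly like this. This is a toy model, not HBase's actual scanner code; the `(row_key, cells, is_partial)` tuple shape is invented for illustration:

```python
# Toy sketch of splicing isPartial scan responses back into full rows,
# in the spirit of what the HBase client does when setMaxResultSize is hit.
def splice_partials(results):
    """Merge consecutive partial results for the same row into full results."""
    spliced = []
    buffer_key, buffer_cells = None, []
    for row_key, cells, is_partial in results:
        if buffer_key is not None and row_key != buffer_key:
            # A new row started before the buffered row was marked complete.
            spliced.append((buffer_key, buffer_cells))
            buffer_key, buffer_cells = None, []
        if is_partial:
            buffer_key = row_key
            buffer_cells = buffer_cells + cells
        else:
            spliced.append((row_key, buffer_cells + cells))
            buffer_key, buffer_cells = None, []
    if buffer_key is not None:
        spliced.append((buffer_key, buffer_cells))
    return spliced

# Two partial chunks of row "r1", a final chunk, then a whole row "r2".
full = splice_partials([
    ("r1", ["c1"], True),
    ("r1", ["c2"], True),
    ("r1", ["c3"], False),
    ("r2", ["c1"], False),
])
print(full)  # [('r1', ['c1', 'c2', 'c3']), ('r2', ['c1'])]
```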
I'm also interested in an answer here. We see this from time to time in our
production HBase clusters (non-opentsdb). It seems to be related to
contention under heavy reads or heavy writes. But it's not clear what the
impact is here.
On Fri, Aug 5, 2016 at 5:14 AM Sterfield wrote:
> Hi,
>
> I'm
and give a single Result object, because it
> will cause OOM.
>
> Hope this helps.
> Enis
>
> On Fri, Jun 17, 2016 at 4:15 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > Hello,
> >
> > We are running 1.2.0-cdh5.7.0 on our ser
Hello,
We are running 1.2.0-cdh5.7.0 on our server side, and 1.0.0-cdh5.4.5 on the
client side. We're in the process of upgrading the client, but aren't there
yet. I'm trying to figure out the relationship of Result.isPartial and the
user, when setMaxResultSize is used.
I've done a little reading
online alter
> table is not different at all. The "hbase.online.schema.update.enable"
> property was fixing some possible race conditions that were fixed long time
> ago.
>
> We should update the documentation. Mind creating a small patch?
> Enis
>
> On Mon, Ju
n and test it before using in production.
>
> FYI
>
> On Mon, Jun 6, 2016 at 12:19 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > Hello,
> >
> > We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order
> > t
Hello,
We're running hbase 1.2.0-cdh5.7.0. According to the HBase book, in order
to enable stripe compactions on a table we need to first disable the table. We
basically can't disable tables in production. Is it possible to do this
without disabling the table? If not, are there any plans to make
Hello,
There is very little information that I can find online with regards to
recommended dfs.block.size setting for HBase. Often it gets conflated with the
HBase blocksize, which we know should be smaller. Any chance we can get
some recommendations for dfs.block.size?
The default shipped with HDFS
ake a difference.
>
> Hang on... will be back in a sec... just sending this in meantime...
>
> St.Ack
>
> On Mon, May 23, 2016 at 12:20 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
> > For reference, the Scan backing the job is pretty basic:
> &
rwise it is using the out-of-the-box TableInputFormat.
On Mon, May 23, 2016 at 3:13 PM Bryan Beaudreault
wrote:
> I've forced the issue to happen again. netstat takes a while to run on
> this host while it's happening, but I do not see an abnormal amount of
> CLOSE_WAIT
s I
know uses the HBase RPC and does not hit HDFS directly at all. Is it
possible that a long running scan (one with many, many next() calls) could
keep some references to HDFS open for the duration of the overall scan?
On Mon, May 23, 2016 at 2:19 PM Bryan Beaudreault
wrote:
> We run MR agains
igs between CDH5.3.8 and 5.7.0
are identical for us.
On Mon, May 23, 2016 at 2:03 PM Stack wrote:
> On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > Hey everyone,
> >
> > We are noticing a file descriptor leak th
are hitting does not appear to
affect 0.98/CDH5.3.8. We also never saw it when we were on 0.94. This seems
new in either 1.0+ or 1.2+.
On Mon, May 23, 2016 at 12:59 PM Ted Yu wrote:
> Have you taken a look at HBASE-9393 ?
>
> On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault <
Hey everyone,
We are noticing a file descriptor leak that is only affecting nodes in our
cluster running 5.7.0, not those still running 5.3.8. I ran an lsof against
an affected regionserver, and noticed that there were 10k+ unix sockets
that are just called "socket", as well as another 10k+ of the
er with BufferedMutator or
> separately with just direct Put's?
>
> Thanks.
>
> ----
> Saad
>
>
>
>
> On Wed, Apr 27, 2016 at 2:22 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > Hey Ted,
> >
> > Actually
> Does the above mean that hbase servers are still using ParallelGC?
>
> Thanks
>
> On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > We have 6 production clusters and all of them are tuned differently, so
> I'
hat is interesting. Would it be possible for you to share what GC settings
> you ended up on that gave you the most predictable performance?
>
> Thanks.
>
>
> Saad
>
>
> On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrot
We were seeing this for a while with our CDH5 HBase clusters too. We
eventually correlated it very closely to GC pauses. Through heavily tuning
our GC we were able to drastically reduce the logs, by keeping most GCs
under 100ms.
On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti wrote:
> From what I ca
Hello,
We've been using CDH community for a few years now, but have reached the
point where we want to be able to do some development and backporting
upstream. We'd like to keep the base CDH packaging for consistency, but
have come up with some automation around building CDH from source RPM, with
Recently we upgraded to CDH5.3.8, and needed to use CDAP Readless
Increments (https://github.com/caskdata/cdap) to overcome the
recently-fixed performance regression around Increment.
We are now looking to upgrade to CDH5.5.x, and have attempted to upgrade 1
slave in our 5.3.8 cluster to 5.5.0. U
>
> ... and other is doing:
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:3593)
>
> Not many increments going on.
>
>
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> is two increments too in same places. Is it stuck?
>
ey);
}
On Tue, Dec 1, 2015 at 12:08 AM Stack wrote:
> Looking at that stack trace, nothing showing as blocked or slowed by
> another operation. You have others I could look at Bryan?
> St.Ack
>
> On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
000ms range. It seems we may be blocking on FSHLog#syncer.
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
On Mon, Nov 30, 2015 at 11:26 PM Stack wrote:
> Still slow increments though?
>
> On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault
Those log lines have settled down, they may have been related to a
cluster-wide forced restart at the time.
On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault
wrote:
> We've been doing more debugging of this and have set up the read vs write
> handlers to try to at least segment t
at 6:11 PM Bryan Beaudreault
wrote:
> Sorry the second link should be
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
>
> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
>> https://gist.gith
Sorry the second link should be
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault
wrote:
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
> https://gist.github.com/
t 2:31 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > The rollback seems to have mostly solved the issue for one of our
> clusters,
> > but another one is still seeing long increment times:
> >
> > "slowIncrementCount": 52
ion for increments between CDH4.7.1 and CDH5.3.8?
On Mon, Nov 30, 2015 at 4:13 PM Stack wrote:
> On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
> > Should this be added as a known issue in the CDH or hbase documentation?
> It
>
Should this be added as a known issue in the CDH or hbase documentation? It
was a severe performance hit for us, all of our regionservers were sitting
at a few thousand queued requests.
On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault
wrote:
> Yea, they are all over the place and called f
Stack wrote:
> Rollback is untested. No fix in 5.5. I was going to work on this now. Where
> are your counters Bryan? In their own column family or scattered about in a
> row with other Cell types?
> St.Ack
>
> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> bbeaudr
Is there any update to this? We just upgraded all of our production
clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
known issues, did not know about this. Now we are seeing performance issues
across all clusters, as we make heavy use of increments.
Can we roll forward to CDH5
Our automation uses a combination of the following to determine what to
compact:
- Which regions have bad locality (% of blocks are local vs remote, using
HDFS getBlockLocations APIs)
- Which regions have the most HFiles (files per region/cf
directory)
- Which regions have gone the
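The selection heuristics above can be sketched roughly as follows. The weights are invented, and we assume the truncated last criterion means "time since last compaction"; none of this is from any real tooling:

```python
# Sketch: rank regions for compaction by locality, HFile count, and staleness.
def pick_compaction_candidates(regions, limit=2):
    """regions: dicts with 'name', 'locality' (0.0-1.0 fraction of local
    blocks), 'num_hfiles', and 'hours_since_compaction'. Worst-first names."""
    def score(r):
        return ((1.0 - r['locality']) * 100.0   # remote blocks hurt read latency
                + r['num_hfiles'] * 2.0          # many HFiles -> read amplification
                + r['hours_since_compaction'] / 24.0)  # staleness tie-breaker
    return [r['name'] for r in sorted(regions, key=score, reverse=True)[:limit]]

candidates = pick_compaction_candidates([
    {'name': 'region-a', 'locality': 0.20, 'num_hfiles': 10, 'hours_since_compaction': 48},
    {'name': 'region-b', 'locality': 0.95, 'num_hfiles': 3,  'hours_since_compaction': 12},
    {'name': 'region-c', 'locality': 0.50, 'num_hfiles': 20, 'hours_since_compaction': 100},
])
print(candidates)  # ['region-a', 'region-c']
```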
Just had to say, https://issues.apache.org/jira/browse/HBASE-13103 looks
*AWESOME*
On Thu, Jun 18, 2015 at 5:00 PM Mikhail Antonov
wrote:
> Yeah, I could see 2 reasons for remaining few regions to take
> unproportionally long time - 1) those regions are unproportionally
> large (you should be ab
I agree that more documentation would be better. However,
> Yet, there are some applications which require a faster time out than
> others. So, you tune some of the timers to have a fast fail, and you end up
> causing unintended problems for others.
>
> The simplest solution is to use threads in
I wouldn't say it is recommended, but it is certainly possible to override
the version timestamp at write time. You might be able to use this to
provide the uniqueness you need (i.e. instead of using epoch timestamp,
use one based more recently and add digits for uniqueness at the end).
We've do
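The "more recent epoch plus uniqueness digits" idea can be sketched like this. The epoch choice and the 3-digit budget are arbitrary illustrations (HBase itself just stores whatever long you pass as the cell timestamp):

```python
# Sketch: derive a unique cell "version" timestamp by shifting to a recent
# epoch and reserving the low-order digits for a per-write counter.
RECENT_EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z in epoch millis

def versioned_timestamp(now_ms, counter):
    """Unique, monotonically increasing version for up to 1000 writes per ms."""
    if not 0 <= counter < 1000:
        raise ValueError("counter must fit in the 3 reserved digits")
    return (now_ms - RECENT_EPOCH_MS) * 1000 + counter

# Two writes in the same millisecond still get distinct versions.
t = 1_577_836_800_001  # one ms after the chosen epoch
print(versioned_timestamp(t, 0), versioned_timestamp(t, 1))  # 1000 1001
```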
ion total 19420524K, used 7807614K
> [0x0001f4f9, 0x0006964eb000, 0x0007fae0)
> concurrent-mark-sweep perm gen total 51000K, used 30498K
> [0x0007fae0, 0x0007fdfce000, 0x0008)
>
> Thanks,
> Rahul
>
>
> On Fri, May 29, 2015 at 8:
>
> 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
> slept
> 14912ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
I would check your gc logs for long gc
ugh if I
> remember correctly, also you need to define hbase.master.hostname if you
> are using HBase 1.1
>
> cheers,
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Fri, May 22, 2015 at 12:55 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
> &g
d HBASE-12954 so it depends on which version of
> > 1.x
> > > you are using.
> > >
> > > Regarding your account issue, I have created an INFRA JIRA on your
> behalf
> > > to look into your account problem.
> > >
> > > thanks,
>
On Fri, May 22, 2015 at 3:34 PM, Ted Yu wrote:
> bq. hbase-1.1.0.1
>
> To my knowledge, latest release was 1.1.0. The release before that was
> 1.0.1
>
> Can you clarify ?
>
> Thanks
>
> On Fri, May 22, 2015 at 12:23 PM, Bryan Beaudreault <
> bbeaudrea...@hub
> --
> Cloudera, Inc.
>
>
> On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
> > In our system each server has 2 dns associated with it, one always points
> > to a private address and the other to public or private d
In our system each server has 2 dns associated with it, one always points
to a private address and the other to public or private depending on the
context.
This issue did not show up in 0.94.x, but is showing up on my new 1.x
cluster. Basically it goes like this:
1. Regionserver starts up, gets
ry stuff.
> > Note: my project is not related to migration from 0.94 to 1.0. But, i am
> > supporting the argument for moving MR code in client or a separate
> > artifact.
> >
> > On Thu, May 14, 2015 at 9:43 AM, Bryan Beaudreault <
> > bbeaudrea...@hubspot.
is an ok compromise where
necessary.
I can put JIRAs in for these if it makes sense
On Tue, May 5, 2015 at 10:48 PM, Bryan Beaudreault wrote:
> Thanks for the response guys!
>
> You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
>> mistakenly dropped
After moving to the G1GC we were plagued with random OOMs from time to
time. We always thought it was due to people requesting a big row or group
of rows, but upon investigation noticed that the heap dumps were many GBs
less than the max heap at the time of OOM. If you have this symptom, you may
be r
Major compactions will restore locality to the cluster.
On Sat, May 9, 2015 at 3:36 PM, Michael Segel
wrote:
> First, understand why you had to create an ‘auto restart’ script.
>
> Taking down HBase completely (probably including zookeeper) and do a full
> restart would probably fix the issue of
n guys. I'm expecting this will be a drawn out process
considering our scope, but will be happy to keep updates here as I proceed.
On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez
wrote:
> Just to a little bit to what StAck said:
>
> --
> Cloudera, Inc.
>
>
>
Hello,
I'm about to start tackling our upgrade path for 0.94 to 1.0+. We have 6
production hbase clusters, 2 hadoop clusters, and hundreds of
APIs/daemons/crons/etc hitting all of these things. Many of these clients
hit multiple clusters in the same process. Daunting to say the least.
We can't
ading this HDFS-3703, trying to get more context, but looks to me
> so far that this is one of those things where you should decide what are
> your own acceptable tradeoffs.
>
> On Mon, Mar 23, 2015 at 4:40 PM Bryan Beaudreault <
> bbeaudrea...@hubspot.com>
> wrote:
>
&
light on what
> > Nicolas is talking about -
> https://issues.apache.org/jira/browse/HDFS-3703
> >
> > On Mon, Mar 23, 2015 at 3:53 PM Bryan Beaudreault <
> > bbeaudrea...@hubspot.com>
> > wrote:
> >
> > > So it is safe to set hbase.lease.reco
So it is safe to set hbase.lease.recovery.timeout lower if you also
set heartbeat.recheck.interval lower (lowering that 10.5 min dead node
timer)? Or is it recommended to not touch either of those?
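For reference, a sketch of where those two knobs live in config (the HDFS property name below is from recent Hadoop docs and may differ across versions; verify before lowering anything in production):

```xml
<!-- hdfs-site.xml: part of the ~10.5-minute dead-datanode timer -->
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>60000</value> <!-- ms; default 300000 -->
</property>

<!-- hbase-site.xml: cap on how long HBase waits for WAL lease recovery -->
<property>
  <name>hbase.lease.recovery.timeout</name>
  <value>300000</value> <!-- ms; default 900000 -->
</property>
```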
Reading the above with interest, thanks for digging in here guys.
On Mon, Mar 23, 2015 at 10:13 AM
Super useful, thanks Dave!
On Wed, Mar 18, 2015 at 11:25 PM, Dave Latham wrote:
> If you haven't already seen it - take a look at the bridge at
> https://issues.apache.org/jira/browse/HBASE-12814
> We're using it to go through the process now.
>
> Dave
>
> On Wed,
My only complaint about this poll is the labels: "0.94.x - I like stable
releases". It's not really about the stable releases for me, it's more
about the extremely difficulty of overcoming "the singularity" from 0.94 ->
0.96+ with no downtime in a reasonably complex production system.
Hortonwork's
You should run with a backup master in a production cluster. The failover
process works very well and will cause no downtime. I've done it literally
hundreds of times across our multiple production hbase clusters.
Even if you don't have a backup master, you should still be fine with
restarting t
At HubSpot we have 5 customer facing production clusters 30-60TB+ each. Our
Data Ops team has ranged from 2-3 (including me), but we support much more
than just hbase. We have an in-house built nightly backup system and
persist all HLogs on an ongoing basis, so in 2-3 hours we can recover to
within
us-west-2.compute.internal.log:2014-11-06
> >> 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler
> 46
> >> on 60020): (responseTooSlow):
> >>
> {"processingtimems":14620,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4fc3bb1f
> ),
> >> rpc version=1, client version=29,
> methodsFingerPrint=-540141542","client":"
> >> 10.230.130.102:54068
> >>
> ","starttimems":1415246057021,"queuetimems":27565,"class":"HRegionServer","responsesize":0,"method":"multi"}
> >>
> hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06
> >> 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler
> 35
> >> on 60020): (responseTooSlow):
> >>
> {"processingtimems":13431,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b321922
> ),
> >> rpc version=1, client version=29,
> methodsFingerPrint=-540141542","client":"
> >> 10.227.42.252:60493
> >>
> ","starttimems":1415246058210,"queuetimems":1134,"class":"HRegionServer","responsesize":0,"method":"multi"}
> >> On Nov 6, 2014, at 12:38 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> >> > wrote:
> >>
> >>> blockingStoreFiles
> >>
> >>
>
>
f retry
>
> Also the weirdest behavior I have noticed about this lag/outage is that
> the master Hbase daemon is eating all the CPU whereas before it barely had
> more than a 1.0 load. Is it possible the master is in some way broken and
> slowing everything down?
>
> -Pere
&g
ml
>
>
>
>
> Do I need to set this as well?
>
>
> hbase.regionserver.logroll.period
> 360
> hbase-default.xml
>
>
> Thanks,
> Pere
>
>
> On Nov 6, 2014, at 11:23 AM, Bryan Beaudreault
> wrote:
>
> > The default periodic flush is 1 hour. If y
The default periodic flush is 1 hour. If you have a lot of regions and your
write distribution is not strictly uniform this can cause a lot of small
flushes, as you are seeing. I tuned this up to 12 hours in my cluster, and
may tune it up further. It made a big impact on the number of minor
compa
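For anyone wanting to make the same change: the periodic flush interval is controlled by hbase.regionserver.optionalcacheflushinterval (double-check the name against your version's hbase-default.xml), e.g. 12 hours:

```xml
<property>
  <name>hbase.regionserver.optionalcacheflushinterval</name>
  <value>43200000</value> <!-- ms; default 3600000 (1 hour) -->
</property>
```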
There are many blog posts and articles about people tuning for >16GB
heaps since Java 7 and the G1 collector became mainstream. We run with 25GB
heap ourselves with very short GC pauses using a mostly untuned G1
collector. Just one example is the excellent blog post by Intel,
https://software.in