>What version of hbase are you on?
We are on CDH 5.2.1 (HBase 0.98).

>These hfiles are created on same cluster with MR? (i.e. they are using up i/os)
The same cluster :) They are created during the night, and we see IO degradation
even when no MR is running. I understand that MR also adds significant IO
pressure.

>Can you cache more?
I don't understand, can you explain? The row cache is enabled for all tables
the apps read.
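
If by caching more you mean the block cache, here is the kind of thing I could
try, as a rough sketch against the 0.98 client API (the table name "big_table"
and family "d" below are placeholders, not our real schema):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CacheMore {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Fraction of region server heap given to the block cache
    // (hfile.block.cache.size, set in hbase-site.xml, needs an RS restart).
    System.out.println("hfile.block.cache.size = "
        + conf.getFloat("hfile.block.cache.size", 0.4f));

    // Favour the hottest column family in the block cache.
    HBaseAdmin admin = new HBaseAdmin(conf);
    TableName table = TableName.valueOf("big_table");         // placeholder name
    HTableDescriptor td = admin.getTableDescriptor(table);
    HColumnDescriptor cf = td.getFamily(Bytes.toBytes("d"));  // placeholder family
    cf.setBlockCacheEnabled(true);  // cache data blocks read from this family
    cf.setInMemory(true);           // keep its blocks in cache preferentially
    admin.disableTable(table);      // schema change; short outage for the table
    admin.modifyColumn(table, cf);
    admin.enableTable(table);
    admin.close();
  }
}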

>Can you make it so files are bigger before you flush?
How can I achieve that? Increase the memstore size?
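
For the record, my understanding (please correct me if I'm wrong) is that the
relevant knobs are the memstore settings in hbase-site.xml on the region
servers; a tiny sketch that only prints the current values and names the keys:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShowFlushSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Size at which a region's memstore is flushed to a new HFile. A larger
    // value means fewer, bigger flush files (and so fewer compactions), at the
    // cost of more heap held in memstores. Changed in hbase-site.xml on the
    // region servers, followed by a restart.
    System.out.println("hbase.hregion.memstore.flush.size = "
        + conf.getLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024));
    // Upper bound on total memstore usage as a fraction of the RS heap.
    System.out.println("hbase.regionserver.global.memstore.upperLimit = "
        + conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f));
  }
}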

>the traffic is not so heavy?
At night it is 3-4 times lower. I run major compactions during the night.

>You realize that a major compaction will do full rewrite of your dataset?
I do

> When they run, how many storefiles are there?
How can I measure that? Go to HDFS and count the files under the table's
directory?
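
If it helps, something like the sketch below (0.98 client API, as far as I can
tell) should read the storefile counts from ClusterStatus instead of me
counting files in HDFS by hand; the same numbers are in the master UI:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CountStorefiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    ClusterStatus status = admin.getClusterStatus();
    for (ServerName sn : status.getServers()) {
      ServerLoad sl = status.getLoad(sn);
      System.out.println(sn + ": stores=" + sl.getStores()
          + " storefiles=" + sl.getStorefiles());
      // Per-region detail, useful for spotting regions that pile up files.
      for (RegionLoad rl : sl.getRegionsLoad().values()) {
        System.out.println("  " + rl.getNameAsString()
            + " storefiles=" + rl.getStorefiles());
      }
    }
    admin.close();
  }
}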

> Do you have to run once a day?  Can you not run once a week?
Maybe, if there is no significant read performance penalty.
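
As for managing compactions ourselves and rolling them a region at a time: if
I understand the suggestion correctly, it would look roughly like this (0.98
API; the table name and the pause are placeholders, and I'd also set
hbase.hregion.majorcompaction=0 so the automatic daily majors stop):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RollingMajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    TableName table = TableName.valueOf("big_table");  // placeholder table name
    long pauseMs = 5 * 60 * 1000L;                     // crude pacing between regions
    for (HRegionInfo region : admin.getTableRegions(table)) {
      // The request is asynchronous; the region server does the actual rewrite.
      admin.majorCompact(region.getRegionName());
      Thread.sleep(pauseMs);  // spread the I/O out instead of hitting all regions at once
    }
    admin.close();
  }
}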

> Enable deferring sync'ing first
Will try...
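
If deferred sync'ing means ASYNC_WAL durability (that is my reading of the
0.98 docs), the sketch below is how I'd set it on a table instead of skipping
the WAL entirely; the table name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DeferWalSync {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    TableName table = TableName.valueOf("big_table");  // placeholder table name
    HTableDescriptor desc = admin.getTableDescriptor(table);
    // ASYNC_WAL: edits still go to the WAL but syncs are batched/deferred, so
    // an RS crash can lose the last few edits -- far safer than SKIP_WAL.
    desc.setDurability(Durability.ASYNC_WAL);
    admin.disableTable(table);   // schema change; short outage for the table
    admin.modifyTable(table, desc);
    admin.enableTable(table);
    admin.close();
  }
}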

2015-05-21 23:04 GMT+03:00 Stack <st...@duboce.net>:

> On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
> > > Do you have the system sharing
> > There are 2 HDDs, 7200 RPM, 2 TB each. There is a 300 GB OS partition on
> > each drive with mirroring enabled. I can't persuade devops that mirroring
> > could cause IO issues. What arguments can I bring? They use OS partition
> > mirroring so that when a disk fails, we can use the other partition to boot
> > the OS and continue to work...
> >
> >
> You are already compromised i/o-wise having two disks only. I have not the
> experience to say for sure but basic physics would seem to dictate that
> having your two disks (partially) mirrored compromises your i/o even more.
>
> You are in a bit of a hard place. Your operators want the machine to boot
> even after it loses 50% of its disk.
>
>
> > >Do you have to compact? In other words, do you have read SLAs?
> > Unfortunately, I have a mixed workload from web applications. I need to
> > write and read, and the SLA is < 50 ms.
> >
> >
> Ok. You get the bit that seeks are about 10ms or so each, so with two disks
> you can do 2x100 seeks a second, presuming no one else is using the disks.
>
>
> > >How are your read times currently?
> > Cloudera manager says it's 4K reads per second and 500 writes per second
> >
> > >Does your working dataset fit in RAM or do
> > reads have to go to disk?
> > I have several tables of ~500 GB each and many small tables of 10-20 GB.
> > The small tables are loaded hourly/daily via bulkload (we prepare HFiles
> > with MR and move them into HBase with the bulkload utility). The big tables
> > are used by the webapps, which read and write them.
> >
> >
> These hfiles are created on same cluster with MR? (i.e. they are using up
> i/os)
>
>
> > >It looks like you are running at about three storefiles per column family
> > is it hbase.hstore.compactionThreshold=3?
> >
>
>
> > >What if you upped the threshold at which minors run?
> > you mean bump  hbase.hstore.compactionThreshold to 8 or 10?
> >
> >
> Yes.
>
> Downside is that your reads may require more seeks to find a keyvalue.
>
> Can you cache more?
>
> Can you make it so files are bigger before you flush?
>
>
>
> > >Do you have a downtime during which you could schedule compactions?
> > Unfortunately no. It should work 24/7, and sometimes it doesn't manage to.
> >
> >
> So, it is running at full bore 24/7?  There is no 'downtime'... a time when
> the traffic is not so heavy?
>
>
>
> > >Are you managing the major compactions yourself or are you having hbase do
> > it for you?
> > HBase does it, once a day: hbase.hregion.majorcompaction=1day
> >
> >
> Have you studied your compactions?  You realize that a major compaction
> will do full rewrite of your dataset?  When they run, how many storefiles
> are there?
>
> Do you have to run once a day?  Can you not run once a week?  Can you
> manage the compactions yourself... and run them a region at a time in a
> rolling manner across the cluster rather than have them just run whenever
> it suits them once a day?
>
>
>
> > I can disable WAL. It's ok to lose some data in case of RS failure. I'm
> > not doing banking transactions.
> > If I disable WAL, could it help?
> >
> >
> It could but don't. Enable deferring sync'ing first if you can 'lose' some
> data.
>
> Work on your flushing and compactions before you mess w/ WAL.
>
> What version of hbase are you on? You say CDH but the newer your hbase, the
> better it does generally.
>
> St.Ack
>
>
>
>
>
> > 2015-05-20 18:04 GMT+03:00 Stack <st...@duboce.net>:
> >
> > > On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak <
> > serega.shey...@gmail.com>
> > > wrote:
> > >
> > > > Hi, we are using extremely cheap HW:
> > > > 2 HDD, 7200 RPM
> > > > 4*2 cores (hyper-threading)
> > > > 32 GB RAM
> > > >
> > > > We met serious IO performance issues.
> > > > We have a more or less even distribution of read/write requests, and
> > > > the same for data size.
> > > >
> > > > ServerName | Requests Per Second | Read Request Count | Write Request Count
> > > > node01.domain.com,60020,1430172017193 | 195 | 171871826 | 16761699
> > > > node02.domain.com,60020,1426925053570 | 24 | 34314930 | 16006603
> > > > node03.domain.com,60020,1430860939797 | 22 | 32054801 | 16913299
> > > > node04.domain.com,60020,1431975656065 | 33 | 1765121 | 253405
> > > > node05.domain.com,60020,1430484646409 | 27 | 42248883 | 16406280
> > > > node07.domain.com,60020,1426776403757 | 27 | 36324492 | 16299432
> > > > node08.domain.com,60020,1426775898757 | 26 | 38507165 | 13582109
> > > > node09.domain.com,60020,1430440612531 | 27 | 34360873 | 15080194
> > > > node11.domain.com,60020,1431989669340 | 28 | 44307 | 13466
> > > > node12.domain.com,60020,1431927604238 | 30 | 5318096 | 2020855
> > > > node13.domain.com,60020,1431372874221 | 29 | 31764957 | 15843688
> > > > node14.domain.com,60020,1429640630771 | 41 | 36300097 | 13049801
> > > >
> > > > ServerName | Num. Stores | Num. Storefiles | Storefile Size Uncompressed | Storefile Size | Index Size | Bloom Size
> > > > node01.domain.com,60020,1430172017193 | 82 | 186 | 1052080m | 76496mb | 641849k | 310111k
> > > > node02.domain.com,60020,1426925053570 | 82 | 179 | 1062730m | 79713mb | 649610k | 318854k
> > > > node03.domain.com,60020,1430860939797 | 82 | 179 | 1036597m | 76199mb | 627346k | 307136k
> > > > node04.domain.com,60020,1431975656065 | 82 | 400 | 1034624m | 76405mb | 655954k | 289316k
> > > > node05.domain.com,60020,1430484646409 | 82 | 185 | 1111807m | 81474mb | 688136k | 334127k
> > > > node07.domain.com,60020,1426776403757 | 82 | 164 | 1023217m | 74830mb | 631774k | 296169k
> > > > node08.domain.com,60020,1426775898757 | 81 | 171 | 1086446m | 79933mb | 681486k | 312325k
> > > > node09.domain.com,60020,1430440612531 | 81 | 160 | 1073852m | 77874mb | 658924k | 309734k
> > > > node11.domain.com,60020,1431989669340 | 81 | 166 | 1006322m | 75652mb | 664753k | 264081k
> > > > node12.domain.com,60020,1431927604238 | 82 | 188 | 1050229m | 75140mb | 652970k | 304137k
> > > > node13.domain.com,60020,1431372874221 | 82 | 178 | 937557m | 70042mb | 601684k | 257607k
> > > > node14.domain.com,60020,1429640630771 | 82 | 145 | 949090m | 69749mb | 592812k | 266677k
> > > >
> > > >
> > > > When compaction starts, a random node hits 100% I/O utilization, with
> > > > I/O wait lasting for seconds, even tens of seconds.
> > > >
> > > > What are the approaches to optimizing minor and major compactions when
> > > > you are I/O bound?
> > > >
> > >
> > > Yeah, with two disks, you will be crimped. Do you have the system sharing
> > > with hbase/hdfs or is hdfs running on one disk only?
> > >
> > > Do you have to compact? In other words, do you have read SLAs?  How are
> > > your read times currently?  Does your working dataset fit in RAM or do
> > > reads have to go to disk?  It looks like you are running at about three
> > > storefiles per column family.  What if you upped the threshold at which
> > > minors run? Do you have a downtime during which you could schedule
> > > compactions? Are you managing the major compactions yourself or are you
> > > having hbase do it for you?
> > >
> > > St.Ack
> > >
> >
>
