Ok, got it. Thank you.

2015-05-25 7:58 GMT+03:00 lars hofhansl <la...@apache.org>:
> Re: blockingStoreFiles
> With LSM stores you do not get smooth behavior when you continuously try
> to pump more data into the cluster than the system can absorb.
> For a while the memstores can absorb the writes in RAM, then they need to
> flush. If compactions cannot keep up with the influx of new HFiles, you
> have two choices: (1) you allow the number of HFiles to grow at the
> expense of read performance, or (2) you tell the clients to slow down
> (there are various levels of sophistication in how you do that, but
> that's beside the point).
> blockingStoreFiles is the maximum number of files (per store, i.e. per
> column family) that HBase will allow to accumulate before it stops
> accepting writes from the clients. In 0.94 it would simply block for a
> while. In 0.98 it throws an exception back to the client to tell it to
> back off.
> -- Lars
>
> From: Serega Sheypak <serega.shey...@gmail.com>
> To: user <user@hbase.apache.org>; lars hofhansl <la...@apache.org>
> Sent: Sunday, May 24, 2015 12:59 PM
> Subject: Re: Optimizing compactions on super-low-cost HW
>
> Hi, thanks!
>
> > hbase.hstore.blockingStoreFiles
> Don't understand the idea of this setting, can I find an explanation for
> "dummies"?
>
> > hbase.hregion.majorcompaction
> done already
>
> > DATA_BLOCK_ENCODING, SNAPPY
> I always use it by default, CPU OK
>
> > memstore flush size
> done
>
> > I assume only the 300g partitions are mirrored, right? (not the entire
> > 2t drive)
> Aha
>
> > Can you add more machines?
> Will do it when we earn money.
> Thank you :)
>
> 2015-05-24 21:42 GMT+03:00 lars hofhansl <la...@apache.org>:
>
> > Yeah, all you can do is drive your write amplification down.
> >
> > As Stack said:
> > - Increase hbase.hstore.compactionThreshold, and
> > hbase.hstore.blockingStoreFiles. It'll hurt reads, but in your case
> > read is already significantly hurt when compactions happen.
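[The two settings discussed above live in hbase-site.xml. A minimal sketch; the values shown here are illustrative examples for this kind of cluster, not recommendations from the thread:]

```xml
<!-- hbase-site.xml: illustrative values only.
     hbase.hstore.compactionThreshold: minor compactions kick in once a
     store has this many storefiles.
     hbase.hstore.blockingStoreFiles: past this many storefiles per store,
     HBase blocks writes (0.94) or throws an exception back (0.98). -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>8</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value>
</property>
```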
> > - Absolutely set hbase.hregion.majorcompaction to 1 week (with a jitter
> > of 1/2 week, that's the default in 0.98 and later). Minor compactions
> > will still happen, based on the compactionThreshold setting. Right now
> > you're rewriting _all_ your data _every_ day.
> >
> > - Turning off WAL writing will save you IO, but I doubt it'll help
> > much. I do not expect async WAL to help a lot as the aggregate IO is
> > still the same.
> >
> > - See if you can enable DATA_BLOCK_ENCODING on your column families
> > (FAST_DIFF or PREFIX are good). You can also try SNAPPY compression.
> > That would reduce your overall IO. (Since your CPUs are also weak you'd
> > have to test the CPU/IO tradeoff.)
> >
> > - If you have RAM to spare, increase the memstore flush size (will lead
> > to initially larger and fewer files).
> >
> > - Or (again if you have spare RAM) make your regions smaller, to curb
> > write amplification.
> >
> > - I assume only the 300g partitions are mirrored, right? (not the
> > entire 2t drive)
> >
> > I have some suggestions compiled here (if you don't mind the plug):
> > http://hadoop-hbase.blogspot.com/2015/05/my-hbasecon-talk-about-hbase.html
> >
> > Other than that, I'll repeat what others said: you have 14 extremely
> > weak machines, you can't expect the world from this.
> > Your aggregate IOPS are less than 3000, your aggregate IO bandwidth
> > ~3GB/s. Can you add more machines?
> >
> > -- Lars
> >
> > ________________________________
> > From: Serega Sheypak <serega.shey...@gmail.com>
> > To: user <user@hbase.apache.org>
> > Sent: Friday, May 22, 2015 3:45 AM
> > Subject: Re: Optimizing compactions on super-low-cost HW
> >
> > We don't have money, these nodes are the cheapest. I totally agree that
> > we need 4-6 HDDs, but there is no chance to get them, unfortunately.
> > Okay, I'll try to apply Stack's suggestions.
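[Lars's first bullets map onto hbase-site.xml settings. A sketch with illustrative values; the jitter property exists in 0.98+, and the 256 MB flush size is just an example of "increase if you have RAM to spare":]

```xml
<!-- hbase-site.xml: Lars's suggestions expressed as settings.
     Values are illustrative. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value> <!-- 1 week, in milliseconds -->
</property>
<property>
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0.5</value> <!-- 0.98+: spreads majors over +/- half the period -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value> <!-- 256 MB, up from the 128 MB default -->
</property>
```

[DATA_BLOCK_ENCODING and COMPRESSION are per-column-family table attributes rather than site-wide settings, so they are set via the shell's alter command instead.]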
> >
> > 2015-05-22 13:00 GMT+03:00 Michael Segel <michael_se...@hotmail.com>:
> >
> > > Look, to be blunt, you’re screwed.
> > >
> > > If I read your cluster spec, it sounds like you have a single i7
> > > (quad core) CPU. That’s 4 cores or 8 threads.
> > >
> > > Mirroring the OS is common practice.
> > > Using the same drives for Hadoop… not so good, but once the server
> > > boots up… not so much I/O.
> > > It’s not good, but you could live with it…
> > >
> > > Your best bet is to add a couple more spindles. Ideally you’d want to
> > > have 6 drives: the 2 OS drives mirrored and separate (use the extra
> > > space to stash / write logs), then 4 drives / spindles in JBOD for
> > > Hadoop. This brings you to 1:1 on physical cores. If your box can
> > > handle more spindles, then going to a total of 10 drives would
> > > improve performance further.
> > >
> > > However, you need to level-set your expectations… you can only go so
> > > far. If you have 4 drives spinning, you could start to saturate a
> > > 1GbE network, so that will hurt performance.
> > >
> > > That’s pretty much your only option in terms of fixing the hardware,
> > > and then you have to start tuning.
> > >
> > > > On May 21, 2015, at 4:04 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak
> > > > <serega.shey...@gmail.com> wrote:
> > > >
> > > >>> Do you have the system sharing
> > > >> There are 2 HDD 7200, 2TB each. There is a 300GB OS partition on
> > > >> each drive with mirroring enabled. I can't persuade devops that
> > > >> mirroring could cause IO issues. What arguments can I bring? They
> > > >> use OS partition mirroring so that when a disk fails, we can use
> > > >> the other partition to boot the OS and continue to work...
> > > >>
> > > > You are already compromised i/o-wise having two disks only.
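[Michael's point that 4 spinning drives can saturate a 1GbE link checks out with rough numbers. A sketch; the ~120 MB/s per-drive sequential throughput is an assumed ballpark for 7200rpm SATA, not a figure from the thread:]

```python
# Sanity check: aggregate sequential disk bandwidth vs. a 1GbE NIC.
DRIVE_MBPS = 120   # assumed sequential throughput of one 7200rpm drive
GIGE_MBPS = 125    # 1 Gbit/s ~= 125 MB/s

def disk_vs_network(num_drives, drive_mbps=DRIVE_MBPS, nic_mbps=GIGE_MBPS):
    """Return (aggregate disk MB/s, ratio of disk bandwidth to NIC bandwidth)."""
    disk = num_drives * drive_mbps
    return disk, disk / nic_mbps

disk, ratio = disk_vs_network(4)
print(disk, round(ratio, 1))  # ~480 MB/s of spindles behind a ~125 MB/s pipe
```

[With 4 drives the spindles can move roughly 3-4x what the NIC can, so replication and compaction traffic hits the network wall first.]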
> > > > I have not the experience to say for sure, but basic physics would
> > > > seem to dictate that having your two disks (partially) mirrored
> > > > compromises your i/o even more.
> > > >
> > > > You are in a bit of a hard place. Your operators want the machine
> > > > to boot even after it loses 50% of its disk.
> > > >
> > > >>> Do you have to compact? In other words, do you have read SLAs?
> > > >> Unfortunately, I have a mixed workload from web applications. I
> > > >> need to write and read, and the SLA is < 50ms.
> > > >>
> > > > Ok. You get the bit that seeks are about 10ms each, so with two
> > > > disks you can do 2x100 seeks a second presuming no one else is
> > > > using the disks.
> > > >
> > > >>> How are your read times currently?
> > > >> Cloudera Manager says it's 4K reads per second and 500 writes per
> > > >> second
> > > >>
> > > >>> Does your working dataset fit in RAM or do reads have to go to
> > > >>> disk?
> > > >> I have several tables of 500GB each and many small tables of 10-20
> > > >> GB. Small tables are loaded hourly/daily using bulkload (prepare
> > > >> HFiles using MR and move them to HBase using the utility). Big
> > > >> tables are used by webapps; they read and write them.
> > > >>
> > > > These hfiles are created on the same cluster with MR? (i.e. they
> > > > are using up i/os)
> > > >
> > > >>> It looks like you are running at about three storefiles per
> > > >>> column family
> > > >> is it hbase.hstore.compactionThreshold=3?
> > > >>
> > > >>> What if you upped the threshold at which minors run?
> > > >> you mean bump hbase.hstore.compactionThreshold to 8 or 10?
> > > >>
> > > > Yes.
> > > >
> > > > Downside is that your reads may require more seeks to find a
> > > > keyvalue.
> > > >
> > > > Can you cache more?
> > > >
> > > > Can you make it so files are bigger before you flush?
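[Stack's seek arithmetic, taken together with the 4K reads/s figure from Cloudera Manager, implies a minimum cache-hit ratio. The 10 ms/seek and 4000 reads/s numbers are from the thread; the algebra is my illustration:]

```python
# ~10 ms per seek => ~100 random reads/s per spindle (Stack's figure).
SEEK_MS = 10
DISKS = 2

disk_reads_per_sec = DISKS * (1000 // SEEK_MS)  # 2 x 100 = 200

# Cloudera Manager reports ~4000 reads/s per node; anything the disks
# cannot absorb must be served from block cache / OS page cache.
demand = 4000
min_cache_hit_ratio = 1 - disk_reads_per_sec / demand
print(disk_reads_per_sec, min_cache_hit_ratio)  # only 200 reads/s can hit disk
```

[So at least ~95% of reads must already be coming from cache, which is why "can you cache more?" and the working-set-in-RAM question matter so much here.]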
> > > >
> > > >>> Do you have a downtime during which you could schedule
> > > >>> compactions?
> > > >> Unfortunately no. It should work 24/7, and sometimes it doesn't.
> > > >>
> > > > So, it is running at full bore 24/7? There is no 'downtime'... a
> > > > time when the traffic is not so heavy?
> > > >
> > > >>> Are you managing the major compactions yourself or are you having
> > > >>> hbase do it for you?
> > > >> HBase, once a day, hbase.hregion.majorcompaction=1day
> > > >>
> > > > Have you studied your compactions? You realize that a major
> > > > compaction will do a full rewrite of your dataset? When they run,
> > > > how many storefiles are there?
> > > >
> > > > Do you have to run once a day? Can you not run once a week? Can you
> > > > manage the compactions yourself... and run them a region at a time
> > > > in a rolling manner across the cluster rather than have them just
> > > > run whenever it suits them once a day?
> > > >
> > > >> I can disable WAL. It's ok to lose some data in case of RS
> > > >> failure. I'm not doing banking transactions.
> > > >> If I disable WAL, could it help?
> > > >>
> > > > It could, but don't. Enable deferred sync'ing first if you can
> > > > 'lose' some data.
> > > >
> > > > Work on your flushing and compactions before you mess w/ the WAL.
> > > >
> > > > What version of hbase are you on? You say CDH, but the newer your
> > > > hbase, the better it does generally.
> > > >
> > > > St.Ack
> > > >
> > > >> 2015-05-20 18:04 GMT+03:00 Stack <st...@duboce.net>:
> > > >>
> > > >>> On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak
> > > >>> <serega.shey...@gmail.com> wrote:
> > > >>>
> > > >>>> Hi, we are using extremely cheap HW:
> > > >>>> 2 HDD 7200
> > > >>>> 4*2 core (Hyperthreading)
> > > >>>> 32GB RAM
> > > >>>>
> > > >>>> We met serious IO performance issues.
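[Stack's "manage the compactions yourself" alternative above is usually implemented by turning off time-based majors and triggering them externally. A sketch; the scheduling side (e.g. cron driving the shell's major_compact command, table by table or region by region, during the least-busy hours) is assumed, not shown:]

```xml
<!-- hbase-site.xml: disable time-based major compactions; an external
     script then issues major_compact calls in a rolling manner. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value> <!-- 0 = never trigger a major compaction on a timer -->
</property>
```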
> > > >>>> We have a more or less even distribution of read/write
> > > >>>> requests. The same for data size.
> > > >>>>
> > > >>>> ServerName                             Req/s  Read Request Count  Write Request Count
> > > >>>> node01.domain.com,60020,1430172017193  195    171871826           16761699
> > > >>>> node02.domain.com,60020,1426925053570  24     34314930            16006603
> > > >>>> node03.domain.com,60020,1430860939797  22     32054801            16913299
> > > >>>> node04.domain.com,60020,1431975656065  33     1765121             253405
> > > >>>> node05.domain.com,60020,1430484646409  27     42248883            16406280
> > > >>>> node07.domain.com,60020,1426776403757  27     36324492            16299432
> > > >>>> node08.domain.com,60020,1426775898757  26     38507165            13582109
> > > >>>> node09.domain.com,60020,1430440612531  27     34360873            15080194
> > > >>>> node11.domain.com,60020,1431989669340  28     44307               13466
> > > >>>> node12.domain.com,60020,1431927604238  30     5318096             2020855
> > > >>>> node13.domain.com,60020,1431372874221  29     31764957            15843688
> > > >>>> node14.domain.com,60020,1429640630771  41     36300097            13049801
> > > >>>>
> > > >>>> ServerName                             Stores  Storefiles  Storefile Size Uncompressed  Storefile Size  Index Size  Bloom Size
> > > >>>> node01.domain.com,60020,1430172017193  82      186         1052080m                     76496mb         641849k     310111k
> > > >>>> node02.domain.com,60020,1426925053570  82      179         1062730m                     79713mb         649610k     318854k
> > > >>>> node03.domain.com,60020,1430860939797  82      179         1036597m                     76199mb         627346k     307136k
> > > >>>> node04.domain.com,60020,1431975656065  82      400         1034624m                     76405mb         655954k     289316k
> > > >>>> node05.domain.com,60020,1430484646409  82      185         1111807m                     81474mb         688136k     334127k
> > > >>>> node07.domain.com,60020,1426776403757  82      164         1023217m                     74830mb         631774k     296169k
> > > >>>> node08.domain.com,60020,1426775898757  81      171         1086446m                     79933mb         681486k     312325k
> > > >>>> node09.domain.com,60020,1430440612531  81      160         1073852m                     77874mb         658924k     309734k
> > > >>>> node11.domain.com,60020,1431989669340  81      166         1006322m                     75652mb         664753k     264081k
> > > >>>> node12.domain.com,60020,1431927604238  82      188         1050229m                     75140mb         652970k     304137k
> > > >>>> node13.domain.com,60020,1431372874221  82      178         937557m                      70042mb         601684k     257607k
> > > >>>> node14.domain.com,60020,1429640630771  82      145         949090m                      69749mb         592812k     266677k
> > > >>>>
> > > >>>> When compaction starts, a random node goes to 100% I/O, with io
> > > >>>> waits of seconds, even tens of seconds.
> > > >>>>
> > > >>>> What are the approaches to optimizing minor and major
> > > >>>> compactions when you are I/O bound..?
> > > >>>
> > > >>> Yeah, with two disks, you will be crimped. Do you have the system
> > > >>> sharing with hbase/hdfs or is hdfs running on one disk only?
> > > >>>
> > > >>> Do you have to compact? In other words, do you have read SLAs?
> > > >>> How are your read times currently?
> > > >>> Does your working dataset fit in RAM or do reads have to go to
> > > >>> disk? It looks like you are running at about three storefiles per
> > > >>> column family. What if you upped the threshold at which minors
> > > >>> run? Do you have a downtime during which you could schedule
> > > >>> compactions? Are you managing the major compactions yourself or
> > > >>> are you having hbase do it for you?
> > > >>>
> > > >>> St.Ack
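[Lars's remark that the cluster is "rewriting _all_ your data _every_ day" follows directly from the status table (roughly 1 TB of uncompressed storefile data per node) and the daily major-compaction period. The arithmetic below is my illustration of his point:]

```python
# A major compaction rewrites every storefile in the region/store. With a
# daily period, a week of operation rewrites the full dataset 7 times;
# with a weekly period, once. Using the ~1 TB-per-node figure from the
# status table above:
storefile_gb = 1000  # approx. uncompressed storefile data per node

rewritten_per_week_daily = storefile_gb * 7   # daily majors
rewritten_per_week_weekly = storefile_gb * 1  # weekly majors
print(rewritten_per_week_daily // rewritten_per_week_weekly)
```

[Moving from daily to weekly majors cuts that particular IO load by 7x, which is the single biggest lever the thread identifies for an IO-bound cluster.]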