Thanks for your advice. For option three, I think major compaction on a large region will affect the performance of the whole region server, so the downtime would apply to all the tables on that RS. Am I right?
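
For reference, the option three routine discussed here (major-compact every table, then split the large regions) could be sketched roughly as below. This is only an illustration: it builds HBase shell command strings (`major_compact`, `split`) and never talks to a cluster. The function name, the table and region names, and the 10 GB threshold are all made up for the example. As far as I know, `major_compact` in the shell only queues the request, so a real script would have to wait for compaction to finish before splitting.

```python
# Sketch only: builds HBase shell commands for a daily maintenance window.
# Table names, region names and the size threshold are hypothetical.

GB = 1024 ** 3


def maintenance_commands(region_sizes, split_threshold_bytes):
    """Return HBase shell commands for the maintenance window.

    region_sizes maps table name -> {region name: size in bytes}.
    Every table is major-compacted first; afterwards each region whose
    size exceeds split_threshold_bytes gets a split request.
    """
    commands = []
    # Step 1: request a major compaction for every table.
    for table in sorted(region_sizes):
        commands.append(f"major_compact '{table}'")
    # Step 2: request a split for every region above the threshold.
    for table in sorted(region_sizes):
        for region, size in sorted(region_sizes[table].items()):
            if size > split_threshold_bytes:
                commands.append(f"split '{region}'")
    return commands


if __name__ == "__main__":
    # Hypothetical sizes; in practice they would come from cluster metrics.
    sizes = {
        "events": {"events,,1": 12 * GB, "events,m,2": 3 * GB},
        "users": {"users,,1": 1 * GB},
    }
    for command in maintenance_commands(sizes, 10 * GB):
        print(command)
```

The printed lines could be piped into `hbase shell` during the window. Note that every compaction request still lands on the region servers hosting those regions, which is exactly the concern raised above.
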
On 12/16/15, 5:12 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:

>w.r.t. option #1, also consider
>http://hbase.apache.org/book.html#arch.bulk.load
>
>FYI
>
>On Tue, Dec 15, 2015 at 12:17 PM, Frank Luo <j...@merkleinc.com> wrote:
>
>> I am in a very similar situation.
>>
>> I guess you can try one of the options.
>>
>> Option one: avoid online inserts by preparing data off-line. Do something
>> like http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv
>>
>> Option two: If the first option doesn’t work for you, it will be better to
>> reduce your region size and increase the read/write timeout, so that you
>> allow compaction to happen while you insert data; since the regions are
>> smaller, it takes less time to compact/split. With this option, you can
>> have a table available 24/7, but the overall performance tends to go down
>> dramatically once some regions start compacting.
>>
>> Option three: If you can afford some downtime, e.g. two hours every day,
>> you can manage compaction/splitting during that time. What I usually do is
>> run major-compact against all tables, then split the ones that are large so
>> that they have enough room to grow for the next day’s inserts.
>>
>> I hope it helps.
>>
>> From: 林豪 [mailto:lin...@qiyi.com]
>> Sent: Monday, December 14, 2015 11:51 PM
>> To: user@hbase.apache.org
>> Subject: Common advices for hosting a huge table
>>
>> Hi, all:
>>
>> We have an HBase cluster which has several hundred region servers, and
>> each RS hosts nearly 300 regions. Currently one of our tables has grown
>> to 16 TB and some regions exceed 10 GB. Major compaction on these regions
>> is painful as it produces a lot of disk I/O and will affect the performance
>> of the RS. The auto-splitting size of IncreasingToUpperBoundRegionSplitPolicy
>> increased to 16 GB or more for this huge table. My solution is to set the
>> attribute MAX_FILESIZE on this table so ConstantSizeRegionSplitPolicy auto
>> splitting will work again.
>>
>> My question is: What is the common advice, or what are the configuration
>> options, for hosting such a huge table? If we decide to limit the region
>> size, how can we decide the optimal region size? If the region size is too
>> large, major compaction is painful; but if the region size is too small,
>> then we have a lot of small regions, which will overwhelm the RS.
>>
>> 林豪
>> Cloud Platform R&D Engineer
>> iQIYI (QIYI.com, Inc.)
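
On the region-size question in the original post, a back-of-the-envelope calculation may help frame the trade-off between fewer, larger regions and more, smaller ones. The sketch below is plain arithmetic and not an HBase API. The 16 TB table size comes from the post; the count of 300 region servers is an assumption standing in for "several hundred", and the candidate region sizes are arbitrary.

```python
# Sketch only: estimates how many regions one table needs at a given
# maximum region size, and how many of them land on each region server.
# The server count and the candidate sizes are assumptions.

GB = 1024 ** 3
TB = 1024 ** 4


def region_plan(table_bytes, max_region_bytes, num_servers):
    """Return (region count, average regions per server) for one table."""
    # Integer ceiling division: a partly filled region still counts as one.
    regions = -(-table_bytes // max_region_bytes)
    per_server = regions / num_servers
    return regions, per_server


if __name__ == "__main__":
    table_size = 16 * TB   # table size quoted in the original post
    servers = 300          # assumed value for "several hundred" servers
    for size_gb in (2, 4, 10, 16):
        regions, per_server = region_plan(table_size, size_gb * GB, servers)
        print(f"{size_gb:>2} GB regions: {regions} regions, "
              f"about {per_server:.1f} per server for this table")
```

Halving the region size doubles the region count for this table, and that comes on top of the roughly 300 regions each RS already hosts, so the per-server total is the number to watch when picking a MAX_FILESIZE value.
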