Re: Query on rowkey distribution || Does RS and number of Region related with each other

Manjeet Singh Mon, 03 Sep 2018 06:10:45 -0700

Hi Josh

I am sharing the steps I followed and my findings for better understanding.

I tested the table creation options below. (I am fully aware of
pre-splitting, but I cannot use it with our rowkey design.)

I have to choose a different policy that can distribute the data evenly
across all Regions.

#1
hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1
alter 'test_table', { NAME => 'si', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
alter 'test_table', { NAME => 'si', COMPRESSION => 'SNAPPY' }
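
To make the effect of option #1 concrete, here is a minimal Python sketch (not HBase code) that approximates how HexStringSplit spaces its split points over the range 00000000 to FFFFFFFF and then looks up which region a rowkey would fall into. The helper names hex_string_splits and region_index are made up for illustration, and the second and third sample keys are hypothetical; only the first key comes from this thread. It assumes the salt is written as the first byte of the rowkey.

```python
from bisect import bisect_right

def hex_string_splits(num_regions, first=0x00000000, last=0xFFFFFFFF):
    """Approximate HexStringSplit: evenly spaced, zero-padded 8-char hex keys."""
    step = (last - first) // num_regions
    return [format(first + step * i, "08x").encode("ascii")
            for i in range(1, num_regions)]

def region_index(rowkey, splits):
    """Region i holds keys in [splits[i-1], splits[i]); bytes sort lexicographically."""
    return bisect_right(splits, rowkey)

splits = hex_string_splits(10)  # 10 regions -> 9 split points

sample_keys = [
    b"*_99_1516838400_1516924800_1516865160",  # rowkey quoted in this thread
    b"#_12_1516838400_1516924800_1516865161",  # hypothetical
    b"$_47_1516838400_1516924800_1516865162",  # hypothetical
]

for key in sample_keys:
    print(key.decode(), "-> region", region_index(key, splits))
```

Under these assumptions, a salt character such as '*' sorts before the digit '0', so such keys compare lower than the first split point and fall into the first region, whatever the rest of the key contains.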


#2
create 'TEST_TABLE_KeyPrefixRegionSplitPolicy', {NAME => 'si'}, CONFIG => {'KeyPrefixRegionSplitPolicy.prefix_length'=> '5'}
alter 'TEST_TABLE_KeyPrefixRegionSplitPolicy', { NAME => 'si', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
alter 'TEST_TABLE_KeyPrefixRegionSplitPolicy', { NAME => 'si', COMPRESSION => 'SNAPPY' }



#3 This is the option I am currently considering; I want to distribute the
data based only on the rowkey
create 'TEST_TABLE','si',{ NAME => 'si', COMPRESSION => 'SNAPPY' }
alter 'TEST_TABLE', { NAME => 'si', DATA_BLOCK_ENCODING => 'FAST_DIFF' }
alter 'TEST_TABLE', {NAME => 'si', COMPRESSION => 'SNAPPY' }
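
For comparison, here is a minimal Python sketch (again not HBase code) of what option #3 implies. A table created without split points starts as a single region, so every salted key lands there until the region splits. The sketch contrasts that with a table whose region boundaries line up with the salt values. Everything in it is an assumption made for illustration: the bucket count NUM_BUCKETS, the CRC32-based salt, the helper names, and the boundaries '1', '2', '3'.

```python
from bisect import bisect_right
from collections import Counter
import zlib

NUM_BUCKETS = 4  # hypothetical number of salt buckets

def salted_rowkey(entity_id, day_start, day_end, ts):
    """Build SALT_ID_DayStart_DayEnd_Timestamp with a salt derived from the ID."""
    salt = zlib.crc32(str(entity_id).encode("ascii")) % NUM_BUCKETS
    return f"{salt}_{entity_id}_{day_start}_{day_end}_{ts}".encode("ascii")

def region_index(rowkey, splits):
    """Region i holds keys in [splits[i-1], splits[i]); no splits means one region."""
    return bisect_right(splits, rowkey)

keys = [salted_rowkey(i, 1516838400, 1516924800, 1516865160 + i)
        for i in range(1000)]

no_splits = []                    # table created as in #3: a single region
salt_splits = [b"1", b"2", b"3"]  # hypothetical boundaries matching the salts

single = Counter(region_index(k, no_splits) for k in keys)
spread = Counter(region_index(k, salt_splits) for k in keys)

print("single region:", dict(single))
print("salt-aligned regions:", dict(sorted(spread.items())))
```

The salt only changes where a key sorts. Which region, and therefore which RegionServer, receives the key still depends on the region boundaries that exist at write time.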


Thanks
Manjeet Singh



On Fri, Aug 31, 2018 at 6:49 PM, Josh Elser <els...@apache.org> wrote:

> I'd like to remind you again that we're all volunteers and we're helping
> you because we choose to do so. Antagonizing those who are helping you is a
> great way to stop receiving any free help.
>
> If you do not create more than one Region, HBase will not distribute your
> data on more than one RegionServer. Full stop.
>
>
> On 8/30/18 2:16 PM, Manjeet Singh wrote:
>
>> Hi Elser
>>
>> I have clearly talked about the rowkey; am I talking about the data? See
>> below what I said about the rowkey:
>>
>> SALT_ID_DayStartTimestamp_DayEndTimeStamp_IDTimeStamp
>>
>> The problem is that you are not understanding the question and are just
>> telling me what you know; even on Slack you are saying the same thing.
>> The question is simple: if I put a salt (which can be any arbitrary
>> character or a generated hash, anything) at the beginning of the rowkey,
>> why is my data not getting distributed?
>> Please note this is not a pre-split table.
>>
>> Thanks
>> Manjeet Singh
>>
>> On Thu, Aug 30, 2018 at 9:11 PM Josh Elser <els...@apache.org> wrote:
>>
>>> As I've been trying to explain in Slack:
>>>
>>> 1. Are you including the salt in the data that you are writing, such
>>> that you are spreading the data across all Regions per their boundaries?
>>> Or, as I think you are, just creating split points with this arbitrary
>>> "salt" and not including it when you write data?
>>>
>>> If, as I am assuming, you are not, all of your data will go into the
>>> first or last region. If you are still not getting my point, I'd suggest
>>> that you share the exact splitpoints and one rowkey that you are writing
>>> to HBase. That will make it quite clear if my guess is correct or not.
>>>
>>> 2. The number of Regions controls the number of RegionServers that will
>>> be involved with reads/writes against that table. This is a calculation
>>> that you need to figure out based on your cluster configuration and the
>>> magnitude of your workload.
>>>
>>> On 8/30/18 1:11 AM, Manjeet Singh wrote:
>>>
>>>> Hi All,
>>>>
>>>>
>>>>
>>>> I have two questions.
>>>>
>>>> *Question 1:*
>>>>
>>>> I want to understand how rowkey distribution happens if I create my table
>>>> without applying any policy but with prefix salting.
>>>>
>>>> For example, I have a rowkey like
>>>>
>>>> SALT_ID_DayStartTimestamp_DayEndTimeStamp_IDTimeStamp
>>>>
>>>> so it will look like this:
>>>>
>>>> *_99_1516838400_1516924800_1516865160
>>>>
>>>> The question is: I cannot see my data getting distributed on the basis of
>>>> the salt alone.
>>>>
>>>> So is pre-splitting my only choice, or do I have any other option?
>>>>
>>>> I have seen two more approaches, i.e.
>>>>
>>>> hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1
>>>>
>>>> I guess its scope is limited, as the number of regions is set at table
>>>> creation time and is then fixed? Not sure.
>>>>
>>>> and
>>>>
>>>> *UniformSplit
>>>> <https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html>*
>>>>
>>>> *Question 2: Is the number of split points related in any way to the
>>>> number of RS in the cluster? If yes, what is the calculation?*
>>>>
>>>>
>>>
>>
>>


-- 
luv all
