>>>> In the first step, the files are read correctly and regionGroups is
>>>> created as it should be.
Did you check the number of reducers? Was it equal to 2000 (before your extended
HFileOutputFormat)?

>>> The logs of the RegionServer that the files are moved to
>>> indeed show that all files are moved to that region (when the problem
>>> doesn't happen, they show only 1 file per family moved to a
>>> RegionServer).

What about the logs related to the region split?

> Loaded regions are listed in .META. table and the ENCODED field in the 
> table points to an existing directory. But all family directories in 
> this region are empty...

Was the old region still in .META.?

> I implemented an extension of HFileOutputFormat: because each bulk load will
> import data into the newly created regions only, I pass the prefix
> (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that
> getRegionStartKeys returns only the corresponding keys.
> I did this in order to avoid having 2000 reducers when my target is 15
> regions...

We always do it this way :). Only configure the necessary regions.

Sorry for the late reply.

Jieshan
-----Original Message-----
From: Amit Sela [mailto:am...@infolinks.com] 
Sent: Tuesday, December 17, 2013 12:19 AM
To: user@hbase.apache.org
Subject: Re: Bulk load moving HFiles to the wrong region

I've managed to isolate the problem.
I implemented an extension of HFileOutputFormat: because each bulk load will
import data into the newly created regions only, I pass the prefix
(yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that 
getRegionStartKeys returns only the corresponding keys.
I did this in order to avoid having 2000 reducers when my target is 15 
regions...
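A minimal sketch of the start-key filtering described above, assuming the start keys are plain ASCII strings (for ASCII, String order matches HBase's unsigned byte order) and that the yyyyMMdd prefix is passed in as Amit describes. PrefixStartKeys and filterByPrefix are hypothetical names; this is not the actual MyHFileOutputFormat or HBase code. Because HFileOutputFormat sizes the reduce phase by the number of start keys it is given, keeping only the new day's keys is what brings the job from one reducer per table region down to one per new region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of HBase): keep only the region start keys
// that belong to the day being loaded, e.g. "20131216".
class PrefixStartKeys {

    static List<String> filterByPrefix(List<String> allStartKeys, String dayPrefix) {
        List<String> selected = new ArrayList<>();
        for (String key : allStartKeys) {
            // Row keys look like yyyyMMdd_URL, so a day's regions share the prefix.
            if (key.startsWith(dayPrefix)) {
                selected.add(key);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> tableStartKeys = Arrays.asList(
                "", "20131215_a", "20131215_m", "20131216", "20131216_h", "20131216_p");
        List<String> dayKeys = filterByPrefix(tableStartKeys, "20131216");
        // One reducer per selected start key instead of one per table region.
        System.out.println(dayKeys + " -> " + dayKeys.size() + " reducers");
        // prints [20131216, 20131216_h, 20131216_p] -> 3 reducers
    }
}
```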

When I use the standard HFileOutputFormat, it seems to work. But I don't understand why the problem
doesn't happen in other tables (some smaller and some much, much bigger), or why, even in
this table, it happens only every once in a while.

Any ideas?



On Mon, Dec 16, 2013 at 4:37 PM, Amit Sela <am...@infolinks.com> wrote:

> Loaded regions are listed in .META. table and the ENCODED field in the 
> table points to an existing directory. But all family directories in 
> this region are empty...
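One way to double-check what the ENCODED value points to is a small helper like the following. It assumes the HBase 0.94 directory layout `<hbase.rootdir>/<table>/<encodedName>/<family>/` and new-format region names, for which the encoded name is, to our understanding, the MD5 hex digest of `table,startKey,regionId`; compare the output with `HRegionInfo.getEncodedName()` before relying on it. RegionDirCheck is a hypothetical name, and it uses java.nio on a local path, so it would run against a copied region directory or need adapting to Hadoop's FileSystem API for HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for inspecting a region directory; not an HBase API.
class RegionDirCheck {

    /** Lowercase hex MD5 of the given string. */
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Assumed 0.94 encoded name: MD5 hex of "table,startKey,regionId". */
    static String encodedName(String table, String startKey, long regionId) {
        return md5Hex(table + "," + startKey + "," + regionId);
    }

    /** Families whose directory under regionDir is missing or contains no entries. */
    static List<String> emptyFamilies(Path regionDir, List<String> families) throws IOException {
        List<String> empty = new ArrayList<>();
        for (String family : families) {
            Path familyDir = regionDir.resolve(family);
            if (!Files.isDirectory(familyDir) || isEmptyDir(familyDir)) {
                empty.add(family);
            }
        }
        return empty;
    }

    private static boolean isEmptyDir(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }
}
```

Calling `emptyFamilies(regionDir, families)` with the table's column families returns the families whose directories are missing or empty, which is the state reported for the loaded regions.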
>
>
> On Mon, Dec 16, 2013 at 4:29 PM, Amit Sela <am...@infolinks.com> wrote:
>
>> I ran the hbck tool, and while I do have some inconsistencies, they
>> are not in the table that has the bulk load issues.
>>
>>
>>
>> On Mon, Dec 16, 2013 at 4:22 PM, Amit Sela <am...@infolinks.com> wrote:
>>
>>> The logs of the RegionServer that the files are moved to
>>> indeed show that all files are moved to that region (when the problem
>>> doesn't happen, they show only 1 file per family moved to a
>>> RegionServer).
>>>
>>>
>>> On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <am...@infolinks.com> wrote:
>>>
>>>> In the first step, the files are read correctly and regionGroups is
>>>> created as it should be.
>>>> When debugging LoadIncrementalHFiles.tryAtomicRegionLoad(), I
>>>> notice that the regionName returned from the server in the ServerCallable
>>>> is the wrong region (the last region before the pre-split).
>>>> The previous last region is not supposed to be deleted; I'm just adding
>>>> new regions (always following lexicographically), so the last
>>>> region before the pre-split is no longer the last one.
>>>> It seems that wherever the ServerCallable is running, it is not
>>>> updated with the new regions... I tried major compacting (the new
>>>> regions) after the pre-split and before the bulk load, but that didn't help.
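The symptom described here can be modeled with a very small sketch: a client that routes each row to the region with the greatest start key not greater than the row. This is a simplified, hypothetical model, not HBase's HConnection code, and it uses ASCII String keys in place of byte[]. If the cached start keys predate the pre-split, every row of the new day lands in the previous last region, because that region has an empty end key and so covers everything after its start key; with refreshed start keys the same rows go to the new regions. That is consistent with the guess that wherever the ServerCallable runs has not picked up the new regions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Simplified, hypothetical model of client-side region routing.
class RegionLookup {

    private final TreeSet<String> startKeys;

    // The list must contain "" (the first region's empty start key).
    RegionLookup(List<String> startKeys) {
        this.startKeys = new TreeSet<>(startKeys);
    }

    /** Start key of the region whose range contains the row. */
    String regionFor(String row) {
        // Greatest start key <= row; never null because "" <= every row.
        return startKeys.floor(row);
    }

    public static void main(String[] args) {
        String row = "20131216_example.com";
        RegionLookup stale = new RegionLookup(Arrays.asList("", "20131215_a", "20131215_m"));
        RegionLookup fresh = new RegionLookup(
                Arrays.asList("", "20131215_a", "20131215_m", "20131216", "20131216_h"));
        System.out.println("stale cache -> " + stale.regionFor(row)); // 20131215_m
        System.out.println("fresh cache -> " + fresh.regionFor(row)); // 20131216
    }
}
```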
>>>>
>>>>
>>>>
>>>> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bijies...@huawei.com> wrote:
>>>>
>>>>> As we know, bulk load has two steps:
>>>>> 1. Create HFiles by MapReduce.
>>>>> 2. Load HFiles into HBase.
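To make step 1 concrete: to our understanding, in 0.94 HFileOutputFormat.configureIncrementalLoad() sets one reducer per region start key and partitions rows with a TotalOrderPartitioner whose split points are the start keys after the first (empty) one. The sketch reproduces that assignment rule with a binary search; StepOnePartitioner is a hypothetical, simplified stand-in using String keys instead of byte[]. When the split points match the live regions, reducer i writes the HFiles for region i, so these split points are the partition information worth verifying.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for TotalOrderPartitioner's binary-search lookup.
class StepOnePartitioner {

    /**
     * Reducer index for a row, given sorted split points
     * (the region start keys without the leading empty key).
     */
    static int partitionFor(List<String> splitPoints, String row) {
        int pos = Collections.binarySearch(splitPoints, row);
        // Exact hit on split point i: the row starts region i + 1.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is the number of split points smaller than the row.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Start keys "", "20131216", "20131216_h", "20131216_p" -> 4 reducers.
        List<String> splits = Arrays.asList("20131216", "20131216_h", "20131216_p");
        for (String row : Arrays.asList("20131215_z.com", "20131216_a.com", "20131216_q.com")) {
            System.out.println(row + " -> reducer " + partitionFor(splits, row));
        }
    }
}
```

With these split points the three rows go to reducers 0, 1 and 3 respectively.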
>>>>>
>>>>> I wonder whether it read the right partition information during
>>>>> the first step. Have you run the hbck tool to check the cluster's health?
>>>>> You mentioned you see the new regions in the webapp. The fact that the files
>>>>> were moved to the old region indicates that the old region
>>>>> directory was still there. So did you start the bulk load just after the
>>>>> region split? (The old region directory is deleted by the CatalogJanitor
>>>>> soon after a region split, once compaction has finished.)
>>>>>
>>>>> I suggest checking the RegionServer logs.
>>>>>
>>>>> Jieshan.
>>>>> -----Original Message-----
>>>>> From: Amit Sela [mailto:am...@infolinks.com]
>>>>> Sent: Monday, December 16, 2013 2:29 PM
>>>>> To: user@hbase.apache.org
>>>>> Subject: RE: Bulk load moving HFiles to the wrong region
>>>>>
>>>>> Each split I execute is for a new day. The row key design is yyyyMMdd_URL,
>>>>> and the split points are yyyyMMdd_x, yyyyMMdd_y, etc., chosen so that
>>>>> the entire load is (almost) evenly spread.
>>>>> The problem I described causes the bulk load to load all files
>>>>> into the last region of the previous day.
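A sketch of the row-key and split-point layout described here, with the day formatted as yyyyMMdd. The URL boundaries (the x, y, ... in yyyyMMdd_x, yyyyMMdd_y) are hypothetical inputs, for example sampled from the day's data so the load spreads evenly, and DailySplitKeys is a hypothetical name. One design choice in this sketch is not stated in the thread: it also emits the bare yyyyMMdd prefix as the first split point, so that rows sorting before yyyyMMdd_x start a new region instead of staying in the previous day's last region.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the yyyyMMdd_URL row-key layout.
class DailySplitKeys {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String rowKey(LocalDate day, String url) {
        return DAY.format(day) + "_" + url;
    }

    /** Split points for one day; urlBoundaries must be sorted ascending. */
    static List<String> splitPoints(LocalDate day, List<String> urlBoundaries) {
        String prefix = DAY.format(day);
        List<String> splits = new ArrayList<>();
        // Day boundary: every row of this day sorts at or after the bare prefix.
        splits.add(prefix);
        for (String boundary : urlBoundaries) {
            splits.add(prefix + "_" + boundary);
        }
        return splits;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2013, 12, 16);
        System.out.println(splitPoints(day, Arrays.asList("g", "p")));
        // prints [20131216, 20131216_g, 20131216_p]
        System.out.println(rowKey(day, "example.com"));
        // prints 20131216_example.com
    }
}
```

Because every key of a new day sorts after every key of the previous day, each day's split points land after the existing regions, which is the "new regions are always last" property the pre-splitting relies on.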
>>>>> Thanks.
>>>>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bijies...@huawei.com> wrote:
>>>>>
>>>>> > Hi Amit:
>>>>> > Can you provide the split keys of the new regions and your
>>>>> > row-key design?
>>>>> >
>>>>> > Thank you.
>>>>> > Jieshan.
>>>>> > -----Original Message-----
>>>>> > From: Amit Sela [mailto:am...@infolinks.com]
>>>>> > Sent: Monday, December 16, 2013 7:09 AM
>>>>> > To: user@hbase.apache.org
>>>>> > Subject: Bulk load moving HFiles to the wrong region
>>>>> >
>>>>> > Hi all,
>>>>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>>>>> > When trying to bulk load using the Java API, I sometimes get the
>>>>> > HFiles moved to the wrong directory.
>>>>> > I'm pre-splitting regions, and the new regions are always the
>>>>> > last (lexicographically), so when this happens all files move to
>>>>> > the last region before the pre-split. But the split does work: I see the
>>>>> > new regions in the webapp before the bulk load executes. Once a
>>>>> > table has this problem (it doesn't happen every time), it persists until
>>>>> > I restart HBase.
>>>>> >
>>>>> > Has anyone seen something similar?
>>>>> >
>>>>> > Thanks,
>>>>> > Amit.
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
