>>>> In the first step, the files are read correctly and regionGroups is
>>>> created as it should.

Did you notice the number of reducers? Was it 2000 (before you extended HFileOutputFormat)?

>>> RegionServer logs in the RegionServer that the files are moved to
>>> indeed show that all files are moved to that region (when it
>>> doesn't happen it shows only 1 file per family moved to a
>>> RegionServer)

How about the region-split related logs?

> Loaded regions are listed in .META. table and the ENCODED field in the
> table points to an existing directory. But all family directories in
> this region are empty...

Was the previous old region still in .META.?

> I implemented an extension of HFileOutputFormat - because each bulk load will
> import data to the newly created regions only, I pass the prefix
> (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that
> getRegionStartKeys returns only the corresponding keys.
> I did this in order to avoid having 2000 reducers when my target is 15
> regions...

We always do it like this :). Only configure the necessary regions.

Sorry for the late reply.

Jieshan

-----Original Message-----
From: Amit Sela [mailto:am...@infolinks.com]
Sent: Tuesday, December 17, 2013 12:19 AM
To: user@hbase.apache.org
Subject: Re: Bulk load moving HFiles to the wrong region

I've managed to isolate the problem.

I implemented an extension of HFileOutputFormat - because each bulk load will import data to the newly created regions only, I pass the prefix (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that getRegionStartKeys returns only the corresponding keys. I did this in order to avoid having 2000 reducers when my target is 15 regions...

When I use HFileOutputFormat it seems to work. But I don't understand why it doesn't happen in other tables (some smaller and some much, much bigger), or why even in that table it happens only every once in a while.

Any ideas?

On Mon, Dec 16, 2013 at 4:37 PM, Amit Sela <am...@infolinks.com> wrote:

> Loaded regions are listed in .META. table and the ENCODED field in the
> table points to an existing directory. But all family directories in
> this region are empty...
>
> On Mon, Dec 16, 2013 at 4:29 PM, Amit Sela <am...@infolinks.com> wrote:
>
>> I ran the hbck tool, and while I do have some inconsistencies they
>> are not in the table that has the bulk load issues.
>>
>> On Mon, Dec 16, 2013 at 4:22 PM, Amit Sela <am...@infolinks.com> wrote:
>>
>>> RegionServer logs in the RegionServer that the files are moved to
>>> indeed show that all files are moved to that region (when it
>>> doesn't happen it shows only 1 file per family moved to a
>>> RegionServer)
>>>
>>> On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <am...@infolinks.com> wrote:
>>>
>>>> In the first step, the files are read correctly and regionGroups is
>>>> created as it should.
>>>> When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I
>>>> notice that the ServerCallable's regionName returned from the server
>>>> is the wrong region (the pre-split last region).
>>>> The previous last region is not supposed to be deleted; I'm just
>>>> adding new regions (always following lexicographically) so that the
>>>> last region before the pre-split is not the last anymore.
>>>> It seems that wherever the ServerCallable is running, it is not
>>>> updated with the new regions... I tried major compacting (the new
>>>> regions) after pre-split and before the bulk load, but that didn't help.
>>>>
>>>> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bijies...@huawei.com> wrote:
>>>>
>>>>> As we know, bulk load has two steps:
>>>>> 1. Create HFiles by MapReduce.
>>>>> 2. Load HFiles into HBase.
>>>>>
>>>>> I wonder whether it read the right partition information during
>>>>> the first step. Have you run the hbck tool to check the cluster health?
>>>>> You mentioned you see the new regions in the webapp. The files
>>>>> being moved to the previous old region indicates the old region
>>>>> directory was still there. So you started the bulk load just after
>>>>> the region split? (The old region directory will be deleted soon by
>>>>> CatalogJanitor after the region split, once compaction has finished.)
>>>>>
>>>>> I suggest checking the regionserver logs.
>>>>>
>>>>> Jieshan.
>>>>> -----Original Message-----
>>>>> From: Amit Sela [mailto:am...@infolinks.com]
>>>>> Sent: Monday, December 16, 2013 2:29 PM
>>>>> To: user@hbase.apache.org
>>>>> Subject: RE: Bulk load moving HFiles to the wrong region
>>>>>
>>>>> Every split executed is a new day. The row key design is yyyyMMdd_URL.
>>>>> And the split points are yyyyMMdd_x, yyyyMMdd_y etc., in a way that
>>>>> the entire load is (almost) evenly spread.
>>>>> The problem I described causes the bulk load to load all files to
>>>>> the last region of the previous day.
>>>>> Thanks.
>>>>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bijies...@huawei.com> wrote:
>>>>>
>>>>> > Hi Amit:
>>>>> > Can you provide the split-keys of the new regions and your
>>>>> > row-key design?
>>>>> >
>>>>> > Thank you.
>>>>> > Jieshan.
>>>>> > -----Original Message-----
>>>>> > From: Amit Sela [mailto:am...@infolinks.com]
>>>>> > Sent: Monday, December 16, 2013 7:09 AM
>>>>> > To: user@hbase.apache.org
>>>>> > Subject: Bulk load moving HFiles to the wrong region
>>>>> >
>>>>> > Hi all,
>>>>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>>>>> > When trying to bulk load using the Java API I sometimes get the
>>>>> > HFiles moved to the wrong directory.
>>>>> > I'm pre-splitting regions and the new regions are always the
>>>>> > last (lexicographically), so when this happens all files move to
>>>>> > the last region from before the pre-split. But the split does
>>>>> > work. I see the new regions in the webapp before the bulk load
>>>>> > executes. Once a table has this problem (not all the time) it
>>>>> > keeps on until I restart HBase.
>>>>> >
>>>>> > Anyone seen something similar?
>>>>> >
>>>>> > Thanks,
>>>>> > Amit.
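
The two ideas discussed in this thread can be sketched in plain Java, without any HBase dependency. The first is keeping only the region start keys that carry a given yyyyMMdd prefix. The second is that a row key belongs to the region with the greatest start key not larger than the row key. This is only an illustration under assumptions, not the code from the thread and not a diagnosis of the bug: the class and method names (RegionKeySketch, startKeysForDay, regionFor) and the split points are hypothetical, and keys are modelled as ASCII strings, whereas HBase compares raw byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; none of these names come from HBase.
class RegionKeySketch {

    // Keep only the start keys that begin with the day prefix (yyyyMMdd),
    // mirroring the idea of configuring reducers for the new regions only.
    static List<String> startKeysForDay(List<String> allStartKeys, String dayPrefix) {
        List<String> selected = new ArrayList<>();
        for (String key : allStartKeys) {
            if (key.startsWith(dayPrefix)) {
                selected.add(key);
            }
        }
        return selected;
    }

    // A row belongs to the region whose start key is the greatest key <= rowKey.
    // The first region of a table has an empty start key, so "" must be in the set.
    static String regionFor(TreeSet<String> startKeys, String rowKey) {
        return startKeys.floor(rowKey);
    }

    public static void main(String[] args) {
        // Assumed split points: two for the previous day, three for the new day.
        List<String> staleKeys = Arrays.asList("", "20131215_h", "20131215_p");
        List<String> freshKeys = Arrays.asList(
                "", "20131215_h", "20131215_p",
                "20131216", "20131216_h", "20131216_p");

        String row = "20131216_example.com";

        // Region boundaries as seen before the pre-split (stale view).
        System.out.println("stale view -> region starting at '"
                + regionFor(new TreeSet<>(staleKeys), row) + "'");

        // Region boundaries as seen after the pre-split (fresh view).
        System.out.println("fresh view -> region starting at '"
                + regionFor(new TreeSet<>(freshKeys), row) + "'");

        // Start keys that a prefix-filtered job would be configured with.
        System.out.println("keys for 20131216: "
                + startKeysForDay(freshKeys, "20131216"));
    }
}
```

With the stale boundaries, every row of the new day resolves to the region starting at 20131215_p, i.e. the last region of the previous day, which matches the symptom described above. Whether a stale region view is what actually happens inside the ServerCallable is not established in this thread.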