[ https://issues.apache.org/jira/browse/HBASE-20844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539597#comment-16539597 ]

ShivaKumar SS commented on HBASE-20844:
---------------------------------------

This behaviour is not seen in HBase 1.4.5. It turns out that HBase 1.3.1 is
missing the fix below, which skips regions that have been split (offline split
parents) when collecting the regions to read from the snapshot.


Class: {{org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl}}

Method:

{code:java}
public static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest) {
  List<SnapshotRegionManifest> regionManifests = manifest.getRegionManifests();
  if (regionManifests == null) {
    throw new IllegalArgumentException("Snapshot seems empty");
  }

  List<HRegionInfo> regionInfos = Lists.newArrayListWithCapacity(regionManifests.size());

  for (SnapshotRegionManifest regionManifest : regionManifests) {
    HRegionInfo hri = HRegionInfo.convert(regionManifest.getRegionInfo());
    // This is the fix: skip offline split parents so that their rows are
    // not read a second time through the daughter regions.
    if (hri.isOffline() && (hri.isSplit() || hri.isSplitParent())) {
      continue;
    }
    regionInfos.add(hri);
  }
  return regionInfos;
}
{code}
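To illustrate why the missing check produces the duplicates described in the issue, here is a minimal, self-contained sketch that does not use HBase at all. The {{Region}} class and the {{filterRegions}}, {{scanAll}} and {{exampleSnapshot}} helpers are hypothetical. The sketch assumes, as the reporter suspects, that an offline split parent and its two daughters all expose the same rows, and that one scan is run per region in the snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical, HBase-free model of reading a snapshot taken mid-split. */
public class SplitParentFilterSketch {

    /** Hypothetical stand-in for HRegionInfo, keeping only what the check uses. */
    static final class Region {
        final String name;
        final String startKey; // inclusive; "" means start of table
        final String endKey;   // exclusive; "" means end of table
        final boolean offline;
        final boolean split;

        Region(String name, String startKey, String endKey, boolean offline, boolean split) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
            this.split = split;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    /** Mirrors the check in the fix: drop regions that are both offline and split. */
    static List<Region> filterRegions(List<Region> regions) {
        List<Region> kept = new ArrayList<Region>();
        for (Region r : regions) {
            if (r.offline && r.split) {
                continue;
            }
            kept.add(r);
        }
        return kept;
    }

    /** Simulates one scan per region and concatenates the rows each scan returns. */
    static List<String> scanAll(List<Region> regions, List<String> rows) {
        List<String> out = new ArrayList<String>();
        for (Region r : regions) {
            for (String row : rows) {
                if (r.contains(row)) {
                    out.add(row);
                }
            }
        }
        return out;
    }

    /** A snapshot taken mid-split: the offline parent plus both daughters. */
    static List<Region> exampleSnapshot() {
        return Arrays.asList(
            new Region("parent", "", "", true, true),
            new Region("daughterA", "", "c", false, false),
            new Region("daughterB", "c", "", false, false));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a1", "b1", "c1", "d1");
        List<String> unfiltered = scanAll(exampleSnapshot(), rows);
        List<String> filtered = scanAll(filterRegions(exampleSnapshot()), rows);
        System.out.println("without filter: " + unfiltered.size() + " rows, "
            + new HashSet<String>(unfiltered).size() + " distinct");
        System.out.println("with filter:    " + filtered.size() + " rows, "
            + new HashSet<String>(filtered).size() + " distinct");
    }
}
```

Running {{main}} prints 8 rows (4 distinct) without the filter and 4 rows (4 distinct) with it: every row is returned once by the parent and once more by a daughter unless the offline split parent is skipped.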

 

> Duplicate rows returned while hbase snapshot reads
> --------------------------------------------------
>
>                 Key: HBASE-20844
>                 URL: https://issues.apache.org/jira/browse/HBASE-20844
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce, snapshots, spark
>    Affects Versions: 1.3.1
>         Environment: Cluster details:
> Java 1.7
> HBase 1.3.1
> Spark 1.6.1
>            Reporter: ShivaKumar SS
>            Priority: Major
>
> We take a snapshot programmatically and read its data using both MapReduce
> and Spark; both approaches return duplicate records.
> On the API side, {{org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}} is used.
> The snapshot was taken while the table was in a region split state.
> We suspect this happens because data is returned for both the parent and the
> daughter regions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
