Guanghao Zhang created HBASE-17125:
--------------------------------------

             Summary: Inconsistent result when use filter to read data
                 Key: HBASE-17125
                 URL: https://issues.apache.org/jira/browse/HBASE-17125
             Project: HBase
          Issue Type: Bug
            Reporter: Guanghao Zhang


Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
The oldest version doesn't remove immediately. But from the user view, the 
oldest version has gone. When user use a filter to query, if the filter skip a 
new version, then the oldest version will be seen again. But after compact the 
region, then the oldest version will never been seen. So it is weird for user. 
The query will get inconsistent result before and after region compaction.

The reason is matchColumn method of UserScanQueryMatcher. It first check the 
cell by filter, then check the number of versions needed. So if the filter skip 
the new version, then the oldest version will be seen again when it is not 
removed.

Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
solution for this problem. The first idea is check the number of versions 
first, then check the cell by filter. As the comment of setFilter, the filter 
is called after all tests for ttl, column match, deletes and max versions have 
been run.
{code}
  /**
   * Apply the specified server-side filter when performing the Query.
   * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
   * for ttl, column match, deletes and max versions have been run.
   * @param filter filter to run on the server
   * @return this for invocation chaining
   */
  public Query setFilter(Filter filter) {
    this.filter = filter;
    return this;
  }
{code}
But this idea has another problem, if a column's max version is 5 and the user 
query only need 3 versions. It first check the version's number, then check the 
cell by filter. So the cells number of the result may less than 3. But there 
are 2 versions which don't read anymore.

So the second idea has three steps.
1. check by the max versions of this column
2. check the kv by filter
3. check the versions which user need.
But this will lead the ScanQueryMatcher more complicated. And this will break 
the javadoc of Query.setFilter.

Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to