VitoMakarevich opened a new issue, #5511:
URL: https://github.com/apache/hudi/issues/5511

   **Describe the problem you faced**
   
   An incremental query with `begin.instanttime` less than the first commit time 
returns different results depending on how many commits have been written.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write n commits to a Hudi table, where n > number of commits config param 
(so that older commits get cleaned).
   2. Trigger an incremental query with `begin.instanttime` less than the first 
commit, e.g. `0`.
   3. Compare the number of output rows with the number of rows in the snapshot. 
The incremental result contains fewer rows than the snapshot.
   Here is the [sample 
repo](https://github.com/VitoMakarevich/hudi-incremental-issue) with 
reproduction.
   
   By contrast, if you write n commits where n < number of commits config 
param and then query from `0`, the number of rows returned equals the number 
of rows in the snapshot.
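
The comparison in the steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repo: the helper names (`incremental_read_options`, `count_rows`, `compare_counts`) and the table path are hypothetical, and it assumes a Spark session with the Hudi bundle on the classpath. The two option keys are the standard Hudi Spark datasource read options.

```python
from typing import Dict, Optional, Tuple


def incremental_read_options(begin_instant: str = "0") -> Dict[str, str]:
    """Build read options for a Hudi incremental query (hypothetical helper)."""
    return {
        "hoodie.datasource.query.type": "incremental",
        # An instant earlier than the first commit, e.g. "0".
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def count_rows(spark, table_path: str, options: Optional[Dict[str, str]] = None) -> int:
    """Count rows in a Hudi table; with no options this is a snapshot read."""
    reader = spark.read.format("hudi")
    for key, value in (options or {}).items():
        reader = reader.option(key, value)
    return reader.load(table_path).count()


def compare_counts(spark, table_path: str) -> Tuple[int, int]:
    """Return (snapshot_count, incremental_count_from_instant_0)."""
    snapshot = count_rows(spark, table_path)
    incremental = count_rows(spark, table_path, incremental_read_options("0"))
    return snapshot, incremental
```

With an existing session, `compare_counts(spark, "/path/to/table")` would return the two counts; on the affected versions the second is expected to be smaller once older commits have been cleaned.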
   
   **Expected behavior**
   
   I expect an incremental query with `begin.instanttime` less than the first 
commit to return the same result regardless of whether anything has been 
cleaned.
   
   **Environment Description**
   
   * Hudi version : 0.9.0 and 0.10.0 (affected); works correctly in 0.11.0
   
   * Spark version : 3.1.2
   
   * Storage (HDFS/S3/GCS..) : s3/local file
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   I assume that this [PR](https://github.com/apache/hudi/pull/3946/files) 
fixes the behavior for `0.11.0`.
   
   

