Jian Feng created HUDI-4677:
-------------------------------

             Summary: Snapshot view management
                 Key: HUDI-4677
                 URL: https://issues.apache.org/jira/browse/HUDI-4677
             Project: Apache Hudi
          Issue Type: Epic
            Reporter: Jian Feng
         Attachments: image-2022-08-22-02-03-31-588.png

 !image-2022-08-22-02-03-31-588.png!
For the snapshot view scenario, Hudi already provides two key features to
support it:
Time travel: the user provides a timestamp to query a specific snapshot view
of a Hudi table.
Savepoint/restore: a "savepoint" saves the table as of a commit time so that
the table can be restored to that savepoint at a later point in time if need
be. In the snapshot view scenario, however, users typically rely on a
savepoint to keep the cleaner from removing the snapshot view at a specific
timestamp, so that only unused files are cleaned.
Using these two features directly is inconvenient for users, for the
following reasons:

Users usually prefer a meaningful name over a timestamp when querying a Hudi
table, and writing a timestamp in SQL can easily lead to the wrong snapshot
view being used. For example, we could announce that a new tag of a Hudi
table, named table_nameYYYYMMDD, has been released, and users could then
query it by that new table name.
Savepoint was not originally designed for this "snapshot view" scenario; it
was designed for disaster recovery. Suppose a new snapshot view is created
every day and each one has a 7-day retention: we should support lifecycle
management on top of it.
What I plan to do is let Hudi support releasing snapshot views and managing
their lifecycle out of the box.
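
To make the intent concrete, here is a minimal sketch of the two pieces
described above: a named tag of the form table_nameYYYYMMDD that resolves to
a commit timestamp, and a 7-day retention check. It is plain Python and does
not call any Hudi API. The names SnapshotView, release_snapshot,
resolve_instant and expired_views are hypothetical, as is the choice to treat
a view as expired once it is at least 7 days old. The yyyyMMddHHmmss instant
format is also an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Dict, List

RETENTION_DAYS = 7  # retention taken from the example in this issue


@dataclass(frozen=True)
class SnapshotView:
    """Hypothetical model: a named snapshot view pinned to one commit."""
    tag: str       # e.g. "orders20220822"
    instant: str   # commit time as yyyyMMddHHmmss (assumed format)
    created: date  # day the view was released


def release_snapshot(table_name: str, commit_time: datetime) -> SnapshotView:
    """Build the table_nameYYYYMMDD tag for a given commit time."""
    day = commit_time.date()
    return SnapshotView(
        tag=f"{table_name}{day:%Y%m%d}",
        instant=commit_time.strftime("%Y%m%d%H%M%S"),
        created=day,
    )


def resolve_instant(catalog: Dict[str, SnapshotView], tag: str) -> str:
    """Map a tag to the timestamp a time-travel query would use."""
    return catalog[tag].instant


def expired_views(
    views: List[SnapshotView],
    today: date,
    retention_days: int = RETENTION_DAYS,
) -> List[SnapshotView]:
    """Return views that are at least retention_days old (assumed policy)."""
    cutoff = today - timedelta(days=retention_days)
    return [v for v in views if v.created <= cutoff]


if __name__ == "__main__":
    # One snapshot per day, 2022-08-15 through 2022-08-22.
    first = datetime(2022, 8, 15, 2, 3, 31)
    views = [release_snapshot("orders", first + timedelta(days=i))
             for i in range(8)]
    catalog = {v.tag: v for v in views}

    print(resolve_instant(catalog, "orders20220822"))
    for v in expired_views(views, today=date(2022, 8, 22)):
        print("expired:", v.tag)
```

In a real implementation the expired views would presumably have their
savepoints released so that the cleaner can reclaim the files; that step is
left out here because it depends on how the feature ends up being designed.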

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
