Yuchen Liu created SPARK-48542:
----------------------------------

             Summary: Give snapshotStartBatchId and snapshotPartitionId to the state data source
                 Key: SPARK-48542
                 URL: https://issues.apache.org/jira/browse/SPARK-48542
             Project: Spark
          Issue Type: New Feature
          Components: SQL, Structured Streaming
    Affects Versions: 4.0.0
         Environment: This should work for both HDFS state store and RocksDB state store.
            Reporter: Yuchen Liu

Currently, to read a given version of the state data, the state data source finds the closest snapshot file before that version and reconstructs the state by applying the delta files that follow it. In some debugging scenarios, users need finer-grained control over how the state is reconstructed; for example, they may want to start from a specific snapshot instead of the closest one. One use case is checking whether a snapshot was corrupted after it was committed.

This task introduces two options, {{snapshotStartBatchId}} and {{snapshotPartitionId}}, to the state data source. By specifying them, users can control the batch ID of the snapshot from which reconstruction starts and the partition ID of the state to read.
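
As a rough illustration of the reconstruction described above, the following Python sketch models a single state partition as a set of full snapshots plus one delta per version. It is a toy model, not Spark's implementation: the names `apply_delta`, `reconstruct`, and `snapshot_start`, the dict-based layout, and the sample data are all hypothetical, and the model assumes that a snapshot taken at the requested version itself may be used. The `snapshot_start` argument plays the role proposed for {{snapshotStartBatchId}}; choosing a partition, as {{snapshotPartitionId}} would, is outside the model.

```python
from typing import Dict, Optional

State = Dict[str, int]
Delta = Dict[str, Optional[int]]  # None marks a removed key (hypothetical encoding)


def apply_delta(state: State, delta: Delta) -> State:
    """Return a new state with one delta applied on top of `state`."""
    new_state = dict(state)
    for key, value in delta.items():
        if value is None:
            new_state.pop(key, None)
        else:
            new_state[key] = value
    return new_state


def reconstruct(
    snapshots: Dict[int, State],
    deltas: Dict[int, Delta],
    version: int,
    snapshot_start: Optional[int] = None,
) -> State:
    """Rebuild the state at `version` from a snapshot plus later deltas.

    With snapshot_start=None the closest snapshot at or before `version`
    is used (assumed default). Otherwise that exact snapshot is used.
    """
    if snapshot_start is None:
        candidates = [v for v in snapshots if v <= version]
        start = max(candidates) if candidates else 0
    else:
        if snapshot_start not in snapshots:
            raise KeyError(f"no snapshot at version {snapshot_start}")
        if snapshot_start > version:
            raise ValueError("snapshot_start must not exceed version")
        start = snapshot_start

    # Version 0 stands for the empty state when no snapshot is available.
    state = dict(snapshots.get(start, {}))
    for v in range(start + 1, version + 1):
        state = apply_delta(state, deltas[v])
    return state


# Hypothetical sample data: the snapshot at version 4 was corrupted after commit.
deltas = {1: {"a": 1}, 2: {"b": 2}, 3: {"a": 3}, 4: {"b": None}, 5: {"c": 5}}
snapshots = {2: {"a": 1, "b": 2}, 4: {"a": 99}}

default_view = reconstruct(snapshots, deltas, version=5)
replayed_view = reconstruct(snapshots, deltas, version=5, snapshot_start=2)
snapshot_4_suspect = default_view != replayed_view
```

In this model, reading version 5 by default goes through the snapshot at version 4, while `snapshot_start=2` replays the deltas for versions 3 to 5 on top of the older snapshot. A mismatch between the two views points at the newer snapshot, which is the kind of check the corruption use case calls for.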