Jungtaek Lim created SPARK-45672:
------------------------------------

             Summary: Provide a unified user-facing schema for state format versions in state data source - reader
                 Key: SPARK-45672
                 URL: https://issues.apache.org/jira/browse/SPARK-45672
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim
Currently, the state data source exposes the state "as is", exactly as it is stored in the state store. The only exception is stream-stream join when the joinSide option is specified. As a result, for operators that have multiple state format versions, the state data source produces a different schema depending on which state format version the query uses.

From the users' perspective, the state format version is an internal detail they do not care about, so they may be confused when the state data source produces different schemas for the same operator.

We could therefore consider defining and providing a single user-facing schema for each operator, regardless of its state format version.

*Note that this needs further discussion* before any code is written, because there is a clear trade-off: it would introduce a strong coupling between the state data source and the implementation of stateful operators. Also, regarding the concern about an unpredictable output schema, users can already call printSchema() to inspect the output schema before running the query.