[GitHub] [spark] Fokko opened a new pull request #26737: [SPARK-30103][SQL] Consolidate Schema merge logic

GitBox Mon, 02 Dec 2019 07:40:53 -0800

Fokko opened a new pull request #26737: [SPARK-30103][SQL] Consolidate Schema
merge logic
URL: https://github.com/apache/spark/pull/26737

While working on https://github.com/apache/spark/pull/26644 I noticed some
strange behavior.

https://github.com/apache/spark/pull/26644 focuses on merging
UserDefinedTypes into Spark's native types. Delta checks whether the schema is
still compatible, so as an integration test I tried to union two DataFrames,
one of which has a UserDefinedType that should then be merged into a native
type. I used a union to mimic this because we don't have the Delta extension
and Spark does not check schema compatibility on write, so it is impossible to
directly reproduce the situation that we observed with Delta.

However, when Delta checks compatibility, it merges the schemas using
`StructType.merge()`, whereas when Spark checks compatibility for a union, it
uses `TypeCoercion`:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L314-L323
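
To make the possible divergence concrete, here is a toy model in plain Python.
It is not Spark code: types are strings, schemas are lists of `(name, type)`
pairs, and the names `strict_merge`, `coercion_merge` and
`tightest_common_type` are hypothetical. It assumes that a name-based merge
rejects fields whose types differ, while a coercion-based check widens numeric
types; the real functions handle many more cases.

```python
# Toy model only: this contrasts two merge strategies, it does not call Spark.

# Assumed widening order for numeric types, narrowest first.
NUMERIC_PRECEDENCE = ["byte", "short", "int", "long", "float", "double"]


def strict_merge(left, right):
    """Merge by field name: shared names must have equal types, new names are appended."""
    merged = list(left)
    known = dict(left)
    for name, dtype in right:
        if name not in known:
            merged.append((name, dtype))
        elif known[name] != dtype:
            raise ValueError(
                f"Failed to merge field {name!r}: {known[name]} vs {dtype}"
            )
    return merged


def tightest_common_type(t1, t2):
    """Return the narrowest type that both inputs fit into, or None if there is none."""
    if t1 == t2:
        return t1
    if t1 in NUMERIC_PRECEDENCE and t2 in NUMERIC_PRECEDENCE:
        widest = max(NUMERIC_PRECEDENCE.index(t1), NUMERIC_PRECEDENCE.index(t2))
        return NUMERIC_PRECEDENCE[widest]
    return None


def coercion_merge(left, right):
    """Merge column by column, as a union would, widening each pair of types."""
    if len(left) != len(right):
        return None
    merged = []
    for (name, t1), (_, t2) in zip(left, right):
        common = tightest_common_type(t1, t2)
        if common is None:
            return None
        merged.append((name, common))
    return merged
```

In this model, merging `[("id", "int")]` with `[("id", "long")]` succeeds under
`coercion_merge` (the column is widened to `long`) but raises under
`strict_merge`, so the same pair of schemas is judged compatible by one check
and incompatible by the other.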

Since this is complex code, I think we should merge these two to get
consistent behavior. I'm curious what your opinion is on this, and why these
very similar functions are kept separate.

### What changes were proposed in this pull request?

Remove `StructType.merge()` and use `TypeCoercion.findTightestCommonType()`
instead. The one in `TypeCoercion` looks more complete.

### Why are the changes needed?

To simplify the codebase, and consolidate the behavior of merging schemas.

### Does this PR introduce any user-facing change?

Not in UI/Console, possibly in behavior.

### How was this patch tested?

Existing unit tests.

