[ https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Otto resolved SPARK-23890. --------------------------------- Fix Version/s: 3.0.0 Resolution: Fixed Ah! This is supported in DataSource v2 after all, except just not via CHANGE COLUMN. Instead, you can add a column to a nested field by addressing it with dotted notation: ALTER TABLE otto.test_table03 ADD COLUMN s1.s1_f2_added STRING; > Hive ALTER TABLE CHANGE COLUMN for struct type no longer works > -------------------------------------------------------------- > > Key: SPARK-23890 > URL: https://issues.apache.org/jira/browse/SPARK-23890 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0, 3.0.0 > Reporter: Andrew Otto > Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 3.0.0 > > > As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE > CHANGE COLUMN commands to Hive. This restriction was loosened in > [https://github.com/apache/spark/pull/12714] to allow for those commands if > they only change the column comment. > Wikimedia has been evolving Parquet backed Hive tables with data originally > from JSON events by adding newly found columns to the Hive table schema, via > a Spark job we call 'Refine'. We do this by recursively merging an input > DataFrame schema with a Hive table DataFrame schema, finding new fields, and > then issuing an ALTER TABLE statement to add the columns. However, because > we allow for nested data types in the incoming JSON data, we make extensive > use of struct type fields. In order to add newly detected fields in a nested > data type, we must alter the struct column and append the nested struct > field. This requires CHANGE COLUMN that alters the column type. In reality, > the 'type' of the column is not changing, it just just a new field being > added to the struct, but to SQL, this looks like a type change. > -We were about to upgrade to Spark 2 but this new restriction in SQL DDL that > can be sent to Hive will block us. I believe this is fixable by adding an > exception in > [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325] > to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and > destination type are both struct types, and the destination type only adds > new fields.- > > In this [PR|https://github.com/apache/spark/pull/21012], I was told that the > Spark 3 datasource v2 would support this. > However, it is clear that it does not. There is an [explicit > check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441] > and > [test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583] > that prevents this from happening. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org