Hello,
We recently ran into an issue where calls to IMetaStoreClient.alter_table() 
were failing with an error message like the following:
"The following columns have types incompatible with the existing columns in 
their respective positions : foo, bar".
 
For some background, we use Confluent's KafkaConnect to consume data from 
Kafka. It writes the data it consumes into HDFS (as Parquet files) and then 
uses the IMetaStoreClient to update the schema of an external table in Hive 
pointing to those Parquet files.
 
After some investigation, we identified that the error from MetaStore was due 
to us attempting to remove some columns from our external table when updating 
the schema (by calling IMetaStoreClient.alter_table).
I found this code that validates changes in MetaStore here: 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DefaultIncompatibleTableChangeHandler.java#L82
 
It seems to validate the changes (made when calling 
IMetaStoreClient.alter_table) by matching columns from the current schema with 
columns from the new schema by index, and then checking that any type changes 
are compatible.
The problem is, if I am trying to drop a column as part of my new schema, this 
logic could end up comparing completely different columns when validating their 
types.
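
To illustrate what I mean, here is a minimal sketch of the two matching 
strategies. It is a simplified model and not the actual Hive code: the class 
ColumnMatchSketch, the Column type and the compatible() check (which only 
accepts identical types) are all hypothetical stand-ins, and the column "id" is 
made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the check; not Hive's actual classes.
class ColumnMatchSketch {
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Stand-in for the real compatibility rules: only identical types pass.
    static boolean compatible(String oldType, String newType) {
        return oldType.equals(newType);
    }

    // Compares the old and new columns that share the same index.
    static List<String> incompatibleByPosition(List<Column> oldCols, List<Column> newCols) {
        List<String> bad = new ArrayList<>();
        int n = Math.min(oldCols.size(), newCols.size());
        for (int i = 0; i < n; i++) {
            if (!compatible(oldCols.get(i).type, newCols.get(i).type)) {
                bad.add(newCols.get(i).name);
            }
        }
        return bad;
    }

    // Compares each new column with the old column of the same name, if any.
    static List<String> incompatibleByName(List<Column> oldCols, List<Column> newCols) {
        Map<String, String> oldTypes = new HashMap<>();
        for (Column c : oldCols) {
            oldTypes.put(c.name, c.type);
        }
        List<String> bad = new ArrayList<>();
        for (Column c : newCols) {
            String oldType = oldTypes.get(c.name);
            if (oldType != null && !compatible(oldType, c.type)) {
                bad.add(c.name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Old schema: id, foo, bar. New schema drops the first column.
        List<Column> oldCols = List.of(
            new Column("id", "int"),
            new Column("foo", "string"),
            new Column("bar", "double"));
        List<Column> newCols = List.of(
            new Column("foo", "string"),
            new Column("bar", "double"));

        System.out.println(incompatibleByPosition(oldCols, newCols)); // [foo, bar]
        System.out.println(incompatibleByName(oldCols, newCols));     // []
    }
}
```

In this sketch, dropping the first column shifts every remaining column by one 
position, so the positional check compares "id" with "foo" and "foo" with 
"bar" and reports both, even though neither column changed type. Matching by 
name reports nothing.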
 
I am aware that we can forgo the type validation entirely by setting 
"metastore.disallow.incompatible.col.type.changes" to false, but this is 
undesirable as we want some safeguards in place against bad schema updates.
 
My question is: is this expected behaviour? We are fairly new to using Hive, so 
we are unsure whether dropping columns via IMetaStoreClient is unsupported by 
design. If it is not expected behaviour, could the logic be changed to match 
columns by name before checking that type changes are compatible?
 
 
Kind regards,
Yussuf
