Hi Sam,

You could consider the CarbonData format
(https://github.com/apache/carbondata). It supports column removal and
changing the data type of an existing column. For column renaming, you can
raise an issue to request support.

Regards,
Ramana
________________________________
From: Joel D [games2013....@gmail.com]
Sent: Tuesday, May 30, 2017 7:34 AM
To: user@spark.apache.org
Subject: Schema Evolution Parquet vs Avro

Hi,

We are trying to come up with the best storage format for handling schema 
changes in ingested data.

We noticed that both Avro and Parquet allow columns to be selected by name
rather than by their index/position in the data. However, we are leaning
towards Parquet for better read performance, since it is columnar and we will
be selecting only a few columns rather than all of them. Data will be processed
and saved to partitions, on top of which we will have Hive external tables.
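To illustrate why name-based selection matters when a column is removed from the middle of a schema, here is a toy Python model. It uses plain lists, not the Parquet or Avro libraries, and the function names are hypothetical. It compares positional lookup with lookup by column name across an older and a newer file layout:

```python
from typing import Any, List, Optional

# Toy model: each "file" carries its own column names plus rows of values.


def read_by_position(rows: List[List[Any]], index: int) -> List[Any]:
    """Select a column by ordinal position, ignoring column names."""
    return [row[index] for row in rows]


def read_by_name(schema: List[str], rows: List[List[Any]],
                 name: str) -> List[Optional[Any]]:
    """Select a column by name; a column absent from the file reads as None."""
    if name not in schema:
        return [None] * len(rows)
    idx = schema.index(name)
    return [row[idx] for row in rows]


# Older file has three columns; in newer files the middle column "city" was dropped.
old_schema, old_rows = ["id", "city", "amount"], [[1, "Pune", 10], [2, "Oslo", 20]]
new_schema, new_rows = ["id", "amount"], [[3, 30], [4, 40]]

# Position 1 meant "city" in the old layout but silently returns "amount" now.
print(read_by_position(old_rows, 1) + read_by_position(new_rows, 1))
# ['Pune', 'Oslo', 30, 40]
# By name, "amount" lines up across both layouts; "city" is null in new files.
print(read_by_name(old_schema, old_rows, "amount")
      + read_by_name(new_schema, new_rows, "amount"))
# [10, 20, 30, 40]
print(read_by_name(new_schema, new_rows, "city"))
# [None, None]
```

With positional access, dropping a middle column shifts every later column, so old and new files disagree without any error. With name-based access, the same query stays correct across both layouts.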

Will Parquet be able to handle the following:
- Renaming a column in the middle of the schema
- Removing a column from the middle of the schema
- Changing the data type of an existing column (int to bigint should be allowed, right?)
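Under name-based resolution, these three cases behave quite differently. The toy Python sketch below projects rows from an older file onto the current table schema. It uses a hypothetical `resolve` function and type names, and it does not reflect how Parquet, Spark or Hive are implemented. In the sketch, a removed column is simply never read, and a renamed column looks like a brand-new column that reads as null for older files. The int to long change is accepted only because the toy's promotion table allows it, mirroring Avro's int-to-long promotion rule. Whether a Parquet-backed Spark or Hive table accepts int to bigint depends on the engine and version, so it is worth testing directly.

```python
from typing import Any, Dict, List, Optional, Tuple

# Toy model: a schema is an ordered list of (column name, type name) pairs.
Schema = List[Tuple[str, str]]

# Widenings this toy reader accepts (file type, table type). Modeled on Avro's
# int -> long promotion; an assumption, not any Parquet reader's actual rule.
PROMOTIONS = {("int", "long")}


def resolve(reader: Schema, writer: Schema,
            rows: List[List[Any]]) -> List[Dict[str, Optional[Any]]]:
    """Project rows written with `writer` onto the `reader` schema by name."""
    positions = {name: (i, typ) for i, (name, typ) in enumerate(writer)}
    plan: List[Tuple[str, Optional[int]]] = []
    for name, rtype in reader:
        if name not in positions:
            # Absent from this file (new or renamed column): read as null.
            plan.append((name, None))
            continue
        idx, wtype = positions[name]
        if wtype != rtype and (wtype, rtype) not in PROMOTIONS:
            raise TypeError(f"cannot read column {name!r} ({wtype}) as {rtype}")
        plan.append((name, idx))
    # Writer columns not in the reader schema (removed columns) are never read.
    return [{name: (row[idx] if idx is not None else None) for name, idx in plan}
            for row in rows]


# File written before the changes: "cust" later renamed to "customer",
# "note" later removed, "id" later widened from int to long.
old_schema: Schema = [("id", "int"), ("cust", "string"),
                      ("note", "string"), ("qty", "int")]
table_schema: Schema = [("id", "long"), ("customer", "string"), ("qty", "int")]

print(resolve(table_schema, old_schema, [[1, "acme", "x", 5]]))
# [{'id': 1, 'customer': None, 'qty': 5}]
```

The rename case is the one that loses data: the old values stored under "cust" are never matched to "customer". Narrowing (for example long back to int) raises a `TypeError` in this toy, since it is not in the promotion table.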

Please advise.

Thanks,
Sam
