Hi Lian,

Sorry, I'm new to Spark so I didn't express myself very clearly. I'm concerned about the following situation: say I have a Parquet table with some partitions, and I add a new column A to the Parquet schema and write some data with the new schema to a new partition of the table. If I'm not mistaken, sqlContext.read.parquet(table_path).printSchema() will print the correct schema, including the new column A. But if I do a 'describe table' from the Spark SQL CLI, I won't see the new column. I understand that this is because Hive doesn't support schema evolution. So what is the best way to support CLI queries in this situation? Do I need to manually alter the table every time the underlying schema changes?
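For concreteness, here's a minimal sketch of what I mean (the table path, partition directory, and column names are all made up):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext()
    sqlContext = HiveContext(sc)

    # Hypothetical location of a Parquet table registered in the metastore.
    table_path = "/warehouse/events"

    # Write rows carrying an extra column A into a new partition directory.
    df = sqlContext.createDataFrame([(1, "x", 42)], ["id", "value", "A"])
    df.write.mode("append").parquet(table_path + "/ds=2015-07-21")

    # Reading the files directly picks up the merged schema, including A...
    sqlContext.read.parquet(table_path).printSchema()

    # ...but 'describe table' from the CLI still shows the old columns,
    # so it seems I'd have to patch the metastore by hand, e.g.:
    #   ALTER TABLE events ADD COLUMNS (A INT);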
Thanks

On Tue, Jul 21, 2015 at 4:37 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Hey Jerrick,
>
> What do you mean by "schema evolution with Hive metastore tables"? Hive
> doesn't take schema evolution into account. Could you please give a
> concrete use case? Are you trying to write Parquet data with extra columns
> into an existing metastore Parquet table?
>
> Cheng
>
>
> On 7/21/15 1:04 AM, Jerrick Hoang wrote:
>
> I'm new to Spark, any ideas would be much appreciated! Thanks
>
> On Sat, Jul 18, 2015 at 11:11 AM, Jerrick Hoang <jerrickho...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm aware of the support for schema evolution via the DataFrame API. Just
>> wondering what would be the best way to go about dealing with schema
>> evolution with Hive metastore tables. So, say I create a table via the
>> SparkSQL CLI, how would I deal with Parquet schema evolution?
>>
>> Thanks,
>> J