Hi Lian,

Sorry, I'm new to Spark so I did not express myself very clearly. I'm
concerned about the situation where, say, I have a Parquet table with some
partitions, I add a new column A to the Parquet schema, and I write data
with the new schema to a new partition in the table. If I'm not mistaken,
doing a sqlContext.read.parquet(table_path).printSchema() will print the
correct schema with the new column A. But if I do a 'describe table' from
the Spark SQL CLI, I won't see the new column. I understand that this is
because Hive doesn't support schema evolution. So what is the best way to
support CLI queries in this situation? Do I need to manually alter the
table every time the underlying schema changes?
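
To make it concrete, here is a rough sketch of what I'm doing (the table
path, partition value, and the type of column A are made up just for
illustration):

    import sqlContext.implicits._

    val tablePath = "/warehouse/my_table"   // placeholder path

    // write a new partition whose Parquet files carry the extra column A
    val dfWithA = Seq((1L, "x", 42)).toDF("id", "value", "A")
    dfWithA.write.parquet(tablePath + "/dt=2015-07-21")

    // reading the files directly picks up the new column:
    sqlContext.read.parquet(tablePath).printSchema()   // shows id, value, A, dt

    // but 'DESCRIBE my_table' from the Spark SQL CLI still shows the old
    // schema, so the only fix I know of is something like:
    //   ALTER TABLE my_table ADD COLUMNS (A INT);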

Thanks

On Tue, Jul 21, 2015 at 4:37 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

>  Hey Jerrick,
>
> What do you mean by "schema evolution with Hive metastore tables"? Hive
> doesn't take schema evolution into account. Could you please give a
> concrete use case? Are you trying to write Parquet data with extra columns
> into an existing metastore Parquet table?
>
> Cheng
>
>
> On 7/21/15 1:04 AM, Jerrick Hoang wrote:
>
> I'm new to Spark, any ideas would be much appreciated! Thanks
>
> On Sat, Jul 18, 2015 at 11:11 AM, Jerrick Hoang <jerrickho...@gmail.com>
> wrote:
>
>> Hi all,
>>
>>  I'm aware of the support for schema evolution via DataFrame API. Just
>> wondering what would be the best way to go about dealing with schema
>> evolution with Hive metastore tables. So, say I create a table via SparkSQL
>> CLI, how would I deal with Parquet schema evolution?
>>
>>  Thanks,
>> J
>>
>
>
>
