I have a “directory” in S3 containing Parquet files created from Avro using the 
AvroParquetWriter in the parquet-mr project.

I can load the contents of these files as a DataFrame using:
    val it = spark.read.parquet("s3a://coolbeth/file=testy")

but I have not found a way to define a permanent table based on these Parquet 
files.

If I create a regular external table (CREATE EXTERNAL TABLE ... STORED AS 
PARQUET), deserialization crashes at query time, I think because the Parquet 
file metadata contains an Avro schema rather than a Spark schema.
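
For reference, the DDL I ran looked roughly like this (the table name and 
column list here are illustrative, not my actual schema):

    CREATE EXTERNAL TABLE testy (
      id BIGINT,
      name STRING
    )
    STORED AS PARQUET
    LOCATION 's3a://coolbeth/file=testy'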

Is there a way to create the table I want from these Avro-generated Parquet 
files?
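
For example, would the Spark-native datasource syntax below bypass the Hive 
SerDe? I have not been able to confirm whether it handles these files:

    CREATE TABLE testy
    USING parquet
    LOCATION 's3a://coolbeth/file=testy'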

Thanks,

Matt Coolbeth
Software Engineer
Disney DTCI
