I have a “directory” in S3 containing Parquet files created from Avro using the AvroParquetWriter in the parquet-mr project.
I can load the contents of these files as a DataFrame with val it = spark.read.parquet("s3a://coolbeth/file=testy"), but I have not found a way to define a permanent table over these Parquet files.

If I issue a regular CREATE EXTERNAL TABLE STORED AS PARQUET, deserialization crashes at query time, I think because there is no Spark schema stored in the Parquet metadata (there is an Avro schema instead).

Is there a way to create the table I want from these Avro-generated Parquet files?

Thanks,
Matt Coolbeth
Software Engineer
Disney DTCI
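P.S. For reference, the DDL I tried was roughly of this shape (the table and column names here are placeholders, not my actual schema, which comes from the Avro records):

```sql
-- Illustrative sketch of the external-table definition that crashes at
-- query time; "testy", "id", and "name" are placeholder names.
CREATE EXTERNAL TABLE testy (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION 's3a://coolbeth/file=testy';
```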