Do we need schema for Parquet files with Spark?

ashokkumar rajendran Thu, 03 Mar 2016 19:33:06 -0800

Hi,

I am exploring the use of Apache Parquet with Spark SQL in our project. I
notice that Parquet uses different encodings for different columns.
Dictionary encoding looks like one of the better options for our
performance needs. I do not see much documentation in Spark or Parquet on
how to configure this. For example, how would Parquet know the dictionary
of words if the user provides no schema? Where and how do I specify my
schema and configuration for the Parquet format?
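
For context on the dictionary question, here is a minimal sketch of how
dictionary encoding works in general: the dictionary is built from the
values that actually occur in a column, so it does not need to be declared
in a schema. This is a toy illustration in plain Python, not Parquet's
actual implementation, and the function names are hypothetical.

```python
def dictionary_encode(values):
    """Toy dictionary encoding: replace each value with a small integer id."""
    dictionary = []  # distinct values, in first-seen order
    index_of = {}    # value -> its position in `dictionary`
    indices = []     # one id per input value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column from the dictionary and the ids."""
    return [dictionary[i] for i in indices]


# A made-up column of repeated words, for illustration only.
column = ["spark", "parquet", "spark", "spark", "parquet", "sql"]
dictionary, indices = dictionary_encode(column)
restored = dictionary_decode(dictionary, indices)
```

The dictionary here comes entirely from the data. A column with few
distinct values compresses well this way because each repeated word is
stored once and referenced by a small integer.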


I could not find the Apache Parquet mailing list on the official site. It
would be great if anyone could share it as well.

Regards
Ashok
