
I am exploring the use of Apache Parquet with Spark SQL in our project. I
notice that Parquet uses different encodings for different columns, and
dictionary encoding looks like a good fit for our performance needs. I do
not see much documentation in Spark or Parquet on how to configure this.
For example, how would Parquet know the dictionary of words if the user
provides no schema? Where and how do I specify my schema and
configuration for the Parquet format?

I could not find the Apache Parquet mailing list on the official site. It
would be great if anyone could share it as well.

