Hi Ashok,

On the Spark SQL side, when you create a DataFrame it has a schema (each column has a type such as Int or String). When you save that DataFrame in Parquet format, Spark translates the DataFrame schema into Parquet data types (see the org.apache.spark.sql.execution.datasources.parquet package). Parquet then applies dictionary encoding automatically, where applicable, based on the data values; the encoding is not specified by the user. Parquet figures out the right encoding to use for you.
Xinh

> On Mar 3, 2016, at 7:32 PM, ashokkumar rajendran <ashokkumar.rajend...@gmail.com> wrote:
>
> Hi,
>
> I am exploring using Apache Parquet with Spark SQL in our project. I notice
> that Apache Parquet uses different encodings for different columns. The
> dictionary encoding in Parquet looks like a good fit for our performance
> needs. I do not see much documentation in Spark or Parquet on how to
> configure this. For example, how would Parquet know the dictionary of words
> if no schema is provided by the user? Where/how do I specify my schema or
> config for the Parquet format?
>
> I could not find an Apache Parquet mailing list on the official site. It
> would be great if anyone could share it as well.
>
> Regards
> Ashok