Hi,

This is the expected behaviour. The default compression codec for Parquet is `snappy`. See:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L215
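
If you want a different codec, you can override the default per write with the writer's `compression` option, or for the whole session via the `spark.sql.parquet.compression.codec` conf. A minimal sketch (the DataFrame name and output paths here are placeholders, not from the thread):

  // Per-write override: part files will end in .gz.parquet instead
  inputDF.write
    .option("compression", "gzip")
    .parquet("result_gzip.parquet")

  // Session-wide override for all subsequent Parquet writes
  spark.conf.set("spark.sql.parquet.compression.codec", "gzip")

  // Or turn compression off entirely
  inputDF.write
    .option("compression", "none")
    .parquet("result_uncompressed.parquet")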
// maropu

On Tue, Jul 26, 2016 at 6:33 AM, janardhan shetty <janardhan...@gmail.com> wrote:

> Andrew,
>
> 2.0
>
> I tried:
>
>   val inputR = sc.textFile(file)
>   val inputS = inputR.map(x => x.split("`"))
>   val inputDF = inputS.toDF()
>
>   inputDF.write.format("parquet").save("result.parquet")
>
> The result part files end with *.snappy.parquet*. Is that expected?
>
> On Sun, Jul 24, 2016 at 8:00 PM, Andrew Ehrlich <and...@aehrlich.com> wrote:
>
>> You can load the text with sc.textFile() into an RDD[String], then use
>> .map() to convert it into an RDD[Row]. At that point you are ready to
>> apply a schema: use sqlContext.createDataFrame(rddOfRow, structType).
>>
>> Here is an example of how to define the StructType (schema) that you
>> will combine with the RDD[Row] to create a DataFrame:
>>
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.StructType
>>
>> Once you have the DataFrame, save it to Parquet with
>> dataframe.write.parquet("/path") to create a Parquet file.
>>
>> Reference for SQLContext / createDataFrame:
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SQLContext
>>
>> On Jul 24, 2016, at 5:34 PM, janardhan shetty <janardhan...@gmail.com> wrote:
>>
>> We have data in bz2 compression format. Are there any links on
>> converting it to Parquet in Spark, and any performance benchmarks or
>> case-study materials?

--
---
Takeshi Yamamuro
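
For completeness, here is a minimal end-to-end sketch of the pipeline Andrew describes, going from a bz2 text file to Parquet. The input/output paths, the backtick delimiter, and the three string columns are assumptions for illustration only:

  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  val spark = SparkSession.builder().appName("Bz2ToParquet").getOrCreate()
  val sc = spark.sparkContext

  // sc.textFile decompresses .bz2 input transparently via Hadoop's codec support
  val inputR = sc.textFile("/path/to/input.bz2")

  // Turn each backtick-delimited line into a Row
  // (assumes every line has exactly three fields, matching the schema below)
  val rowRDD = inputR.map(line => Row.fromSeq(line.split("`").toSeq))

  // Define the schema; these column names are made up for the example
  val schema = StructType(Seq(
    StructField("col1", StringType, nullable = true),
    StructField("col2", StringType, nullable = true),
    StructField("col3", StringType, nullable = true)
  ))

  val df = spark.createDataFrame(rowRDD, schema)

  // Written with snappy compression by default, as noted above
  df.write.parquet("/path/to/result.parquet")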