Andrew,

On 2.0 I tried:

    val inputR = sc.textFile(file)
    val inputS = inputR.map(x => x.split("`"))
    val inputDF = inputS.toDF()
    inputDF.write.format("parquet").save("result.parquet")

The result part files end with *.snappy.parquet. Is that expected?
On Sun, Jul 24, 2016 at 8:00 PM, Andrew Ehrlich <and...@aehrlich.com> wrote:
> You can load the text with sc.textFile() to an RDD[String], then use
> .map() to convert it into an RDD[Row]. At this point you are ready to
> apply a schema. Use sqlContext.createDataFrame(rddOfRow, structType).
>
> Here is an example of how to define the StructType (schema) that you will
> combine with the RDD[Row] to create a DataFrame:
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.StructType
>
> Once you have the DataFrame, save it to Parquet with
> dataframe.save("/path") to create a Parquet file.
>
> Reference for SQLContext / createDataFrame:
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SQLContext
>
>> On Jul 24, 2016, at 5:34 PM, janardhan shetty <janardhan...@gmail.com> wrote:
>>
>> We have data in Bz2 compression format. Are there any links on converting it to
>> Parquet in Spark, and also performance benchmarks and case-study materials?
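
Putting the thread together, here is a minimal sketch of the RDD[Row] + StructType route Andrew describes, assuming a spark-shell session where sc and sqlContext are predefined, and assuming backtick-delimited text with two string fields; the input/output paths and column names are hypothetical:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // sc.textFile() picks a decompression codec from the file extension,
    // so .bz2 input needs no extra configuration.
    val lines = sc.textFile("/data/input.bz2")

    // One Row per line, assuming two backtick-separated string fields.
    val rows = lines.map(_.split("`")).map(a => Row(a(0), a(1)))

    // The schema to combine with the RDD[Row].
    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = true),
      StructField("col2", StringType, nullable = true)))

    val df = sqlContext.createDataFrame(rows, schema)

    // Write as Parquet; the part files get a .snappy.parquet suffix
    // because snappy is the default codec.
    df.write.parquet("/data/output.parquet")

Note that bz2 is a splittable compression format, so a large input file still parallelizes across tasks.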
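
As for the *.snappy.parquet suffix: snappy is the default Parquet compression codec in Spark 2.0, so yes, that is expected. A quick sketch of overriding it, assuming a spark-shell session where spark is the SparkSession (the output path is hypothetical):

    // Per-write override of the Parquet codec:
    inputDF.write.option("compression", "gzip").parquet("result_gzip.parquet")

    // Or session-wide via the SQL config:
    spark.conf.set("spark.sql.parquet.compression.codec", "gzip")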