You can load the text with sc.textFile() into an RDD[String], then use .map() to 
convert it into an RDD[Row]. At this point you are ready to apply a schema with 
sqlContext.createDataFrame(rddOfRow, structType).

Here is an example of how to define the StructType (schema) that you will 
combine with the RDD[Row] to create a DataFrame.
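
As a rough sketch of the steps above (not taken from the linked docs): the 
input path, the "name,age" line layout, and the column names below are 
assumptions made up for illustration. sc.textFile() relies on the Hadoop 
codecs to decompress .bz2 files based on the file extension.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object Bz2ToDataFrame {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit
    val sc = new SparkContext(new SparkConf().setAppName("bz2-to-dataframe"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical input location; .bz2 files are decompressed transparently
    val lines = sc.textFile("/data/input/*.bz2")

    // Assumed record layout: one "name,age" pair per line
    val rows = lines
      .map(_.split(","))
      .map(fields => Row(fields(0), fields(1).trim.toInt))

    // Schema whose field order and types match the Row values built above
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))

    val df = sqlContext.createDataFrame(rows, schema)
    df.printSchema()

    sc.stop()
  }
}
```

The Row values have to line up with the StructType field by field; a mismatch 
usually only shows up as a runtime error once an action runs.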

Once you have the DataFrame, save it with dataframe.write.parquet("/path") to 
create a parquet file.
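
A minimal sketch of the write step, plus reading the result back to check it. 
The object and helper names here are hypothetical; df.write.parquet and 
sqlContext.read.parquet are the DataFrame writer and reader calls available 
since Spark 1.4.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object ParquetIO {
  // Writes the DataFrame as a directory of parquet part files at `path`
  def saveAsParquet(df: DataFrame, path: String): Unit =
    df.write.parquet(path)

  // Reads the parquet output back; the schema is recovered from the files
  def loadParquet(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read.parquet(path)
}
```

By default the write fails if the target path already exists; 
df.write.mode("overwrite").parquet(path) replaces it instead.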

Reference for SQLContext / createDataFrame:

> On Jul 24, 2016, at 5:34 PM, janardhan shetty <> wrote:
> We have data in Bz2 compression format. Any links in Spark to convert into 
> Parquet and also performance benchmarks and uses study materials ?
