Hi,
This is the expected behaviour.
The default compression for Parquet is `snappy`.
See:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L215
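If you need a different codec, it can be overridden globally through the SQL config
or per write through the DataFrameWriter. A minimal sketch, assuming a Spark 2.0
SparkSession named `spark` and the `inputDF` from the snippet below:

// Override the codec globally (valid values: uncompressed, snappy, gzip, lzo)
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")

// Or per write, via the writer option
inputDF.write.option("compression", "gzip").parquet("result.parquet")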
// maropu
On Tue, Jul 26, 2016 at 6:33 AM, janardhan shetty wrote:
Andrew,
Spark 2.0.
I tried:
import sqlContext.implicits._  // needed for .toDF() outside the shell

val inputR = sc.textFile(file)              // RDD[String]
val inputS = inputR.map(x => x.split("`"))  // RDD[Array[String]]
val inputDF = inputS.toDF()
inputDF.write.format("parquet").save("result.parquet")  // path must be a quoted string
Result part files end with `.snappy.parquet`; is that expected?
On Sun, Jul 24, 2016 at 8:00 PM, Andrew Ehrlich wrote:
You can load the text with sc.textFile() into an RDD[String], then use .map() to
convert it into an RDD[Row]. At this point you are ready to apply a schema. Use
sqlContext.createDataFrame(rddOfRow, structType).
Here is an example of how to define the StructType (schema) that you will
combine with the RDD of Rows.
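A minimal sketch, assuming the backtick-delimited input has two string fields (the
column names below are invented):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical schema for two string columns
val structType = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("col2", StringType, nullable = true)
))

// Parse each line into a Row that matches the schema
val rddOfRow = sc.textFile(file)
  .map(_.split("`"))
  .map(fields => Row(fields(0), fields(1)))

val df = sqlContext.createDataFrame(rddOfRow, structType)
df.write.parquet("result.parquet")  // part files end up as *.snappy.parquet by default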
We have data in bz2 compression format. Are there any links on converting it into
Parquet in Spark, and also performance benchmarks and case-study materials?
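For what it's worth, Hadoop's input formats decompress .bz2 files transparently, so
the conversion itself is just a read followed by a Parquet write. A minimal sketch
with placeholder paths, reusing the Row/schema pattern from the answer above:

import org.apache.spark.sql.Row

// .bz2 input is decompressed on the fly by Hadoop's BZip2Codec
val lines = sc.textFile("hdfs:///data/input/*.bz2")
val rows = lines.map(_.split("`")).map(f => Row(f(0), f(1)))  // hypothetical 2-field layout
sqlContext.createDataFrame(rows, structType)                  // structType as defined above
  .write.parquet("hdfs:///data/output")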