Try rdd.coalesce(1).saveAsParquetFile(...). The coalesce transformation reduces the number of partitions, and saveAsParquetFile writes one part file per partition, so fewer partitions means fewer, larger files. See http://spark.apache.org/docs/1.2.0/programming-guide.html#transformations
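A minimal sketch of the idea, assuming Spark 1.2 with a SQLContext; the case class, CSV layout, and HDFS paths below are illustrative, not from the original thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical schema for the CSV rows.
case class Record(id: Int, value: String)

object CoalesceParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-parquet"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

    // Hypothetical input path; one big CSV read as many input splits.
    val rows = sc.textFile("hdfs:///data/big.csv")
      .map(_.split(","))
      .map(a => Record(a(0).toInt, a(1)))

    // coalesce(1) shrinks the RDD to a single partition, so the Parquet
    // output is a single part file instead of one per input split.
    // A larger n (e.g. coalesce(16)) keeps some write parallelism while
    // still reducing the file count.
    rows.coalesce(1).saveAsParquetFile("hdfs:///data/big.parquet")

    sc.stop()
  }
}
```

Note that coalesce(1) funnels the whole write through one task; for large data a moderate partition count is usually a better trade-off.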
--- Original Message ---
From: "Manoj Samel" <manojsamelt...@gmail.com>
Sent: January 29, 2015 9:28 AM
To: user@spark.apache.org
Subject: schemaRDD.saveAsParquetFile creates large number of small parquet files
...
Spark 1.2 on Hadoop 2.3

Read one big csv file, create a schemaRDD on it, and saveAsParquetFile. It creates a large number of small (~1MB) parquet part-x- files. Is there any way to control this so that a smaller number of larger files is created?

Thanks,