You can call coalesce or repartition on the RDD before saving to control the
number of files output by any Spark operation.
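A minimal sketch of the idea, assuming the Spark 1.2 SQLContext/schemaRDD API from the question; the paths, the Record case class, and the partition count of 8 are all hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical case class describing the CSV columns.
case class Record(id: Int, value: String)

object CoalesceExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-example"))
    val sqlContext = new SQLContext(sc)
    // Implicitly converts RDD[Record] to a SchemaRDD (Spark 1.2 API).
    import sqlContext.createSchemaRDD

    val rows = sc.textFile("hdfs:///input/big.csv")   // hypothetical path
      .map(_.split(","))
      .map(p => Record(p(0).trim.toInt, p(1)))

    // coalesce(8) merges the many input partitions into 8, so
    // saveAsParquetFile writes 8 larger part files instead of one
    // small file per original partition. Use repartition instead if
    // you need a full shuffle to rebalance skewed partitions.
    rows.coalesce(8).saveAsParquetFile("hdfs:///output/parquet")

    sc.stop()
  }
}
```

coalesce avoids a shuffle when reducing the partition count, so it is usually the cheaper choice here; repartition shuffles all data but produces evenly sized output files.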

On Thu, Jan 29, 2015 at 9:27 AM, Manoj Samel <manojsamelt...@gmail.com>
wrote:

> Spark 1.2 on Hadoop 2.3
>
> Read one big csv file, create a schemaRDD on it and saveAsParquetFile.
>
> It creates a large number of small (~1 MB) parquet part-x- files.
>
> Any way to control this so that a smaller number of larger files is created?
>
> Thanks,
>
