Spark 1.2 on Hadoop 2.3. I read one big CSV file, create a SchemaRDD from it, and call saveAsParquetFile.
This produces a large number of small (~1 MB) parquet part-N files. Is there any way to control this so that a smaller number of larger files is created? Thanks,
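For reference, here is roughly what I'm doing (a minimal sketch; the Record case class, paths, and the coalesce target of 16 are placeholders, not my real values). My understanding is that one part-file is written per partition, so coalescing before the save should reduce the file count, but I'd like to confirm that's the right approach:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object CsvToParquet {
      // Placeholder case class describing the CSV columns.
      case class Record(id: Long, name: String, value: Double)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CsvToParquet"))
        val sqlContext = new SQLContext(sc)
        // Implicit conversion from RDD[Record] to SchemaRDD (Spark 1.x).
        import sqlContext.createSchemaRDD

        // Parse the CSV into an RDD of Records.
        val rows = sc.textFile("hdfs:///data/input.csv").map { line =>
          val f = line.split(",")
          Record(f(0).toLong, f(1), f(2).toDouble)
        }

        // One parquet part-file is written per partition, so reduce the
        // partition count before saving; 16 is just an example target.
        rows.coalesce(16).saveAsParquetFile("hdfs:///data/output.parquet")

        sc.stop()
      }
    }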