On Wed, May 7, 2014 at 4:00 AM, Han JU <ju.han.fe...@gmail.com> wrote:
> But in my experience, when reading directly from s3n, Spark creates only
> 1 input partition per file, regardless of the file size. This may lead to
> some performance problems if you have big files.

You can (and perhaps should) always call repartition() on the RDD explicitly to increase your level of parallelism to match the number of cores in your cluster. It's pretty quick, and it will speed up all subsequent operations.
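
For example, here's a minimal sketch in the Spark shell (Scala); the s3n bucket and file name are hypothetical, and sc.defaultParallelism is just one reasonable target -- you can also pass an explicit partition count:

    val lines = sc.textFile("s3n://my-bucket/big-file.txt")       // often just 1 partition for a single file
    val repartitioned = lines.repartition(sc.defaultParallelism)  // shuffle the data into roughly one partition per core
    println(repartitioned.partitions.length)                      // verify the new partition count

Note that repartition() does trigger a shuffle, but for a single big file read from S3 that one-time cost is usually well worth it for the parallelism you gain downstream.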