Hi

I want to read a CSV from one bucket, do some processing, and write the
result to a different bucket. I know how to set S3 credentials using

jssc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", YOUR_ACCESS_KEY);
jssc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", YOUR_SECRET_KEY);

But the problem is that Spark is lazy. So if I do the following:

   - set credentials 1
   - read input CSV
   - do some processing
   - set credentials 2
   - write result CSV

then there is a chance that, because of laziness, the input CSV is only
actually read when the write action runs, at which point the program may
pick up credentials 2.
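
Here is roughly what I mean, in Java. The bucket names, key variables and
the map step are placeholders, not my real code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setAppName("cross-bucket-csv");
JavaSparkContext jssc = new JavaSparkContext(conf);

// credentials 1: for the input bucket
jssc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY_1);
jssc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY_1);

// transformations only -- nothing is read yet
JavaRDD<String> input = jssc.textFile("s3n://input-bucket/data.csv");
JavaRDD<String> result = input.map(line -> line.toUpperCase()); // placeholder processing

// credentials 2: for the output bucket
jssc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY_2);
jssc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY_2);

// saveAsTextFile is the first action, so the textFile read is triggered
// here -- my worry is that it may see credentials 2 instead of credentials 1
result.saveAsTextFile("s3n://output-bucket/result");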

One workaround is to cache the intermediate result, but if there is not
enough storage the cached partitions can be evicted, in which case the CSV
would be re-read (again with the wrong credentials). So how should this
situation be handled?
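
For reference, the caching workaround I had in mind looks roughly like this
(the persist() level and count() as the forcing action are just my guesses):

import org.apache.spark.storage.StorageLevel;

// force the read while credentials 1 are still in effect
result.persist(StorageLevel.MEMORY_AND_DISK());
result.count(); // action: materializes result under credentials 1

// then switch to credentials 2 and write
jssc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY_2);
jssc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY_2);
result.saveAsTextFile("s3n://output-bucket/result");

// but if cached partitions are evicted for lack of storage, Spark recomputes
// them from the source, which would re-read the CSV under credentials 2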
