Hi, I want to read a CSV from one S3 bucket, do some processing, and write the result to a different bucket. I know how to set S3 credentials using
jssc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", YOUR_ACCESS_KEY)
jssc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", YOUR_SECRET_KEY)

The problem is that Spark is lazy. So if I do the following:

- set credentials 1
- read the input CSV
- do some processing
- set credentials 2
- write the result CSV

then, because of laziness, the input CSV may not actually be read until the write triggers the job, by which point the program may be using credentials 2. One workaround is to cache the processed result, but if there is not enough storage, cached partitions can be evicted and the CSV will be re-read, again with the wrong credentials. So how should I handle this situation?
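To make the hazard concrete, here is a minimal sketch in plain Python (no Spark involved; the mutable `credentials` dict and `lazy_read` generator are stand-ins I made up for the Hadoop configuration and a deferred RDD read). The point is only that with lazy evaluation, the read executes when the result is consumed, not when it is declared:

```python
# Illustration of the hazard: with lazy evaluation, the read runs only
# when the output is consumed, so it sees whatever credentials are
# current at *that* moment, not at declaration time.

credentials = {"key": None}  # stand-in for the Hadoop configuration

def lazy_read():
    # Generator: the body does not execute until iterated,
    # mirroring how a Spark read is deferred until an action runs.
    yield f"read with {credentials['key']}"

credentials["key"] = "credentials-1"   # intended for the read
rows = lazy_read()                     # read is *declared*, not executed

credentials["key"] = "credentials-2"   # intended for the write
result = list(rows)                    # read actually runs here

print(result[0])  # → "read with credentials-2", not credentials-1
```

This is exactly the ordering problem above: the "read" picks up credentials 2 even though credentials 1 were set first.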