Thanks Patrick. But why am I getting a BadDigest error when saving large amounts of data to S3?
    Loss was due to org.apache.hadoop.fs.s3.S3Exception:
    org.jets3t.service.S3ServiceException: S3 PUT failed for
    '/spark_test%2Fsmaato_one_day_phase_2%2Fsmaato_2014_05_17%2F_temporary%2F_attempt_201408041624_0000_m_000065_165%2Fpart-00065'
    XML Error Message:
    <?xml version="1.0" encoding="UTF-8"?>
    <Error>
      <Code>BadDigest</Code>
      <Message>The Content-MD5 you specified did not match what we received.</Message>
      <ExpectedDigest>lb2tDEVSSnRNM4pw6504Bg==</ExpectedDigest>
      <CalculatedDigest>EL9UDBzFvTwJycA7Ii2KGA==</CalculatedDigest>
      <RequestId>437F15C89D355081</RequestId>
      <HostId>kJQI+c9edzBmT2Z9sbfAELYT/8R5ezLWeUgeIU37iPsq5KQm/qAXItunZY35wnYx</HostId>
    </Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:82)

As indicated earlier, I use the following as an alternative to saveAsTextFile:

    x.map(x => (NullWritable.get(), new Text(x.toString)))
      .coalesce(100)
      .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]]("s3n://dest-dir")

In this case it succeeds in writing some 48 of the 100 part files (though even that count of 48 is inconsistent) and then starts throwing the error above. The same job works fine if I increase the capacity of the cluster (say, from 3 m3.2xlarge slaves to 6) or reduce the data size. Is it possible that the data is getting corrupted as the load increases?

Please advise. I have been stuck on this problem for the past couple of weeks.

Thanks,
lmk

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p11345.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
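For context on what the error is comparing: S3 returns BadDigest when the Content-MD5 header sent with a PUT does not match the MD5 that S3 computes over the bytes it actually received, i.e. the body changed somewhere between the client and S3. A minimal sketch of how that digest value is derived (this is an illustration of the check, not code from this thread; the object and method names are made up):

    import java.security.MessageDigest
    import java.util.Base64

    // Content-MD5 is the base64 encoding of the raw 16-byte MD5 of the
    // request body. If any byte differs between what the client hashed
    // and what S3 received, the two digests disagree and S3 rejects the
    // PUT with BadDigest -- exactly the Expected/Calculated pair in the
    // error above.
    object ContentMd5 {
      def contentMd5(body: Array[Byte]): String =
        Base64.getEncoder.encodeToString(
          MessageDigest.getInstance("MD5").digest(body))

      def main(args: Array[String]): Unit =
        println(contentMd5("hello".getBytes("UTF-8"))) // XUFAKrxLKna5cZ2REBfFkg==
    }

So a BadDigest on an otherwise-working job usually points at bytes being altered or truncated in flight (e.g. a retried/interrupted upload), which is consistent with it appearing only under higher load.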