Thanks Patrick. 

But why am I getting a BadDigest error when I am saving a large amount of
data to S3?

Loss was due to org.apache.hadoop.fs.s3.S3Exception
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
S3 PUT failed for
'/spark_test%2Fsmaato_one_day_phase_2%2Fsmaato_2014_05_17%2F_temporary%2F_attempt_201408041624_0000_m_000065_165%2Fpart-00065'
XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.</Message><ExpectedDigest>lb2tDEVSSnRNM4pw6504Bg==</ExpectedDigest><CalculatedDigest>EL9UDBzFvTwJycA7Ii2KGA==</CalculatedDigest><RequestId>437F15C89D355081</RequestId><HostId>kJQI+c9edzBmT2Z9sbfAELYT/8R5ezLWeUgeIU37iPsq5KQm/qAXItunZY35wnYx</HostId></Error>
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:82)

As indicated earlier, I use the following command as an alternative to
saveAsTextFile:

x.map(x => (NullWritable.get(), new Text(x.toString)))
  .coalesce(100)
  .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]]("s3n://dest-dir")

In the above case, it succeeds in writing some 48 of the 100 part files
(though that 48 is not consistent from run to run) and then starts throwing
the above error. The same job works fine if I increase the capacity of the
cluster (say, from 3 m3.2xlarge slaves to 6) or reduce the data size.
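
One variant I am considering trying (just a sketch on my side; 500 is an
arbitrary partition count I have not verified): writing more, smaller part
files so that each individual S3 PUT is smaller, instead of coalescing down
to 100:

x.map(x => (NullWritable.get(), new Text(x.toString)))
  .repartition(500) // more, smaller part files => smaller per-file PUTs
  .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]]("s3n://dest-dir")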

Is there a possibility that the data is getting corrupted when the load
increases?

Please advise. I have been stuck on this problem for the past couple of weeks.

Thanks,
lmk




