The Hadoop docs about S3 <http://wiki.apache.org/hadoop/AmazonS3> (linked to by the Spark docs) say that s3n files are subject to "the 5GB limit on file size imposed by S3." However, that limit was raised <http://www.computerworld.com/s/article/9200763/Amazon_s_S3_can_now_store_files_of_up_to_5TB> about three years ago, so it wasn't clear to me whether this limit still applies to Hadoop's s3n URLs.
Well, I tried running a Spark job on a 200GB s3n file, and it ran fine. Has this been other people's experience?
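For reference, a minimal sketch of the kind of job I mean is below (bucket and key names are placeholders, and you'd need your own AWS credentials configured for the s3n filesystem):

    import org.apache.spark.{SparkConf, SparkContext}

    object S3nLargeFileTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("s3n large file test")
        val sc = new SparkContext(conf)

        // Placeholder bucket/path; credentials come from
        // fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey.
        val lines = sc.textFile("s3n://my-bucket/path/to/200gb-file")

        // A simple action that forces a full read of the file.
        println("line count: " + lines.count())

        sc.stop()
      }
    }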