Amazon also strongly discourages the use of s3:// because the block file
system it maps to is deprecated.

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html

Note
> The configuration of Hadoop running on Amazon EMR differs from the default
> configuration provided by Apache Hadoop. On Amazon EMR, s3n:// and s3://
> both map to the Amazon S3 native file system, *while in the default
> configuration provided by Apache Hadoop s3:// is mapped to the Amazon S3
> block storage system.*
>
> Amazon S3 block is a deprecated file system that is not recommended because
> it can trigger a race condition that might cause your cluster to fail. It
> may be required by legacy applications.




On Tue, May 6, 2014 at 8:23 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> There’s a difference between s3:// and s3n:// in the Hadoop S3 access
> layer. Make sure you use the right one when reading stuff back. In general
> s3n:// ought to be better because it will create things that look like
> files in other S3 tools. s3:// was present when the file size limit in S3
> was much lower, and it uses S3 objects as blocks in a kind of overlay file
> system.
>
> If you use s3n:// for both, you should be able to pass the exact same file
> to load as you did to save. Make sure you also set your AWS keys in the
> environment or in SparkContext.hadoopConfiguration.
>
> Matei
>
> On May 6, 2014, at 5:19 PM, kamatsuoka <ken...@gmail.com> wrote:
>
> > I have a Spark app that writes out a file,
> > s3://mybucket/mydir/myfile.txt.
> >
> > Behind the scenes, the S3 driver creates a bunch of files like
> > s3://mybucket//mydir/myfile.txt/part-0000, as well as the block files like
> > s3://mybucket/block_3574186879395643429.
> >
> > How do I construct a URL to use this file as input to another Spark app?
> > I tried all the variations of s3://mybucket/mydir/myfile.txt, but none of
> > them work.
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
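Matei's advice above can be sketched as follows. This is a hedged illustration, not anything from the thread: `s3n_path` is a hypothetical helper, `mybucket`/`mydir/myfile.txt` are the placeholders from the question, and the Spark calls are shown only as comments because they need a live cluster and AWS credentials. The runnable part just demonstrates the path convention of using the exact same s3n:// URL for saving and loading.

```python
# Hedged sketch of the advice in Matei's reply: save and read back with the
# SAME s3n:// path. The SparkContext lines are commented out (Scala syntax,
# per the thread's mention of SparkContext.hadoopConfiguration) since they
# require a real cluster; only the path handling below actually runs.

def s3n_path(bucket, key):
    """Build an s3n:// URL, collapsing accidental double slashes in the key
    (the question shows s3://mybucket//mydir/... with a doubled slash)."""
    key = "/".join(part for part in key.split("/") if part)
    return "s3n://%s/%s" % (bucket, key)

path = s3n_path("mybucket", "mydir/myfile.txt")

# On a real cluster (keys can also come from the environment):
#   sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "...")
#   sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "...")
#   rdd.saveAsTextFile(path)   # writes path + "/part-00000", "/part-00001", ...
#   back = sc.textFile(path)   # pass the exact same directory path to read it

print(path)  # s3n://mybucket/mydir/myfile.txt
```

Note that `saveAsTextFile` produces a directory of part files, so the "file" you read back is really that directory; `sc.textFile` on the same path picks up all the parts.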
