You might be interested in the new s3a filesystem in Hadoop 2.6.0 [1].

1. https://issues.apache.org/jira/plugins/servlet/mobile#issue/HADOOP-10400

On Nov 26, 2014 12:24 PM, "Aaron Davidson" <ilike...@gmail.com> wrote:
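For context, switching to the s3a connector is mostly a matter of Hadoop configuration. A minimal sketch, assuming Hadoop 2.6.0+ with the hadoop-aws module and AWS SDK jars on the Spark classpath (the bucket path and key placeholders are illustrative):

```scala
// Sketch: point s3a:// URLs at the new connector and supply credentials,
// before the first filesystem access. Assumes hadoop-aws is on the classpath.
sc.hadoopConfiguration.set("fs.s3a.impl",
  "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "<your-access-key>")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<your-secret-key>")

// s3a:// paths now resolve through the new connector:
val lines = sc.textFile("s3a://my-bucket/path/to/files/*")
```

Note this changes only which filesystem implementation handles the URLs; as discussed below, it does not by itself avoid the serial metadata pass over many small files.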
> Spark has a known problem where it will do a pass of metadata on a large
> number of small files serially, in order to find the partition information
> prior to starting the job. This will probably not be repaired by switching
> the FS impl.
>
> However, you can change the FS being used like so (prior to the first
> usage):
>
>     sc.hadoopConfiguration.set("fs.s3n.impl",
>       "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>
> On Wed, Nov 26, 2014 at 1:47 AM, Tomer Benyamini <tomer....@gmail.com>
> wrote:
>
>> Thanks Lalit; setting the access and secret keys in the configuration
>> works even when calling sc.textFile. Is there a way to select, at
>> runtime, which Hadoop S3 native filesystem implementation is used, via
>> the Hadoop configuration?
>>
>> Thanks,
>> Tomer
>>
>> On Wed, Nov 26, 2014 at 11:08 AM, lalit1303 <la...@sigmoidanalytics.com>
>> wrote:
>>
>>> You can try creating a Hadoop Configuration and setting the S3
>>> configuration on it, i.e. access keys etc. Then, to read files from S3,
>>> use newAPIHadoopFile and pass the config object along with the key and
>>> value classes.
>>>
>>> -----
>>> Lalit Yadav
>>> la...@sigmoidanalytics.com
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/S3NativeFileSystem-inefficient-implementation-when-calling-sc-textFile-tp19841p19845.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
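Lalit's suggestion above can be sketched as follows. This is an illustrative, unverified fragment assuming the s3n connector; the bucket path and credential placeholders are hypothetical, and the property names are the standard s3n credential keys:

```scala
// Sketch: pass an explicit Hadoop Configuration (carrying S3 credentials)
// to newAPIHadoopFile, rather than relying on defaults.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Start from the SparkContext's Hadoop config and add S3 credentials.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("fs.s3n.awsAccessKeyId", "<your-access-key>")
conf.set("fs.s3n.awsSecretAccessKey", "<your-secret-key>")

// Read with the new-API input format, passing the config explicitly.
val rdd = sc.newAPIHadoopFile(
  "s3n://my-bucket/path/to/files/*",  // hypothetical path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)

// Keys are byte offsets; the values are the lines of text.
val lines = rdd.map { case (_, text) => text.toString }
```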