Super, it worked. Thanks

On Fri, May 15, 2015 at 12:26 AM, Ram Sriharsha <sriharsha....@gmail.com> wrote:
> Here is an example of how I would pass in the S3 parameters to the hadoop
> configuration in pyspark.
> You can do something similar for other parameters you want to pass to the
> hadoop configuration
>
> hadoopConf=sc._jsc.hadoopConfiguration()
> hadoopConf.set("fs.s3.impl",
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> hadoopConf.set("fs.s3n.awsAccessKeyId",$your_access_key_id)
> hadoopConf.set("fs.s3n.awsSecretAccessKey",$your_secret_access_key)
>
> lines = sc.textFile($your_dataset_in_S3)
> lines.count()
>
>
> On Thu, May 14, 2015 at 4:17 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Jo
>>
>> Thanks for the reply, but _jsc does not have anything to pass hadoop
>> configs. Can you illustrate your answer a bit more? TIA...
>>
>> On Wed, May 13, 2015 at 12:08 AM, Ram Sriharsha <sriharsha....@gmail.com>
>> wrote:
>>
>>> Yes, the SparkContext in the Python API has a reference to the
>>> JavaSparkContext (jsc)
>>>
>>> https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext
>>>
>>> through which you can access the hadoop configuration.
>>>
>>> On Tue, May 12, 2015 at 6:39 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I found this method in the scala API but not in the python API (1.3.1).
>>>>
>>>> Basically, I want to change the blocksize in order to read a binary file
>>>> using sc.binaryRecords but with multiple partitions (for testing I want to
>>>> generate partitions smaller than the default blocksize).
>>>>
>>>> Is it possible in python? If so, how?
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>
>
--
Best Regards,
Ayan Guha
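
For anyone finding this thread later: the same hadoopConfiguration() handle can also be applied to the original block-size question. The sketch below is an untested illustration, not code from this thread; "mapreduce.input.fileinputformat.split.maxsize" is a standard Hadoop (new-API) setting, while the 1 MB cap, the record length, and the input path are placeholder values you would swap for your own.

from pyspark import SparkContext

sc = SparkContext(appName="binary-records-small-splits")

# Same trick as above: reach the Hadoop Configuration through the
# wrapped JavaSparkContext and set parameters on it directly.
hadoopConf = sc._jsc.hadoopConfiguration()

# Cap the input split size (1 MB here, purely illustrative) so that
# sc.binaryRecords can yield partitions smaller than the default blocksize.
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", str(1024 * 1024))

# recordLength and the path are placeholders for your own fixed-length data.
records = sc.binaryRecords("/path/to/your/binary/file", recordLength=512)
print(records.getNumPartitions())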