Here is an example of how I would pass the S3 parameters to the Hadoop
configuration in PySpark. You can do something similar for any other
parameters you want to pass to the Hadoop configuration:

# Grab the Hadoop Configuration from the underlying JavaSparkContext
hadoopConf = sc._jsc.hadoopConfiguration()
# Use the native S3 filesystem implementation for s3:// paths
hadoopConf.set("fs.s3.impl",
               "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", your_access_key_id)
hadoopConf.set("fs.s3n.awsSecretAccessKey", your_secret_access_key)

lines = sc.textFile(your_dataset_in_S3)  # e.g. an s3n:// path
lines.count()
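
The same hadoopConfiguration() handle covers your original blocksize
question. Here is a minimal sketch; note the property names below are
assumptions on my part (which one is honored depends on the filesystem
and input format, e.g. fs.local.block.size for local files vs.
dfs.blocksize on HDFS), so I haven't verified which one binaryRecords
actually picks up:

hadoopConf = sc._jsc.hadoopConfiguration()
# Assumed property names; adjust for the filesystem you read from.
hadoopConf.set("fs.local.block.size", "1048576")  # 1 MB, local filesystem
hadoopConf.set("dfs.blocksize", "1048576")        # 1 MB, HDFS

# recordLength is the fixed record size in bytes
records = sc.binaryRecords(your_binary_file, recordLength=100)
print(records.getNumPartitions())

If the partition count does not change, the split size may be driven by
the input format itself rather than the filesystem blocksize.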


On Thu, May 14, 2015 at 4:17 AM, ayan guha <guha.a...@gmail.com> wrote:

> Jo
>
> Thanks for the reply, but _jsc does not have anything to pass Hadoop
> configs. Can you illustrate your answer a bit more? TIA...
>
> On Wed, May 13, 2015 at 12:08 AM, Ram Sriharsha <sriharsha....@gmail.com>
> wrote:
>
>> Yes, the SparkContext in the Python API has a reference to the
>> JavaSparkContext (jsc)
>>
>> https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext
>>
>> through which you can access the Hadoop configuration.
>>
>> On Tue, May 12, 2015 at 6:39 AM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I found this method in the Scala API but not in the Python API (1.3.1).
>>>
>>> Basically, I want to change the blocksize in order to read a binary file
>>> using sc.binaryRecords but with multiple partitions (for testing I want to
>>> generate partitions smaller than the default blocksize).
>>>
>>> Is it possible in Python? If so, how?
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
