Super, it worked. Thanks

On Fri, May 15, 2015 at 12:26 AM, Ram Sriharsha <sriharsha....@gmail.com> wrote:
> Here is an example of how I would pass in the S3 parameters to the hadoop
> configuration in pyspark.
> You can do something similar for other parameters you want to pass to the
> hadoop configuration
>
> hadoopConf=sc._jsc.hadoopConfiguration()
> hadoopConf.set("fs.s3.impl",
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> hadoopConf.set("fs.s3n.awsAccessKeyId",$your_access_key_id)
> hadoopConf.set("fs.s3n.awsSecretAccessKey",$your_secret_access_key)
>
> lines = sc.textFile($your_dataset_in_S3)
> lines.count()
>
>
> On Thu, May 14, 2015 at 4:17 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Jo
>>
>> Thanks for the reply, but _jsc does not have anything to pass hadoop
>> configs. Can you illustrate your answer a bit more? TIA...
>>
>> On Wed, May 13, 2015 at 12:08 AM, Ram Sriharsha <sriharsha....@gmail.com>
>> wrote:
>>
>>> Yes, the SparkContext in the Python API has a reference to the
>>> JavaSparkContext (jsc)
>>>
>>> https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext
>>>
>>> through which you can access the hadoop configuration.
>>>
>>> On Tue, May 12, 2015 at 6:39 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I found this method in the scala API but not in the python API (1.3.1).
>>>>
>>>> Basically, I want to change the blocksize in order to read a binary file
>>>> using sc.binaryRecords but with multiple partitions (for testing I want to
>>>> generate partitions smaller than the default blocksize).
>>>>
>>>> Is it possible in python? If so, how?
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>
>
--
Best Regards,
Ayan Guha
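
For anyone finding this thread later: the same hadoopConfiguration() handle can also be applied to the original block-size question. The sketch below is an untested illustration, not code from this thread; "mapreduce.input.fileinputformat.split.maxsize" is a standard Hadoop (new-API) setting, while the 1 MB cap, the record length, and the input path are placeholder values you would swap for your own.

from pyspark import SparkContext

sc = SparkContext(appName="binary-records-small-splits")

# Same trick as above: reach the Hadoop Configuration through the
# wrapped JavaSparkContext and set parameters on it directly.
hadoopConf = sc._jsc.hadoopConfiguration()

# Cap the input split size (1 MB here, purely illustrative) so that
# sc.binaryRecords can yield partitions smaller than the default blocksize.
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", str(1024 * 1024))

# recordLength and the path are placeholders for your own fixed-length data.
records = sc.binaryRecords("/path/to/your/binary/file", recordLength=512)
print(records.getNumPartitions())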