Wouldn’t this work if you load the files in hdfs and let the partitions be
equal to the amount of parallelism you want?
From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
Sent: Friday, June 30, 2017 8:55 AM
To: ayan guha
Cc: user
Subject: Re: PySpark working with Generators
Hi,

I understand that now. However, your function foo() should take a string
and parse it, rather than trying to read from a file. This way, you can
separate the file-read path from the processing part:

r = sc.wholeTextFiles(path)
parsed = r.map(lambda x: (x[0], foo(x[1])))
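A plain-Python sketch of the same separation, with no Spark required (the `foo` below is a hypothetical stand-in parser that just splits lines, and `records` mimics the `(filepath, contents)` pairs that `sc.wholeTextFiles` yields):

```python
# Stand-in for the proprietary-format parser: it takes the file *contents*
# as a string, not a path, so reading and parsing stay separate.
def foo(content):
    return [line.strip() for line in content.splitlines() if line.strip()]

# Simulated output of sc.wholeTextFiles(path): (filepath, contents) pairs.
records = [
    ("/data/a.txt", "alpha\nbeta\n"),
    ("/data/b.txt", "gamma\n"),
]

# The local equivalent of r.map(lambda x: (x[0], foo(x[1]))).
parsed = [(path, foo(contents)) for path, contents in records]
```

Because foo only sees strings, the same function works unchanged whether the pairs come from a local list or from a wholeTextFiles RDD.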
On Fri, Jun 30, 2017 at 1:25 PM, Saatvik Shah wrote:
Hey Ayan,
This isn't a typical text file - it's a proprietary data format for which a
native Spark reader is not available.
Thanks and Regards,
Saatvik Shah
On Thu, Jun 29, 2017 at 6:48 PM, ayan guha wrote:
> If your files are in the same location you can use sc.wholeTextFiles. If
> not, sc.textFile accepts a list of filepaths.
On Fri, 30 Jun 2017 at 5:59 am, saatvikshah1994 wrote:
> Hi,
>
> I have a file-reading function called /foo/ which reads contents into
> a list of