Re: PySpark working with Generators

2017-07-05 Thread Saatvik Shah
u want? >> From: Saatvik Shah [mailto:saatvikshah1...@gmail.com] >> Sent: Friday, June 30, 2017 8:55 AM >> To: ayan guha >> Cc: user >> Subject: Re: PySpark working with Generators >> Hey Ayan,

Re: PySpark working with Generators

2017-06-30 Thread Jörn Franke
istent.com> wrote: >> Wouldn’t this work if you load the files into HDFS and let the number of partitions equal the amount of parallelism you want? >> From: Saatvik Shah [mailto:saatvikshah1...@gmail.com] >> Sent: Friday,

Re: PySpark working with Generators

2017-06-30 Thread Saatvik Shah
h1...@gmail.com] > Sent: Friday, June 30, 2017 8:55 AM > To: ayan guha > Cc: user > Subject: Re: PySpark working with Generators > Hey Ayan, > This isn't a typical text file. It's a proprietary data format for which a native Spark reader

RE: PySpark working with Generators

2017-06-29 Thread Mahesh Sawaiker
Wouldn’t this work if you load the files into HDFS and let the number of partitions equal the amount of parallelism you want? From: Saatvik Shah [mailto:saatvikshah1...@gmail.com] Sent: Friday, June 30, 2017 8:55 AM To: ayan guha Cc: user Subject: Re: PySpark working with Generators Hey Ayan

Re: PySpark working with Generators

2017-06-29 Thread ayan guha
Hi, I understand that now. However, your function foo() should take a string and parse it, rather than trying to read from a file. This way, you can separate the file-reading path from the processing part. r = sc.wholeTextFiles(path) parsed = r.map(lambda x: (x[0], foo(x[1]))) On Fri, Jun 30, 2017 at 1:25 PM,
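The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: foo() here is a hypothetical stand-in that treats each non-empty line as a comma-separated record, and parse_pair() and parse_all() are helper names invented for this example. The only Spark call used is sc.wholeTextFiles, which yields (path, content) pairs.

```python
from typing import Iterator, List, Tuple


def foo(content: str) -> Iterator[List[str]]:
    """Hypothetical parser: yield one record per non-empty line.

    It takes the file content as a string instead of opening a file itself,
    so reading and parsing stay separate.
    """
    for line in content.splitlines():
        line = line.strip()
        if line:
            yield line.split(",")


def parse_pair(pair: Tuple[str, str]) -> Tuple[str, List[List[str]]]:
    """Turn a (path, content) pair into (path, parsed records)."""
    path, content = pair
    # Materialize the generator so each element of the result holds plain lists.
    return path, list(foo(content))


def parse_all(sc, path: str):
    """Read every file under `path` and parse it on the executors.

    `sc` is assumed to be an existing SparkContext.
    """
    return sc.wholeTextFiles(path).map(parse_pair)
```

If one output element per record is preferred over one per file, flatMap can be used in place of map, with the function returning the generator directly.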

Re: PySpark working with Generators

2017-06-29 Thread Saatvik Shah
Hey Ayan, This isn't a typical text file. It's a proprietary data format for which a native Spark reader is not available. Thanks and Regards, Saatvik Shah On Thu, Jun 29, 2017 at 6:48 PM, ayan guha wrote: > If your files are in the same location, you can use sc.wholeTextFiles.
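For a non-text format like the one described above, one option that might apply (it is not proposed in this thread) is sc.binaryFiles, which returns (path, bytes) pairs, combined with a generator-based parser passed to flatMap. The sketch below is an assumption-heavy illustration: the real format is proprietary and unknown here, so foo_bytes() simply pretends the payload is a sequence of little-endian 32-bit unsigned integers, and records_from_binary() is a name invented for this example.

```python
import struct
from typing import Iterator


def foo_bytes(payload: bytes) -> Iterator[int]:
    """Hypothetical parser: treat the payload as little-endian uint32 values.

    Trailing bytes that do not fill a whole 4-byte value are ignored.
    """
    usable = len(payload) - len(payload) % 4
    for (value,) in struct.iter_unpack("<I", payload[:usable]):
        yield value


def records_from_binary(sc, path: str, min_partitions=None):
    """Read whole files as bytes and emit one (path, value) pair per record.

    `sc` is assumed to be an existing SparkContext. flatMap consumes the
    generator lazily, so a file's records are not first collected in a list.
    """
    return sc.binaryFiles(path, minPartitions=min_partitions).flatMap(
        lambda pair: ((pair[0], value) for value in foo_bytes(pair[1]))
    )
```

Each file is still read in full as a single bytes object, so this only suits files that fit comfortably in executor memory.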

Re: PySpark working with Generators

2017-06-29 Thread ayan guha
If your files are in the same location, you can use sc.wholeTextFiles. If not, sc.textFile accepts a comma-separated list of file paths. On Fri, 30 Jun 2017 at 5:59 am, saatvikshah1994 wrote: > Hi, > > I have a file-reading function called /foo/ which reads contents into > a list of