RE: Processing multiple columns in parallel

2015-05-18 Thread Needham, Guy
emails. Learn more at http://vsre.info/
From: ayan guha [mailto:guha.a...@gmail.com]
Sent: 18 May 2015 15:46
To: Laeeq Ahmed
Cc: user@spark.apache.org
Subject: Re: Processing multiple columns in parallel

My first thought would be creating 10 RDDs and running your word count on each of them. I think

Re: Processing multiple columns in parallel

2015-05-18 Thread ayan guha
My first thought would be creating 10 RDDs and running your word count on each of them. I think the Spark scheduler is going to resolve the dependencies in parallel and launch 10 jobs.

Best,
Ayan

On 18 May 2015 23:41, "Laeeq Ahmed" wrote:
> Hi,
>
> Consider I have a tab delimited text file with 10 columns. Each
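
As a rough illustration of the approach Ayan describes (this code is not from the thread), the sketch below splits a 10-column tab-delimited file into one RDD per column, runs a word count on each, and submits the 10 actions from Scala Futures so the scheduler can run the jobs concurrently. The input/output paths, the column count, and the use of Futures are assumptions made for the example.

// Minimal sketch, assuming Spark's Scala RDD API and hypothetical HDFS paths.
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ColumnWordCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ColumnWordCounts"))

    // Each line holds 10 tab-separated columns; cache so all 10 jobs reuse it.
    val lines = sc.textFile("hdfs:///data/ten_columns.tsv").cache()

    // Build one (word, count) RDD per column.
    val countsPerColumn = (0 until 10).map { i =>
      lines.map(_.split("\t", -1)(i))
           .map(word => (word, 1L))
           .reduceByKey(_ + _)
    }

    // Kick off each action from its own Future so the 10 jobs are
    // submitted concurrently rather than one after another.
    val jobs = countsPerColumn.zipWithIndex.map { case (rdd, i) =>
      Future(rdd.saveAsTextFile(s"hdfs:///out/column_$i"))
    }
    jobs.foreach(Await.result(_, Duration.Inf))

    sc.stop()
  }
}

Without the Futures the 10 saveAsTextFile calls would run one after another, since each RDD action blocks until its job finishes; caching the input avoids re-reading the file for every column.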