RE: Why is 1 executor overworked and other sit idle?

2015-09-23 Thread Richard Eggert
From: Richard Eggert [mailto:richard.egg...@gmail.com] Sent: Wednesday, September 23, 2015 5:39 AM To: Ted Yu Cc: User; Chirag Dewan Subject: Re: Why is 1 executor overworked and other sit idle? If there's only one partition, by definition it will o

RE: Why is 1 executor overworked and other sit idle?

2015-09-22 Thread Chirag Dewan
, 2015 5:39 AM To: Ted Yu Cc: User; Chirag Dewan Subject: Re: Why is 1 executor overworked and other sit idle? If there's only one partition, by definition it will only be handled by one executor. Repartition to divide the work up. Note that this will also result in multiple output files, ho

Re: Why is 1 executor overworked and other sit idle?

2015-09-22 Thread Richard Eggert
If there's only one partition, by definition it will only be handled by one executor. Repartition to divide the work up. Note that this will also result in multiple output files, however. If you absolutely need them to be combined into a single file, I suggest using the Unix/Linux 'cat' command t
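For anyone who would rather do the merge step from Java than from the shell, here is a minimal sketch of what the 'cat' approach amounts to. It assumes the job wrote its output as files named part-00000, part-00001, and so on into a single local directory; the class name PartFileMerger and the method merge are made up for illustration and are not part of Spark.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Spark API: a plain-Java stand-in for
// `cat part-* > combined.csv`.
final class PartFileMerger {
    private PartFileMerger() {}

    /**
     * Concatenates every file matching part-* in partsDir, in file-name
     * order, into target. Returns the number of part files merged.
     * Assumption: target does not itself match part-* inside partsDir.
     */
    static int merge(Path partsDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob skips marker files such as _SUCCESS and hidden .crc files.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(partsDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Zero-padded names sort correctly as plain strings.
        Collections.sort(parts);

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                // Append the raw bytes of each part file to the target.
                Files.copy(part, out);
            }
        }
        return parts.size();
    }
}
```

Calling PartFileMerger.merge(outputDir, combinedFile) after the job finishes keeps the write itself parallel across executors and leaves only the cheap concatenation as a single-threaded step.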

Re: Why is 1 executor overworked and other sit idle?

2015-09-22 Thread Ted Yu
Have you tried using repartition to spread the load?

Cheers

> On Sep 22, 2015, at 4:22 AM, Chirag Dewan wrote:
>
> Hi,
>
> I am using Spark to access around 300m rows in Cassandra.
>
> My job is pretty simple as I am just mapping my row into a CSV format and
> saving it as a text file.
>

Why is 1 executor overworked and other sit idle?

2015-09-22 Thread Chirag Dewan
Hi, I am using Spark to access around 300m rows in Cassandra. My job is pretty simple as I am just mapping my row into a CSV format and saving it as a text file. public String call(CassandraRow row) throws Excepti