Re: Controlling number of spark partitions in dataframes

Deepak Sharma Thu, 26 Oct 2017 09:54:07 -0700

I guess the issue is that spark.default.parallelism is ignored when you are
working with DataFrames. It is supposed to apply only to raw RDDs.
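
For DataFrames, two knobs usually matter instead: spark.sql.shuffle.partitions,
which sets how many partitions a shuffle (join, aggregation) produces, and an
explicit repartition(n) or coalesce(n) on the DataFrame itself. Here is a
minimal sketch, assuming Spark 2.x with a SparkSession. The object name
PartitionControl, the helper withPartitions, and the sample data are made up
for illustration and only stand in for your df:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PartitionControl {
  // Hypothetical helper: force a DataFrame into exactly `n` partitions.
  // repartition(n) performs a full shuffle of the data.
  def withPartitions(df: DataFrame, n: Int): DataFrame = df.repartition(n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Test")
      .master("local[2]")
      // Applies to DataFrame shuffles (joins, aggregations); the default is 200.
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data; in the original question df comes from elsewhere.
    val df = (1 to 10).toDF("id")

    val fixed = withPartitions(df, 4)
    println(s"Partitions after repartition: ${fixed.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

repartition(4) does a full shuffle; if you only need to reduce the partition
count, coalesce(4) avoids that shuffle. The initial partition count of a
DataFrame depends on the source it was read from, so it is worth checking
df.rdd.getNumPartitions before and after.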


Thanks
Deepak

On Thu, Oct 26, 2017 at 10:05 PM, Noorul Islam Kamal Malmiyoda <
noo...@noorul.com> wrote:

> Hi all,
>
> I have the following Spark configuration:
>
> spark.app.name=Test
> spark.cassandra.connection.host=127.0.0.1
> spark.cassandra.connection.keep_alive_ms=5000
> spark.cassandra.connection.port=10000
> spark.cassandra.connection.timeout_ms=30000
> spark.cleaner.ttl=3600
> spark.default.parallelism=4
> spark.master=local[2]
> spark.ui.enabled=false
> spark.ui.showConsoleProgress=false
>
> Because I am setting spark.default.parallelism to 4, I was expecting
> only 4 Spark partitions. But it looks like that is not the case.
>
> When I do the following
>
>     df.foreachPartition { partition =>
>       val groupedPartition = partition.toList.grouped(3).toList
>       println("Grouped partition " + groupedPartition)
>     }
>
> There are too many print statements showing an empty list at the top. Only
> the relevant partitions are at the bottom. Is there a way to control
> the number of partitions?
>
> Regards,
> Noorul


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
