Thanks Silvio!
On 11 Aug 2015 17:44, Silvio Fiorito silvio.fior...@granturing.com wrote:
You need to configure the spark.sql.shuffle.partitions parameter to a
different value. It defaults to 200.
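For anyone finding this thread later, a minimal sketch of changing the setting in Spark 1.4 (assuming an existing `sqlContext`; the value 10 is only an example, pick something suited to your data size):

```scala
// Lower the number of post-shuffle partitions; the default of 200 is
// usually far too high for small datasets.
sqlContext.setConf("spark.sql.shuffle.partitions", "10")

// Equivalently, through SQL:
sqlContext.sql("SET spark.sql.shuffle.partitions=10")
```

The setting only takes effect for shuffles (joins, aggregations) that run after it is changed.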
On 8/11/15, 11:31 AM, Al M alasdair.mcbr...@gmail.com wrote:
I am using DataFrames with
Thank you Hao; that was a fantastic response. I have raised SPARK-9872 for
this.
I also would love to have dynamic partitioning. I mentioned it in the Jira.
On 12 Aug 2015 02:19, Cheng, Hao hao.ch...@intel.com wrote:
That's a good question; we don't support reading small files in a single
partition yet, but it's definitely an issue we need to optimize. Do you mind
creating a jira issue for this? Hopefully we can merge that in the 1.6 release.
The DataFrames parallelism is currently controlled through the configuration
option spark.sql.shuffle.partitions. The default value is 200.
I have raised an Improvement Jira to make it possible to specify the number
of partitions: https://issues.apache.org/jira/browse/SPARK-9872
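Until something like that lands, one possible workaround (a sketch, assuming an existing DataFrame `df` with a column named `key`, both hypothetical) is to reduce the partition count explicitly after the shuffle:

```scala
// The aggregation produces spark.sql.shuffle.partitions output partitions
// (200 by default), no matter how small the data is.
val counts = df.groupBy("key").count()

// coalesce() merges partitions without another full shuffle;
// repartition() would redistribute evenly at the cost of one.
val compacted = counts.coalesce(4)
```

This trades one extra narrow transformation for not having to tune a global setting per query.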
On 8/11/15, 11:31 AM, Al M alasdair.mcbr...@gmail.com wrote:
I am using DataFrames with Spark 1.4.1. I really like DataFrames but the
partitioning makes no sense to me.
I am loading
200 is the default partition number for parallel tasks after the shuffle.
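To make the distinction concrete, a sketch (assuming `sqlContext`, a Parquet path "/data/input", and a column "id", all hypothetical): the setting only governs partition counts after a shuffle, while the initial read parallelism comes from the input splits.

```scala
val df = sqlContext.read.parquet("/data/input")

// Read-side partitioning: determined by the input file splits,
// not by spark.sql.shuffle.partitions.
println(df.rdd.partitions.length)

// Post-shuffle partitioning: 200 unless the setting is changed.
val grouped = df.groupBy("id").count()
println(grouped.rdd.partitions.length)
```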