
Re: Spark partitioning question

2015-05-05 Thread Marius Danciu

Turned out that it was sufficient to do repartitionAndSortWithinPartitions ... so far so good ;)

On Tue, May 5, 2015 at 9:45 AM Marius Danciu wrote:
> Hi Imran,
>
> Yes that's what MyPartitioner does. I do see (using traces from
> MyPartitioner) that the key is partitioned on partition 0 but the
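A minimal sketch of that single-shuffle approach, assuming Spark 1.3 or later. The names `MapReduceSketch`, `groupSortedRuns` and `reduceSide` are hypothetical, and any `Partitioner` (for example a `HashPartitioner`) can stand in for the thread's `MyPartitioner`. repartitionAndSortWithinPartitions shuffles once, sending every record of a key to one partition and sorting by key inside it, so a `mapPartitions` pass can walk consecutive runs of equal keys the way a Hadoop reducer does:

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

import scala.reflect.ClassTag

object MapReduceSketch {
  // Hypothetical helper: turns an iterator that is sorted by key into (key, values) runs,
  // similar to how a Hadoop reducer sees one key together with all of its values.
  // Assumes equal keys are adjacent, which the sort within each partition guarantees.
  def groupSortedRuns[K, V](it: Iterator[(K, V)]): Iterator[(K, List[V])] = {
    val buffered = it.buffered
    new Iterator[(K, List[V])] {
      override def hasNext: Boolean = buffered.hasNext
      override def next(): (K, List[V]) = {
        val key = buffered.head._1
        val values = List.newBuilder[V]
        // Consume records while the key stays the same; only this key's values are buffered.
        while (buffered.hasNext && buffered.head._1 == key) {
          values += buffered.next()._2
        }
        (key, values.result())
      }
    }
  }

  // One shuffle: partition by key and sort inside each partition, then group runs locally.
  def reduceSide[K: Ordering: ClassTag, V: ClassTag](
      records: RDD[(K, V)],
      partitioner: Partitioner): RDD[(K, List[V])] =
    records
      .repartitionAndSortWithinPartitions(partitioner)
      .mapPartitions(it => groupSortedRuns(it), preservesPartitioning = true)
}
```

The grouping is iterator-based, so a partition is never materialized as a whole; only the values of the key currently being emitted are held in memory. Values within a key are not sorted by this call, only keys are.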

Re: Spark partitioning question

2015-05-04 Thread Marius Danciu

Hi Imran, Yes, that's what MyPartitioner does. I do see (using traces from MyPartitioner) that the key is assigned to partition 0, but then I see this record arriving in both YARN containers (I see it in the logs). Basically I need to emulate a Hadoop map-reduce job in Spark and groupByKey seemed

Re: Spark partitioning question

2015-05-04 Thread Imran Rashid

Hi Marius, I am also a little confused -- are you saying that myPartitions is basically something like:

    class MyPartitioner extends Partitioner {
      def numPartitions = 1
      def getPartition(key: Any) = 0
    }

?? If so, I don't understand how you'd ever end up with data in two partitions. Indeed, than ev
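Imran's example can be written out as a complete class. This is a sketch: the name `SinglePartitioner` is hypothetical, and the `equals`/`hashCode` overrides are an addition not present in the thread. Spark compares partitioners with `equals` to decide whether an RDD is already partitioned the required way, so defining them lets a later operation with an equal partitioner skip a shuffle. With a single partition, every key maps to partition 0, so a record cannot be assigned to two partitions:

```scala
import org.apache.spark.Partitioner

// Hypothetical single-partition partitioner, modeled on the snippet in Imran's reply.
class SinglePartitioner extends Partitioner {
  // Exactly one output partition.
  override def numPartitions: Int = 1

  // Every key, whatever its value (including null), is routed to partition 0.
  override def getPartition(key: Any): Int = 0

  // All instances behave identically, so treat them as equal; Spark relies on
  // partitioner equality to recognize data that is already partitioned.
  override def equals(other: Any): Boolean = other.isInstanceOf[SinglePartitioner]
  override def hashCode: Int = numPartitions
}
```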

Re: Spark partitioning question

2015-04-28 Thread Silvio Fiorito

From: Marius Danciu
Date: Tuesday, April 28, 2015 at 9:53 AM
To: Silvio Fiorito, user
Subject: Re: Spark partitioning question

Thank you Silvio, I am aware of groupBy limitations and this is subject to replacement. I did try repartitionAndSortWithinPartitions, but then I end up with maybe too

Re: Spark partitioning question

2015-04-28 Thread Marius Danciu

Thank you Silvio, I am aware of groupBy limitations and this is subject to replacement. I did try repartitionAndSortWithinPartitions, but then I end up with maybe too much shuffling: one from groupByKey and the other from repartition. My expectation was that since N records are partitioned to the
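On the concern about shuffling twice, here is a sketch assuming Spark 1.x or later, with a `HashPartitioner` standing in for `MyPartitioner` and hypothetical names (`NoSecondShuffle`, `groupWithoutReshuffle`, `addsNoShuffle`). When an RDD already carries a partitioner equal to the one passed to `groupByKey`, Spark groups inside the existing partitions instead of adding another shuffle, so only the `partitionBy` step moves data. A custom partitioner needs a suitable `equals` for that check to match:

```scala
import org.apache.spark.{HashPartitioner, OneToOneDependency, SparkContext}
import org.apache.spark.rdd.RDD

object NoSecondShuffle {
  // Hypothetical helper: partitions once, then groups with an equal partitioner.
  def groupWithoutReshuffle(sc: SparkContext): RDD[(String, Iterable[Int])] = {
    val partitioner = new HashPartitioner(2)
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), 3)

    // The only shuffle: every record moves to the partition its key maps to.
    val partitioned = pairs.partitionBy(partitioner)

    // partitioned.partitioner == Some(partitioner), so groupByKey can group
    // inside the existing partitions instead of shuffling again.
    partitioned.groupByKey(partitioner)
  }

  // True when `rdd` reads its parent only through narrow one-to-one dependencies,
  // i.e. computing it adds no new shuffle on top of its parent.
  def addsNoShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.forall(_.isInstanceOf[OneToOneDependency[_]])
}
```

Printing `grouped.toDebugString` is another way to check that the lineage contains a single ShuffledRDD. Note that this avoids the second shuffle but keeps groupByKey's per-key buffering of all values.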

Re: Spark partitioning question

2015-04-28 Thread Silvio Fiorito

Hi Marius, What’s the expected output? I would recommend avoiding groupByKey if possible, since it’s going to force all records for each key to go to a single executor, which may overload it. Also, if you need to sort and repartition, try using repartitionAndSortWithinPartitions to do it in one sho
