It's not true in general that 100 different partition keys go to 100 different partitions. That depends on the partitioner, and it would not hold for the default HashPartitioner, where distinct keys can hash to the same partition. But, yeah, you'd expect a reasonably even distribution.
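To make that concrete, here is a rough sketch in plain Scala (no Spark needed). It assumes a partitioner that works like hash-then-modulo and uses Scala's built-in tuple hashCode as a stand-in, which is not the hash function Spark uses internally, so the exact numbers will differ from a real job. The object and helper names are made up for illustration, and the (date, company_id) values are invented. It counts how many partitions 100 distinct keys actually occupy, and how many keys move when the two columns are swapped.

```scala
// Sketch only: hash-then-modulo partitioning with Scala's hashCode as a
// stand-in. This is NOT Spark's internal hash; names here are hypothetical.
object HashPartitionSketch {
  // Modulo that never returns a negative result, even for negative hashes.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index assigned to a key under hash-then-modulo.
  def partitionOf(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val numPartitions = 100

    // 100 invented (date, company_id) combinations.
    val keys = for {
      day <- 1 to 10
      company <- 1 to 10
    } yield (f"2016-11-$day%02d", company)

    // Distinct keys can collide, so fewer than 100 partitions may be used.
    val used = keys.map(partitionOf(_, numPartitions)).distinct.size
    println(s"${keys.size} distinct keys landed in $used of $numPartitions partitions")

    // Same keys with the column order swapped: (company_id, date).
    val swapped = keys.map { case (date, company) => (company, date) }
    val moved = keys.zip(swapped).count { case (k, s) =>
      partitionOf(k, numPartitions) != partitionOf(s, numPartitions)
    }
    println(s"$moved of ${keys.size} keys changed partition after swapping columns")
  }
}
```

Running it prints the two counts. The only way to know what your actual DataFrame does is to try it on your data.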
What happens in all cases depends on the partitioner. I haven't tested it (you should just try it), but I would assume that switching the columns could result in a different assignment of rows to partitions. In general you would not want your code to be sensitive to the exact partitioning unless you were using your own custom partitioner.

On Thu, Nov 17, 2016 at 7:41 PM Cesar <ces...@gmail.com> wrote:

>
> I am using the next line to re-partition a data frame by multiple columns:
>
> val partitionColumns = Seq("date", "company_id").map(x => new Column(x))
> val numPartitions = 100
>
> val dfRepartitioined = df.repartition(numPartitions, partitionColumns)
>
> I understand that if the number of combinations of date and company_id is
> at most 100, each combination of will go to a different partition.
>
> My question is, what happens when the number of combinations larger than
> 100 ? Does re-partition changes in behavior if I switch the column order in
> the definition of partitionColumns variable?
>
> Thanks
> --
> Cesar Flores
>