Hi Nguyen,

This looks promising; it seems I could achieve what I need using "cluster by".
Thanks for the pointer.
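For reference, here is roughly what I plan to try. This is only a sketch, assuming a key column named "key" and my original partition count of 300; `repartition(n, col)` followed by `sortWithinPartitions` is the DataFrame analogue of `repartitionAndSortWithinPartitions` (hash partitioning rather than a custom partitioner), and `CLUSTER BY` in Spark SQL expresses the same thing:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only.
val spark = SparkSession.builder()
  .appName("cluster-by-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Placeholder data standing in for my real key/value pairs.
val ds = Seq(("b", 2), ("a", 1), ("a", 3)).toDF("key", "value")

// Hash-partition by key into 300 partitions, then sort each
// partition internally (no global sort, matching the RDD behavior).
val partitioned = ds
  .repartition(300, col("key"))
  .sortWithinPartitions(col("key"))

// Equivalent in Spark SQL:
//   SELECT * FROM t CLUSTER BY key
```

After this, a `mapPartitions` over `partitioned` should see each partition's rows in key order, which is the partial sort I was after. The caveat is that this uses Spark's hash partitioning; as far as I can tell there is no direct hook for a custom `Partitioner` on Datasets.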

Regards,
Keith.

http://keith-chapman.com

On Sat, Jun 24, 2017 at 5:27 AM, nguyen duc Tuan <newvalu...@gmail.com>
wrote:

> Hi Chapman,
> You can use "cluster by" to do what you want.
> https://deepsense.io/optimize-spark-with-distribute-by-and-cluster-by/
>
> 2017-06-24 17:48 GMT+07:00 Saliya Ekanayake <esal...@gmail.com>:
>
>> I haven't worked with datasets but would this help
>> https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd?
>>
>> On Jun 23, 2017 5:43 PM, "Keith Chapman" <keithgchap...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have code that does the following using RDDs,
>>>
>>> val outputPartitionCount = 300
>>> val part = new MyOwnPartitioner(outputPartitionCount)
>>> val finalRdd = myRdd.repartitionAndSortWithinPartitions(part)
>>>
>>> where myRdd is correctly formed as key/value pairs. I am looking to
>>> convert this to use Dataset/Dataframe instead of RDDs, so my questions are:
>>>
>>> Is there custom partitioning of Dataset/Dataframe implemented in Spark?
>>> Can I accomplish the partial sort using mapPartitions on the resulting
>>> partitioned Dataset/Dataframe?
>>>
>>> Any thoughts?
>>>
>>> Regards,
>>> Keith.
>>>
>>> http://keith-chapman.com
>>>
>>
>
