You just need to call mapValues() on the grouped RDD to turn each
key's Iterable of values into a sorted Iterable. The function you
pass to it is ordinary Java, no different from any other Java
program. I imagine you'll need to copy the input Iterable into an
ArrayList (unfortunately), sort it with whatever Comparator you want,
and return the result.
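
As a rough sketch of that copy-and-sort step, something like the
following might work. It is plain Java with no Spark dependency; the
class name SortGroupedValues, the Entry value type, and the helper
sortedCopy() are all made up for illustration and assume your value
holds a java.util.Date and a float as described in your message.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

public class SortGroupedValues {

    // Hypothetical stand-in for the {date effectiveFrom, float value} pair.
    // Serializable because Spark would need to ship instances between nodes.
    public static class Entry implements Serializable {
        public final Date effectiveFrom;
        public final float value;

        public Entry(Date effectiveFrom, float value) {
            this.effectiveFrom = effectiveFrom;
            this.value = value;
        }
    }

    // Orders entries by effectiveFrom, earliest first.
    public static final Comparator<Entry> BY_EFFECTIVE_FROM = new Comparator<Entry>() {
        @Override
        public int compare(Entry a, Entry b) {
            return a.effectiveFrom.compareTo(b.effectiveFrom);
        }
    };

    // Copies the Iterable into an ArrayList, sorts the copy, and returns it.
    // The input is left untouched.
    public static <T> List<T> sortedCopy(Iterable<T> values, Comparator<? super T> order) {
        List<T> copy = new ArrayList<T>();
        for (T v : values) {
            copy.add(v);
        }
        Collections.sort(copy, order);
        return copy;
    }
}
```

Inside the Function you hand to mapValues(), the body of call() would
then presumably just return sortedCopy(values, BY_EFFECTIVE_FROM).
Note that this holds all values for one key in memory at once, which
is fine as long as no single key has a huge number of values.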

On Wed, Sep 17, 2014 at 4:37 PM,  <abraham.ja...@thomsonreuters.com> wrote:
> Hi Group,
>
>
>
> I am quite new to the Spark world. There is a particular use case that I
> just cannot understand how to accomplish in Spark. I am using Cloudera
> CDH5/YARN/Java 7.
>
>
>
> I have a dataset that has the following characteristics –
>
>
>
> A JavaPairRDD that represents the following –
>
>
>
> Key => {int ID}
>
> Value => {date effectiveFrom, float value}
>
>
>
> Let’s say that the data I have is the following –
>
>
>
>
>
> Partition – 1
>
> [K=> 1, V=> {09-17-2014, 2.8}]
>
> [K=> 1, V=> {09-11-2014, 3.9}]
>
> [K=> 3, V=> {09-18-2014, 5.0}]
>
> [K=> 3, V=> {09-10-2014, 7.4}]
>
>
>
>
>
> Partition – 2
>
> [K=> 2, V=> {09-13-2014, 2.5}]
>
> [K=> 4, V=> {09-07-2014, 6.2}]
>
> [K=> 2, V=> {09-12-2014, 1.8}]
>
> [K=> 4, V=> {09-22-2014, 2.9}]
>
>
>
>
>
> Grouping by key gives me the following RDD
>
>
>
> Partition – 1
>
> [K=> 1, V=> Iterable({09-17-2014, 2.8}, {09-11-2014, 3.9})]
>
> [K=> 3, V=> Iterable({09-18-2014, 5.0}, {09-10-2014, 7.4})]
>
>
>
> Partition – 2
>
> [K=> 2, V=> Iterable({09-13-2014, 2.5}, {09-12-2014, 1.8})]
>
> [K=> 4, V=> Iterable({09-07-2014, 6.2}, {09-22-2014, 2.9})]
>
>
>
> Now I would like to sort by the values and the result should look like this:
>
>
>
> Partition – 1
>
> [K=> 1, V=> Iterable({09-11-2014, 3.9}, {09-17-2014, 2.8})]
>
> [K=> 3, V=> Iterable({09-10-2014, 7.4}, {09-18-2014, 5.0})]
>
>
>
> Partition – 2
>
> [K=> 2, V=> Iterable({09-12-2014, 1.8}, {09-13-2014, 2.5})]
>
> [K=> 4, V=> Iterable({09-07-2014, 6.2}, {09-22-2014, 2.9})]
>
>
>
>
>
> What is the best way to do this in Spark? If so desired, I can even move the
> “effectiveFrom” (the field that I want to sort on) into the key field.
>
>
>
> A code snippet or some pointers on how to solve this would be very helpful.
>
>
>
> Regards,
>
> Abraham
