you can do kv.mapValues(sorted) on the grouped rdd, but that's probably less efficient than sorting during the groupBy
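
for the time series case, a minimal sketch of the function you might pass to mapValues. it assumes each value is a (timestamp, reading) pair, which is my guess at the data layout; sort_by_timestamp and the sample readings are made up for illustration.

```python
from operator import itemgetter

def sort_by_timestamp(values):
    """Return one key's (timestamp, reading) pairs ordered by timestamp.

    Assumes each value is a tuple whose first element is the timestamp.
    """
    return sorted(values, key=itemgetter(0))

# Hypothetical sample values for a single key, deliberately out of order.
readings = [(30, "c"), (10, "a"), (20, "b")]
print(sort_by_timestamp(readings))
```

with PySpark this would be applied per key as rdd.groupByKey().mapValues(sort_by_timestamp).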

you could try using combineByKey directly w/ heapq...

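from bisect import insort
from heapq import merge

def createCombiner(x):
    # a one-element list is trivially sorted
    return [x]

def mergeValues(xs, x):
    # insert x so that xs stays fully sorted (merge needs sorted inputs)
    insort(xs, x)
    return xs

def mergeCombiners(a, b):
    # merge returns an iterator, so materialize it as a list
    return list(merge(a, b))

rdd.combineByKey(createCombiner, mergeValues, mergeCombiners)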
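
to see what the three functions do without a cluster, here is a rough local sketch. combine_by_key_local is a hypothetical stand-in that only mimics the per-partition / cross-partition split of combineByKey on plain lists; it is not Spark's implementation. it keeps each per-key list sorted with bisect.insort, since heapq.merge expects sorted inputs and returns an iterator.

```python
from bisect import insort
from heapq import merge

def create_combiner(x):
    # First value seen for a key within a partition.
    return [x]

def merge_value(xs, x):
    # Keep the per-key list sorted as values arrive.
    insort(xs, x)
    return xs

def merge_combiners(a, b):
    # Merge two sorted lists from different partitions into one sorted list.
    return list(merge(a, b))

def combine_by_key_local(partitions, create, merge_val, merge_comb):
    """Hypothetical local imitation of combineByKey over lists of (key, value)."""
    per_partition = []
    for part in partitions:
        acc = {}
        for k, v in part:
            acc[k] = merge_val(acc[k], v) if k in acc else create(v)
        per_partition.append(acc)
    result = {}
    for acc in per_partition:
        for k, combined in acc.items():
            result[k] = merge_comb(result[k], combined) if k in result else combined
    return result

# Made-up sample data: two "partitions" of (key, value) pairs.
partitions = [
    [("a", 3), ("b", 9), ("a", 1)],
    [("a", 2), ("b", 4)],
]
grouped = combine_by_key_local(
    partitions, create_combiner, merge_value, merge_combiners
)
print(grouped)
```

if the values are (timestamp, reading) tuples, the same functions order by timestamp first, because tuples compare element by element.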

best,


matt

On 08/22/2014 10:41 PM, Arpan Ghosh wrote:
I was grouping time series data by a key. I want the values to be sorted
by timestamp after the grouping.


On Fri, Aug 22, 2014 at 7:26 PM, Matthew Farrellee wrote:

    On 08/22/2014 04:32 PM, Arpan Ghosh wrote:

        Is there any way to control the ordering of values for each key
        during a groupByKey() operation? Is there some sort of implicit
        ordering in place already?

        Thanks

        Arpan


    there's no implicit ordering in place. the same holds for the order
    of keys, unless you use sortByKey.

    what are you trying to achieve?

    best,


    matt



