I am trying to write some code that is cleverer than the optimizer. The idea is that in Spargel you often want to send the same message to many other graph nodes. These target nodes are partitioned across the machines of the cluster, so it would make sense to send the message to each target machine only once and let that machine distribute it to the nodes it holds.
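To make the idea concrete, here is a minimal sketch in plain Java. It is not Flink or Spargel API: the class name, `partitionOf`, and `groupTargets` are hypothetical, and it assumes that vertices are assigned to partitions by vertex id modulo the parallelism, which may not match how the system actually partitions them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group the targets of one message by the partition that holds them.
class PartitionBroadcastSketch {

    // ASSUMPTION: a vertex lives on partition (vertexId mod parallelism).
    // floorMod keeps the result non-negative for negative ids.
    static int partitionOf(long vertexId, int parallelism) {
        return (int) Math.floorMod(vertexId, (long) parallelism);
    }

    // Returns, for each partition, the list of target vertices it holds.
    // One network message per map entry would be enough; the receiving
    // partition then hands the message to each vertex in its list.
    static Map<Integer, List<Long>> groupTargets(List<Long> targets, int parallelism) {
        Map<Integer, List<Long>> byPartition = new TreeMap<>();
        for (long target : targets) {
            byPartition
                .computeIfAbsent(partitionOf(target, parallelism), p -> new ArrayList<>())
                .add(target);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<Long> targets = Arrays.asList(1L, 5L, 9L, 2L, 6L, 3L);
        Map<Integer, List<Long>> grouped = groupTargets(targets, 4);
        for (Map.Entry<Integer, List<Long>> e : grouped.entrySet()) {
            System.out.println("partition " + e.getKey() + " <- one message for " + e.getValue());
        }
    }
}
```

With these six targets and a parallelism of 4, only three partitions are involved, so three messages would cross the network instead of six. Inside a rich function, the receiving side could compare its own index from RuntimeContext.getIndexOfThisSubtask() with the partition computed here, provided the assumed partitioning really holds.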
Attila

2014-12-03 16:21 GMT+01:00 Aljoscha Krettek <[email protected]>:
> RuntimeContext.getIndexOfThisSubtask()
>
> What do you want to use this partition number for? If I may ask.
>
> Cheers,
> Aljoscha
>
> On Wed, Dec 3, 2014 at 4:12 PM, Attila Bernáth <[email protected]>
> wrote:
>> Thank you, Stephan.
>> How to access the partition number from the RuntimeContext?
>>
>> Attila
>>
>> 2014-12-03 15:53 GMT+01:00 Stephan Ewen <[email protected]>:
>>> Hey!
>>>
>>> Here is a brief description how to use rich functions:
>>> http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#passing-functions-to-flink
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>> On Wed, Dec 3, 2014 at 3:52 PM, Stephan Ewen <[email protected]> wrote:
>>>>
>>>> Hi!
>>>>
>>>> You can always use the "rich" version of the function, for example the
>>>> "RichMapFunction". Inside that function, you can call
>>>> "getRuntimeContext()",
>>>> which gives you access to many things, among them the partition number.
>>>>
>>>> Stephan
>>>>
>>>>
>>>> On Wed, Dec 3, 2014 at 3:49 PM, Attila Bernáth <[email protected]>
>>>> wrote:
>>>>>
>>>>> Dear Developers,
>>>>>
>>>>> Datasets are partitioned between machines. I wonder if there is a way
>>>>> to get some identifier of a partition. I see that the class
>>>>> HashPartition has a getPartitionNumber method, but I don't see how I
>>>>> could use this.
>>>>> (For example, I would like to see the partition identifier in a
>>>>> MapFunction, or in a MapPartitionFunction).
>>>>>
>>>>> Attila
>>>>
>>>>
>>>
