Hi Matthew,

Starting with your P.S.: it's not nutty. See for example MapWritable <https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/io/MapWritable.html>, which can be used as a message type, or ArrayPrimitiveWritable <http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/io/ArrayPrimitiveWritable.html>. The okapi project <https://github.com/grafos-ml/okapi>, which I've found helpful for inspiration while getting started, uses collections as messages in several places.
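To make the "collection as a message" idea concrete, here is a rough sketch of a message that carries a whole batch of vertex IDs. It is not taken from Giraph or okapi: the class name LongSetMessage and its helper methods are hypothetical. It only mirrors the shape of Hadoop's Writable interface (write(DataOutput) and readFields(DataInput)) using plain java.io, so it runs without Hadoop on the classpath. In a real job you would declare "implements org.apache.hadoop.io.Writable", or just use ArrayPrimitiveWritable directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical message type holding a batch of vertex IDs.
// Assumption: in a real Giraph job this class would implement
// org.apache.hadoop.io.Writable; that is omitted here so the sketch
// compiles with the JDK alone.
final class LongSetMessage {
    private long[] ids;

    // No-arg constructor, needed so an empty instance can be filled by readFields.
    LongSetMessage() {
        this.ids = new long[0];
    }

    LongSetMessage(long[] ids) {
        this.ids = ids.clone();
    }

    long[] getIds() {
        return ids.clone();
    }

    // Same signature as Writable.write: a length prefix, then the raw longs.
    public void write(DataOutput out) throws IOException {
        out.writeInt(ids.length);
        for (long id : ids) {
            out.writeLong(id);
        }
    }

    // Same signature as Writable.readFields: reads back what write produced.
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        ids = new long[n];
        for (int i = 0; i < n; i++) {
            ids[i] = in.readLong();
        }
    }

    // Helper for this demo only: serialize a message to a byte array.
    static byte[] toBytes(LongSetMessage m) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            m.write(out);
        }
        return buf.toByteArray();
    }

    // Helper for this demo only: rebuild a message from a byte array.
    static LongSetMessage fromBytes(byte[] bytes) throws IOException {
        LongSetMessage m = new LongSetMessage();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            m.readFields(in);
        }
        return m;
    }

    public static void main(String[] args) throws IOException {
        LongSetMessage sent = new LongSetMessage(new long[] {7L, 42L, 1001L});
        byte[] wire = toBytes(sent);
        LongSetMessage received = fromBytes(wire);
        System.out.println(wire.length + " bytes, ids=" + Arrays.toString(received.getIds()));
    }
}
```

With this layout a batch of n IDs serializes to 4 + 8n bytes of payload. Whatever per-message overhead the framework adds is then paid once per batch rather than once per ID, but how much that matters in practice is something I'd still measure rather than assume.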
Going back to your main question: when you say many small vs. fewer large messages, I assume both variants would be sent in the same superstep? If so, I'd recommend simply testing it, since it's hard to say in advance. My thought is that if you go with the large-message approach, you could wrap the set in a primitive collection like ArrayPrimitiveWritable, which might save a bit of memory compared with sending a bunch of small messages as LongWritables or whatever they might be. If I remember correctly, I tried both approaches in the project I'm working on, and the large-message approach was more effective.

There's also the option (if you run into memory problems, for example) of using large messages but splitting the one superstep into several, if that's feasible.

In the end I've found it difficult to predict how either approach will perform, and it never hurts to try both and compare the results.

Everyone else, please correct me if I've said anything wrong, as I'm still relatively new at this.

Best,
Matthew Saltz

On Thu, Sep 4, 2014 at 8:16 PM, Matthew Cornell <m...@matthewcornell.org> wrote:

> Hi Everyone,
>
> I have an app whose messaging granularity could be written two ways -
> sending many small messages vs. (possibly far) fewer larger ones.
> Conceptually what moves around is a set of 'alive' vertex IDs that might
> get filtered at each superstep based on a processed list (vertex value)
> that vertexes manage. The ones that survive to the end are the lucky
> winners. compute() calculates a set of 'new-to-me' incoming IDs that are
> perfect for the outgoing message, but I could easily send each ID one at a
> time. My guess is that sending fewer messages is more important, but
> each set might contain thousands of IDs.
>
> Thanks!
>
> P.S. A side question: The few custom message type examples I've found are
> relatively simple objects with a few primitive instance variables, rather
> than collections. Is it nutty to send around a collection of IDs as a
> message?
>
> --
> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
> Street, Amherst MA 01002 | matthewcornell.org