Hi Matthew,

Starting with your P.S.: It's not nutty. See, for example, MapWritable
<https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/io/MapWritable.html>,
which can be used as a message type, or ArrayPrimitiveWritable
<http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/io/ArrayPrimitiveWritable.html>.
This project <https://github.com/grafos-ml/okapi>, which I've found helpful
for inspiration as I'm getting started, uses collections for messages in
multiple places.
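
To make the P.S. concrete, here is a minimal sketch of what a collection-style
message could look like. IdSetMessage is a hypothetical name of my own, not
something from Giraph or okapi. The write/readFields methods follow the
signatures of Hadoop's Writable interface, but the class deliberately does not
declare "implements Writable" so that it compiles without Hadoop on the
classpath. ArrayPrimitiveWritable already covers primitive arrays, so treat
this only as an illustration of the shape such a message takes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical message type: a set of vertex IDs packed into one long[].
// In a real job this would implement org.apache.hadoop.io.Writable
// (assumption: omitted here to keep the sketch free of dependencies).
class IdSetMessage {
    private long[] ids = new long[0];

    IdSetMessage() {}                      // no-arg constructor for deserialization

    IdSetMessage(long[] ids) { this.ids = ids.clone(); }

    long[] getIds() { return ids.clone(); }

    // Same signature as Writable.write: a length prefix, then the raw longs.
    public void write(DataOutput out) throws IOException {
        out.writeInt(ids.length);
        for (long id : ids) {
            out.writeLong(id);
        }
    }

    // Same signature as Writable.readFields: rebuild the array from the stream.
    public void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        ids = new long[n];
        for (int i = 0; i < n; i++) {
            ids[i] = in.readLong();
        }
    }

    // Demo helper: serialize a message into a byte array.
    static byte[] toBytes(IdSetMessage m) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            m.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: deserialize a message from a byte array.
    static IdSetMessage fromBytes(byte[] bytes) {
        try {
            IdSetMessage m = new IdSetMessage();
            m.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return m;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = toBytes(new IdSetMessage(new long[] {3L, 7L, 42L}));
        System.out.println(wire.length + " bytes, ids="
                + Arrays.toString(fromBytes(wire).getIds()));
    }
}
```

With this layout the payload is 4 bytes for the length plus 8 bytes per ID.
Whatever per-message overhead the framework adds on top is not modeled here.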

Going back to your main question: when you say many small vs. fewer large
messages, I assume both variants would be sent in the same superstep? If
so, I'd recommend just testing it, since it's difficult to say in advance.
My thought is that if you go with the large-message approach, you could
wrap the set in a primitive collection like ArrayPrimitiveWritable, and you
might save a bit of memory on what you send out compared with sending a
bunch of small messages as LongWritables or whatever it might be. If I
remember correctly, I tried both approaches in the project I'm working on,
and the large-message approach was more effective. There's also the option
(if you run into memory problems, for example) of using large messages but
splitting the one superstep into several, if that's feasible. In the end,
I've found that it's difficult to predict how it will perform, and it never
hurts to try both approaches and compare the results.
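
As a rough illustration of the "large messages, but spread over several
supersteps" idea, the sketch below splits an ID array into bounded batches.
IdBatcher, split, and maxPerMessage are hypothetical names of mine, and the
batch size is something you would have to tune by experiment. How the batches
are then assigned to supersteps depends on your compute() logic and is not
shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: cut one large ID set into batches of bounded size,
// so that each batch can be sent as its own message (e.g. one per superstep).
class IdBatcher {
    static List<long[]> split(long[] ids, int maxPerMessage) {
        if (maxPerMessage <= 0) {
            throw new IllegalArgumentException("maxPerMessage must be positive");
        }
        List<long[]> batches = new ArrayList<>();
        int start = 0;
        while (start < ids.length) {
            // The last batch may be shorter than maxPerMessage.
            int len = Math.min(maxPerMessage, ids.length - start);
            batches.add(Arrays.copyOfRange(ids, start, start + len));
            start += len;
        }
        return batches;
    }
}
```

For example, IdBatcher.split(ids, 1000) on a set of 2,500 IDs yields three
batches of 1000, 1000, and 500 IDs, in the original order.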

Everyone else, please correct me if I've said anything incorrect, as I'm
still relatively new to this.

Best,
Matthew Saltz



On Thu, Sep 4, 2014 at 8:16 PM, Matthew Cornell <m...@matthewcornell.org>
wrote:

> Hi Everyone,
>
> I have an app whose messaging granularity could be written two ways -
> sending many small messages vs. (possibly far) fewer larger ones.
> Conceptually what moves around is a set of 'alive' vertex IDs that might
> get filtered at each superstep based on a processed list (vertex value)
> that vertexes manage. The ones that survive to the end are the lucky
> winners. compute() calculates a set of 'new-to-me' incoming IDs that are
> perfect for the outgoing message, but I could easily send each ID one at a
> time. My guess is that sending fewer messages is more important, but
> each set might contain thousands of IDs.
>
> Thanks!
>
> P.S. A side question: The few custom message type examples I've found are
> relatively simple objects with a few primitive instance variables, rather
> than collections. Is it nutty to send around a collection of IDs as a
> message?
>
> --
> Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
> Street, Amherst MA 01002 | matthewcornell.org
>
