Also, I just thought of another possibility/question:

Is there any way to dynamically register aggregators? In other words,
instead of doing a Map, it would be ideal to just be able to register an
aggregator for each id, but on the fly, since I don't know what all the ids
will be in advance.

Thanks again for the help.

Matthew


On Thu, Jul 17, 2014 at 11:53 AM, Matthew Saltz <sal...@gmail.com> wrote:

> Hi everyone,
>
> I'm trying to implement my own aggregator, whose aggregated value should
> be a Map (for which I can use MapWritable) from an id (LongWritable) to a
> custom defined type (which simply extends Writable) that contains several
> aggregate metrics. I want vertices to be able to do something along the
> lines of
>
> aggregate(MY_MAP_AGGREGATOR, new MyAggregatorMessage(id, stat1, stat2));
>
> and then the map aggregator will do something like
>
> public void aggregate(MyAggregatorMessage m) {
>
>     MapWritable currentMap = (MapWritable) getAggregatedValue();
>
>     if (!currentMap.containsKey(m.getId())) {
>         // MyAggregatorData contains the aggregate info I want to keep for
>         // each id. Contains init. values for stat1 and stat2
>         currentMap.put(m.getId(), new MyAggregatorData());
>     }
>
>     // MapWritable.get() returns a Writable, so cast back to the stored type
>     MyAggregatorData oldData = (MyAggregatorData) currentMap.get(m.getId());
>     // Performs appropriate aggregates for each stat and stores it. Sum,
>     // average, whatever
>     oldData.aggregate(m.getStat1(), m.getStat2());
> }
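Below is a minimal plain-Java model of the `aggregate(MyAggregatorMessage)` logic above. It assumes hypothetical semantics for the stats: stat1 is summed, stat2 keeps its maximum, and a count allows an average. It uses `java.util.HashMap` in place of `MapWritable` so it runs without Hadoop or Giraph. With a generically typed map, the get-or-create step becomes a single `computeIfAbsent` call and no cast is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for MyAggregatorMessage: one vertex's contribution for one id.
final class MyAggregatorMessage {
    private final long id;
    private final double stat1;
    private final double stat2;

    MyAggregatorMessage(long id, double stat1, double stat2) {
        this.id = id;
        this.stat1 = stat1;
        this.stat2 = stat2;
    }

    long getId() { return id; }
    double getStat1() { return stat1; }
    double getStat2() { return stat2; }
}

// Stand-in for MyAggregatorData. Hypothetical semantics: stat1 is summed,
// stat2 keeps its maximum, and count allows an average of stat1.
final class MyAggregatorData {
    double stat1Sum = 0.0;
    double stat2Max = Double.NEGATIVE_INFINITY;
    long count = 0;

    void aggregate(double stat1, double stat2) {
        stat1Sum += stat1;
        stat2Max = Math.max(stat2Max, stat2);
        count++;
    }

    double stat1Average() {
        return count == 0 ? 0.0 : stat1Sum / count;
    }
}

// Models aggregate(MyAggregatorMessage) with a typed map instead of MapWritable.
final class PerIdAggregation {
    private final Map<Long, MyAggregatorData> current = new HashMap<>();

    void aggregate(MyAggregatorMessage m) {
        // Get or create the entry for this id, then fold the new stats into it.
        current.computeIfAbsent(m.getId(), id -> new MyAggregatorData())
               .aggregate(m.getStat1(), m.getStat2());
    }

    Map<Long, MyAggregatorData> getAggregatedValue() {
        return current;
    }
}
```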
>
> However, the problem is that the method signatures for Aggregator
> <https://giraph.apache.org/apidocs/org/apache/giraph/aggregators/Aggregator.html>
> all have to use the same type. In other words, I can't have
>
> public MapWritable getAggregatedValue()
>
> and
>
> public void aggregate (MyAggregatorMessage m)
>
> because the types are different.
>
> My idea right now is to use a MyAggregatorWritable class that extends
> GenericWritable
> <http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/GenericWritable.html>
> to wrap both MyAggregatorMessage and MapWritable and then use that as the
> method signature for both, and deal with the rest through casting. I've
> already used GenericWritable for something else so the implementation would
> be straightforward.
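The GenericWritable idea can be sketched in plain Java as follows. The sketch rests on several assumptions. `SingleTypeAggregator<A>` is a hypothetical stand-in for Giraph's `Aggregator`, whose methods share one type. `MapOrEntry` plays the role of `MyAggregatorWritable`, carrying either a single (id, value) contribution or a whole map. Per-id sums are an assumed example metric. Handling the map case inside `aggregate()` matters if partial aggregated values are also fed back through `aggregate()` when results are combined, which is an assumption to check against the Giraph source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Giraph's Aggregator: inputs and the aggregated
// value share one type parameter A.
interface SingleTypeAggregator<A> {
    void aggregate(A value);
    A getAggregatedValue();
}

// Plays the role of MyAggregatorWritable (a GenericWritable subclass): it
// carries either one (id, value) contribution or a whole id -> value map.
final class MapOrEntry {
    final long id;
    final double value;
    final Map<Long, Double> map; // null for a single contribution

    private MapOrEntry(long id, double value, Map<Long, Double> map) {
        this.id = id;
        this.value = value;
        this.map = map;
    }

    static MapOrEntry ofEntry(long id, double value) {
        return new MapOrEntry(id, value, null);
    }

    static MapOrEntry ofMap(Map<Long, Double> map) {
        return new MapOrEntry(0L, 0.0, map);
    }

    boolean isMap() {
        return map != null;
    }
}

// Sums values per id (an assumed example metric). Both payload kinds are
// accepted, so a partial map can be merged as well as a single entry.
final class SummingMapAggregator implements SingleTypeAggregator<MapOrEntry> {
    private final Map<Long, Double> sums = new HashMap<>();

    @Override
    public void aggregate(MapOrEntry value) {
        if (value.isMap()) {
            // Merge every entry of an incoming (partial) map.
            value.map.forEach((id, v) -> sums.merge(id, v, Double::sum));
        } else {
            // Fold a single contribution into its id's running sum.
            sums.merge(value.id, value.value, Double::sum);
        }
    }

    @Override
    public MapOrEntry getAggregatedValue() {
        // Return a copy so callers cannot mutate the internal state.
        return MapOrEntry.ofMap(new HashMap<>(sums));
    }
}
```

Dispatching on the payload kind here stands in for the casting a real `GenericWritable` subclass would do on the result of `get()`.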
>
> So, I have a few principal questions, I suppose:
>
> 1) Is there a better way to implement this than to use a GenericWritable
> as described above? If any of you have code for your own way to do this,
> I'd love to see it, and if not, I'd love to contribute what I come up with
> as a MapAggregator (in a generic manner) to the Giraph project if that
> would be appropriate.
>
> 2) Is there anything wrong in principle with this type of solution? In
> other words, is there some kind of philosophical or design reason that
> having a Map as an aggregator is a bad idea? I know that it might not end
> up being very efficient, but as it stands, I'm not seeing any other
> solution to my problem; if there's an ordinary kind of workaround that
> would be more efficient I'd love to hear it.
>
> 3) [Less important and more discussion-oriented] Why is the API designed
> such that these methods must use the same type? It seems like having an
> Aggregator<Message, Result> would be useful.
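To make question 3 concrete, here is a hypothetical `Aggregator<Message, Result>`-style interface (not part of Giraph), with a per-id counter as the example. With two types, merging partial results (for example, from different workers) needs its own `combine(R)` method, because `aggregate(M)` only accepts single contributions. A single-type `aggregate(A)` covers both cases with one method, which may be part of why the API is designed this way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-type interface, NOT part of Giraph: M is what a vertex
// contributes, R is the aggregated result.
interface TwoTypeAggregator<M, R> {
    void aggregate(M message); // fold in one vertex contribution
    void combine(R partial);   // fold in a partial result, e.g. from another worker
    R getAggregatedValue();
}

// Example message: "this vertex saw id X" (assumed for illustration).
final class IdSeen {
    final long id;

    IdSeen(long id) {
        this.id = id;
    }
}

// Counts contributions per id.
final class PerIdCounter implements TwoTypeAggregator<IdSeen, Map<Long, Long>> {
    private final Map<Long, Long> counts = new HashMap<>();

    @Override
    public void aggregate(IdSeen message) {
        counts.merge(message.id, 1L, Long::sum);
    }

    @Override
    public void combine(Map<Long, Long> partial) {
        partial.forEach((id, c) -> counts.merge(id, c, Long::sum));
    }

    @Override
    public Map<Long, Long> getAggregatedValue() {
        return new HashMap<>(counts);
    }
}
```

In use, each worker would run its own `PerIdCounter`, and one instance would `combine()` their `getAggregatedValue()` maps into the final result.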
>
> I apologize for the rather long message, and I appreciate any help you can
> offer. If you need any other information, please let me know and I'll be
> happy to provide it. In trying to simplify everything, I could easily have
> made a mistake or left out something important. Thanks in advance.
>
> Best,
> Matthew
> http://www.matthewsaltz.com
>
>
