Also, I just thought of another question: is there any way to register aggregators dynamically? In other words, instead of using a Map, it would be ideal to register an aggregator for each id on the fly, since I don't know all of the ids in advance.
Thanks again for the help.

Matthew

On Thu, Jul 17, 2014 at 11:53 AM, Matthew Saltz <sal...@gmail.com> wrote:
> Hi everyone,
>
> I'm trying to implement my own aggregator whose aggregated value should
> be a Map (for which I can use MapWritable) from an id (LongWritable) to a
> custom type (which simply extends Writable) that contains several
> aggregate metrics. I want vertices to be able to do something along the
> lines of
>
>     aggregate(MY_MAP_AGGREGATOR, new MyAggregatorMessage(id, stat1, stat2));
>
> and then the map aggregator will do something like
>
>     public void aggregate(MyAggregatorMessage m) {
>         MapWritable currentMap = (MapWritable) getAggregatedValue();
>
>         if (!currentMap.containsKey(m.getId())) {
>             // MyAggregatorData contains the aggregate info I want to keep
>             // for each id, including initial values for stat1 and stat2.
>             currentMap.put(m.getId(), new MyAggregatorData());
>         }
>
>         // MapWritable.get() returns a Writable, so a cast is needed.
>         MyAggregatorData oldData = (MyAggregatorData) currentMap.get(m.getId());
>         // Performs the appropriate aggregation for each stat and stores
>         // it: sum, average, whatever.
>         oldData.aggregate(m.getStat1(), m.getStat2());
>     }
>
> However, the problem is that the method signatures for Aggregator
> <https://giraph.apache.org/apidocs/org/apache/giraph/aggregators/Aggregator.html>
> all have to use the same type. In other words, I can't have
>
>     public MapWritable getAggregatedValue()
>
> and
>
>     public void aggregate(MyAggregatorMessage m)
>
> because the types are different.
>
> My idea right now is to use a MyAggregatorWritable class that extends
> GenericWritable
> <http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/GenericWritable.html>
> to wrap both MyAggregatorMessage and MapWritable, use that type in both
> method signatures, and handle the rest through casting. I've already used
> GenericWritable for something else, so the implementation would be
> straightforward.
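One possible alternative to the GenericWritable wrapper: if a vertex's contribution is itself a one-entry map, then aggregate() and getAggregatedValue() can share a single type and no casting between message and result types is needed. Below is a minimal plain-Java sketch under that assumption. It uses java.util.HashMap instead of MapWritable so it runs without Giraph or Hadoop. The names MapAggregatorSketch, Stats and message() are hypothetical, and mapping this onto a real Giraph aggregator (passing a one-entry MapWritable to aggregate()) is untested here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java sketch: no Giraph or Hadoop types are used.
public class MapAggregatorSketch {

    // Stands in for MyAggregatorData: keeps a count and a sum per id.
    static final class Stats {
        long count;
        double sum;

        Stats(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }

        // Folds another partial result for the same id into this one.
        void merge(Stats other) {
            count += other.count;
            sum += other.sum;
        }

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // The aggregated value: id -> stats for that id.
    private final Map<Long, Stats> value = new HashMap<>();

    // A vertex contribution expressed as a one-entry map, so the
    // "message" and the aggregated value have the same type.
    static Map<Long, Stats> message(long id, double stat) {
        Map<Long, Stats> m = new HashMap<>();
        m.put(id, new Stats(1, stat));
        return m;
    }

    // One signature for both vertex contributions and partial maps:
    // each entry is merged into the current value by id, creating a
    // fresh entry the first time an id is seen.
    public void aggregate(Map<Long, Stats> partial) {
        for (Map.Entry<Long, Stats> e : partial.entrySet()) {
            value.computeIfAbsent(e.getKey(), k -> new Stats(0, 0.0))
                 .merge(e.getValue());
        }
    }

    public Map<Long, Stats> getAggregatedValue() {
        return value;
    }

    public static void main(String[] args) {
        MapAggregatorSketch worker1 = new MapAggregatorSketch();
        worker1.aggregate(message(7L, 2.0));
        worker1.aggregate(message(7L, 4.0));
        worker1.aggregate(message(9L, 1.0));

        MapAggregatorSketch worker2 = new MapAggregatorSketch();
        worker2.aggregate(message(7L, 6.0));

        // Partial results are combined with the very same method.
        MapAggregatorSketch total = new MapAggregatorSketch();
        total.aggregate(worker1.getAggregatedValue());
        total.aggregate(worker2.getAggregatedValue());

        Stats s7 = total.getAggregatedValue().get(7L);
        System.out.println("id 7: count=" + s7.count + " avg=" + s7.average());
    }
}
```

Running main prints `id 7: count=3 avg=4.0`. Stats stores a sum and a count rather than a running average, because averages of partial averages are not the overall average, whereas sums and counts can be merged in any order. The computeIfAbsent call also gives per-id entries that appear on the fly, without knowing the ids in advance.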
>
> So, I have a few principal questions, I suppose:
>
> 1) Is there a better way to implement this than to use a GenericWritable
> as described above? If any of you have code for your own way to do this,
> I'd love to see it, and if not, I'd love to contribute what I come up with
> as a MapAggregator (in a generic manner) to the Giraph project if that
> would be appropriate.
>
> 2) Is there anything wrong in principle with this type of solution? In
> other words, is there some philosophical or design reason that having a
> Map as an aggregator is a bad idea? I know that it might not end up being
> very efficient, but as it stands, I don't see any other solution to my
> problem; if there's a standard workaround that would be more efficient,
> I'd love to hear it.
>
> 3) [Less important and more discussion-oriented] Why is the API designed
> so that these methods must use the same type? It seems like having an
> Aggregator<Message, Result> would be useful.
>
> I apologize for the long message, and I appreciate any help you can
> offer. If you need any other information, please let me know and I'll be
> happy to provide it. In trying to simplify everything, I could easily have
> made a mistake or left out something important. Thanks in advance.
>
> Best,
> Matthew
> http://www.matthewsaltz.com
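Question 3 can be explored with a hypothetical two-type interface. TwoTypeAggregator, CountSum and reduce() below are illustrative names, not Giraph APIs. The sketch suggests one plausible reason for the single-type design: with separate message and result types, an aggregator needs two operations, one to fold a vertex message into a partial result and one to combine partial results (for example, from different workers), whereas a single-type aggregate() can serve both roles with one method.

```java
import java.util.List;

// Hypothetical sketch of an Aggregator<Message, Result>; not a Giraph API.
public class TwoTypeAggregatorSketch {

    interface TwoTypeAggregator<M, R> {
        R createInitialValue();

        // Folds one vertex message into a partial result.
        R accept(R current, M message);

        // Combines two partial results, e.g. from different workers.
        R merge(R left, R right);
    }

    // Example: messages are single longs; the result is {count, sum}.
    static final class CountSum implements TwoTypeAggregator<Long, long[]> {
        public long[] createInitialValue() {
            return new long[] {0L, 0L};
        }

        public long[] accept(long[] current, Long message) {
            return new long[] {current[0] + 1, current[1] + message};
        }

        public long[] merge(long[] left, long[] right) {
            return new long[] {left[0] + right[0], left[1] + right[1]};
        }
    }

    // Simulates one worker folding its local messages into a partial result.
    static <M, R> R reduce(TwoTypeAggregator<M, R> agg, List<M> messages) {
        R acc = agg.createInitialValue();
        for (M m : messages) {
            acc = agg.accept(acc, m);
        }
        return acc;
    }

    public static void main(String[] args) {
        CountSum agg = new CountSum();
        long[] worker1 = reduce(agg, List.of(3L, 5L));
        long[] worker2 = reduce(agg, List.of(10L));
        long[] total = agg.merge(worker1, worker2);
        System.out.println("count=" + total[0] + " sum=" + total[1]);
    }
}
```

Running main prints `count=3 sum=18`. Whether Giraph's single-type signature was chosen for exactly this reason is an assumption; the sketch only shows that a two-type design needs the extra merge step.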