I'm not sure what the rule is in the `b + null = null` case, but the same code works fine in 1.6.1; I just tried it.
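For anyone following the thread, the identity law being debated (`b + zero() = b`) can be checked without Spark at all. Here is a minimal sketch modelling a sum aggregator in plain Java; the class and method names are illustrative, not Spark API. With `zero() = 0` the law holds, which is exactly what a `null` zero cannot satisfy:

```java
// Minimal model of the Aggregator contract: merge(b, zero()) must equal b.
// With zero() = 0 and merge = addition the law holds; with zero() = null
// it cannot, since null is not an identity for addition.
public class SumAgg {
    // Identity element: 0 is a true identity for integer addition.
    public static Integer zero() {
        return 0;
    }

    // Fold one input value into the buffer.
    public static Integer reduce(Integer b, Integer a) {
        return b + a;
    }

    // Combine two partial buffers.
    public static Integer merge(Integer b1, Integer b2) {
        return b1 + b2;
    }

    public static void main(String[] args) {
        Integer b = 5;
        // The identity law the contract requires: b + zero() = b.
        System.out.println(merge(b, zero()).equals(b)); // prints true
        System.out.println(merge(reduce(zero(), 3), reduce(zero(), 4))); // prints 7
    }
}
```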
On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> This behaviour seems to be expected, because you must ensure `b + zero() = b`.
> In your case, `b + null = null` breaks this rule.
> It is the same in v1.6.1.
> See:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57
>
> // maropu
>
>
> On Sun, Jun 26, 2016 at 6:06 PM, Amit Sela <amitsel...@gmail.com> wrote:
>
>> Sometimes the BUF for the aggregator may depend on the actual input, and
>> while this passes the responsibility for handling null in merge/reduce to
>> the developer, that sounds fine to me if he is the one who put null in
>> zero() anyway.
>> Now it seems that the aggregation is skipped entirely when zero() = null.
>> I'm not sure if that was the behaviour in 1.6.
>>
>> Is this behaviour wanted?
>>
>> Thanks,
>> Amit
>>
>> Aggregator example:
>>
>> public static class Agg extends Aggregator<Tuple2<String, Integer>, Integer, Integer> {
>>
>>   @Override
>>   public Integer zero() {
>>     return null;
>>   }
>>
>>   @Override
>>   public Integer reduce(Integer b, Tuple2<String, Integer> a) {
>>     if (b == null) {
>>       b = 0;
>>     }
>>     return b + a._2();
>>   }
>>
>>   @Override
>>   public Integer merge(Integer b1, Integer b2) {
>>     if (b1 == null) {
>>       return b2;
>>     } else if (b2 == null) {
>>       return b1;
>>     } else {
>>       return b1 + b2;
>>     }
>>   }
>> }
>
>
> --
> ---
> Takeshi Yamamuro
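One way to keep an "absent" zero without violating the `b + zero() = b` contract is to make absence explicit in the buffer type rather than using a raw null. A sketch in plain Java (mimicking the Aggregator shape rather than extending Spark's class; the names here are illustrative): `Optional.empty()` is a genuine identity for this merge, so the law holds while "no value yet" is still representable.

```java
import java.util.Optional;

// Sketch: Optional.empty() as the zero value. Unlike a null buffer,
// empty is a real identity for merge, so b + zero() = b holds.
public class OptionalSumAgg {
    public static Optional<Integer> zero() {
        return Optional.empty();
    }

    // Fold one input into the buffer, starting from 0 when absent.
    public static Optional<Integer> reduce(Optional<Integer> b, int a) {
        return Optional.of(b.orElse(0) + a);
    }

    // Combine two partial buffers; empty acts as the identity.
    public static Optional<Integer> merge(Optional<Integer> b1, Optional<Integer> b2) {
        if (!b1.isPresent()) return b2;
        if (!b2.isPresent()) return b1;
        return Optional.of(b1.get() + b2.get());
    }

    public static void main(String[] args) {
        Optional<Integer> b = reduce(zero(), 5);
        System.out.println(merge(b, zero()).get()); // prints 5
        System.out.println(merge(reduce(zero(), 2), reduce(zero(), 3)).get()); // prints 5
    }
}
```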