I'm not sure what the rule is in the `b + null = null` case, but the same code works fine in 1.6.1; I just tried it.
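For anyone following the thread, the identity law being debated (`b + zero() = b`) can be checked without Spark at all. Here is a minimal sketch modelling a sum aggregator in plain Java; the class and method names are illustrative, not Spark API. With `zero() = 0` the law holds, which is exactly what a `null` zero cannot satisfy:

```java
// Minimal model of the Aggregator contract: merge(b, zero()) must equal b.
// With zero() = 0 and merge = addition the law holds; with zero() = null
// it cannot, since null is not an identity for addition.
public class SumAgg {
    // Identity element: 0 is a true identity for integer addition.
    public static Integer zero() {
        return 0;
    }

    // Fold one input value into the buffer.
    public static Integer reduce(Integer b, Integer a) {
        return b + a;
    }

    // Combine two partial buffers.
    public static Integer merge(Integer b1, Integer b2) {
        return b1 + b2;
    }

    public static void main(String[] args) {
        Integer b = 5;
        // The identity law the contract requires: b + zero() = b.
        System.out.println(merge(b, zero()).equals(b)); // prints true
        System.out.println(merge(reduce(zero(), 3), reduce(zero(), 4))); // prints 7
    }
}
```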
On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> This behaviour seems to be expected, because you must ensure `b + zero() = b`.
> In your case, `b + null = null` breaks this rule.
> It is the same in v1.6.1.
> See:
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57
>
> // maropu
>
>
> On Sun, Jun 26, 2016 at 6:06 PM, Amit Sela <amitsel...@gmail.com> wrote:
>
>> Sometimes the BUF for the aggregator may depend on the actual input, and
>> while this passes the responsibility for handling null in merge/reduce to
>> the developer, that sounds fine to me if he is the one who put null in
>> zero() anyway.
>> Now it seems that the aggregation is skipped entirely when zero() = null.
>> I'm not sure if that was the behaviour in 1.6.
>>
>> Is this behaviour wanted?
>>
>> Thanks,
>> Amit
>>
>> Aggregator example:
>>
>> public static class Agg extends Aggregator<Tuple2<String, Integer>, Integer, Integer> {
>>
>>   @Override
>>   public Integer zero() {
>>     return null;
>>   }
>>
>>   @Override
>>   public Integer reduce(Integer b, Tuple2<String, Integer> a) {
>>     if (b == null) {
>>       b = 0;
>>     }
>>     return b + a._2();
>>   }
>>
>>   @Override
>>   public Integer merge(Integer b1, Integer b2) {
>>     if (b1 == null) {
>>       return b2;
>>     } else if (b2 == null) {
>>       return b1;
>>     } else {
>>       return b1 + b2;
>>     }
>>   }
>> }
>
>
> --
> ---
> Takeshi Yamamuro
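One way to keep an "absent" zero without violating the `b + zero() = b` contract is to make absence explicit in the buffer type rather than using a raw null. A sketch in plain Java (mimicking the Aggregator shape rather than extending Spark's class; the names here are illustrative): `Optional.empty()` is a genuine identity for this merge, so the law holds while "no value yet" is still representable.

```java
import java.util.Optional;

// Sketch: Optional.empty() as the zero value. Unlike a null buffer,
// empty is a real identity for merge, so b + zero() = b holds.
public class OptionalSumAgg {
    public static Optional<Integer> zero() {
        return Optional.empty();
    }

    // Fold one input into the buffer, starting from 0 when absent.
    public static Optional<Integer> reduce(Optional<Integer> b, int a) {
        return Optional.of(b.orElse(0) + a);
    }

    // Combine two partial buffers; empty acts as the identity.
    public static Optional<Integer> merge(Optional<Integer> b1, Optional<Integer> b2) {
        if (!b1.isPresent()) return b2;
        if (!b2.isPresent()) return b1;
        return Optional.of(b1.get() + b2.get());
    }

    public static void main(String[] args) {
        Optional<Integer> b = reduce(zero(), 5);
        System.out.println(merge(b, zero()).get()); // prints 5
        System.out.println(merge(reduce(zero(), 2), reduce(zero(), 3)).get()); // prints 5
    }
}
```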