Multiple count distinct in Apache Tajo

Hyunsik Choi Tue, 12 Aug 2014 16:42:25 -0700

Hi Richards,

Thank you for your interest in Apache Tajo. Yes, Tajo supports multiple
distinct aggregations, and it should be a good fit for your use case. In my
personal experience with multiple count distinct aggregations, Tajo
outperforms Hive 0.10 by up to 3-4 times and Hive 0.13 on Tez by up to
1.5 times.


In addition, the 0.9.0 release will include many performance improvements,
such as per-node shuffle and skew handling for hash shuffle. So,
performance comparisons should be even more interesting.

Thanks,
Hyunsik

On Tue, Aug 12, 2014 at 3:33 PM, Richards Peter <[email protected]> wrote:
> Hi,
>
> Thanks for open sourcing Apache Tajo. My team is using Apache Hive to
> perform various data aggregations. Now that Apache Tajo is available, I
> would like to evaluate how it compares with Apache Hive for some of our
> use cases.
>
> One such use case is queries containing multiple count distincts. I found
> the following mail thread about Apache Tajo's support for multiple count
> distinct:
>
>
> http://mail-archives.apache.org/mod_mbox/tajo-commits/201405.mbox/%[email protected]%3E
>
> Could you please tell me whether I can expect better performance for
> such queries in Apache Tajo than in Apache Hive?
>
> Thanks,
> Richards Peter.
