Hi Richards,

Thank you for your interest in Apache Tajo. Yes, Tajo supports multiple distinct aggregations, and it should be a good fit for your use case. In my own experience with queries containing multiple count distinct aggregations, Tajo outperforms Hive 0.10 by up to 3-4 times and Hive 0.13 on Tez by up to 1.5 times.

In addition, the 0.9.0 release will include many performance improvements, such as per-node shuffle and skew handling for hash shuffle, so performance comparisons should become even more interesting.

Thanks,
Hyunsik

On Tue, Aug 12, 2014 at 3:33 PM, Richards Peter <[email protected]> wrote:
> Hi,
>
> Thanks for open sourcing Apache Tajo. My team is using Apache Hive to
> perform various data aggregations. Now that Apache Tajo is available, I
> would like to evaluate how it compares with Apache Hive for some of our use
> cases.
>
> One such use case is queries containing multiple count distincts. I found
> the following mail thread about Apache Tajo's support for multiple count
> distinct:
>
> http://mail-archives.apache.org/mod_mbox/tajo-commits/201405.mbox/%[email protected]%3E
>
> Could you please tell me whether I can expect a better performance for such
> queries in Apache Tajo than in Apache Hive?
>
> Thanks,
> Richards Peter.
