Hive on Spark

2017-08-21 Thread peter zhang
Hi all, has anybody used Hive on Spark in a production environment? How do its stability and performance compare with Spark SQL? I hope somebody can share their experience. Thanks in advance!

Re: Hive index + Tez engine = no performance gain?!

2017-08-21 Thread Jörn Franke
Parquet also has internal indexes, so there is no need for a Hive index there. For fast ad-hoc queries you can use Tez + LLAP. Here you could use Parquet or convert easily to ORC via CTAS. However, you need to check whether ORC is faster than Parquet depending on your data, queries and configuration (bloom filters
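A minimal HiveQL sketch of the CTAS conversion mentioned above, assuming a hypothetical Parquet table `events_parquet` and a hypothetical filter column `user_id` (neither name comes from the thread). `orc.bloom.filter.columns` is the ORC table property that enables bloom filters for the listed columns; whether it helps depends on the data and queries, as the reply notes.

```sql
-- Hypothetical names: events_parquet, events_orc, user_id.
-- Copy a Parquet-backed table into a new ORC table in one statement (CTAS),
-- asking ORC to build a bloom filter on the column used in point lookups.
CREATE TABLE events_orc
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns' = 'user_id')
AS
SELECT * FROM events_parquet;
```

Running the same representative queries against both tables is the simplest way to compare the two formats for a given workload.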

Hive index + Tez engine = no performance gain?!

2017-08-21 Thread Thai Bui
This seems out of the blue, but my initial benchmarks show no performance gain when a Hive index is used with the Tez engine. I'm not sure why, but several posts online suggest that the Tez engine does not support Hive indexes (bitmap, compact). Is this true? If yes, that is sad. I underst

Re: Aug. 2017 Hive User Group Meeting

2017-08-21 Thread dan young
For us out-of-town folks, where is the location of this meetup? It says Hortonworks, but do you have an address? Regards Dano On Mon, Aug 21, 2017, 1:33 PM Xuefu Zhang wrote: > Dear Hive users and developers, > > As a reminder, the next Hive User Group Meeting will occur this Thursday, > Aug. 24. Th

Re: Aug. 2017 Hive User Group Meeting

2017-08-21 Thread Xuefu Zhang
Dear Hive users and developers, As a reminder, the next Hive User Group Meeting will occur this Thursday, Aug. 24. The agenda is available on the event page (https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/). See you all there! Thanks, Xuefu On Tue, Aug 1, 2017 at 7:18 PM, Xuefu

RE: Unexpected query result

2017-08-21 Thread Frank Luo
One possibility is that count(*) returns a cached statistic, while count(distinct field) actually reads the data and performs the computation. Try setting the option below and test again: set hive.compute.query.using.stats=false; From: Igor Kuzmenko [mailto:f1she...@gmail.com] Sent: Monday, August 21, 2017 10:01 AM To:
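The suggested check could look roughly like this in a Hive session. The table and column names (`test_table`, `field`) are taken from the original question in this thread. The final `ANALYZE TABLE` step is not part of the reply; it is added here on the assumption that stale statistics are the cause, and a partitioned table may need a `PARTITION` clause.

```sql
-- Force Hive to scan the data instead of answering from stored statistics.
SET hive.compute.query.using.stats=false;

-- Re-run both queries; with stats disabled, the two results should be consistent.
SELECT COUNT(*) FROM test_table;
SELECT COUNT(DISTINCT field) FROM test_table;

-- Assumption: if the counts now agree, the stored statistics were stale.
-- Recomputing them lets stats-based answers be trusted again.
ANALYZE TABLE test_table COMPUTE STATISTICS;
```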

Unexpected query result

2017-08-21 Thread Igor Kuzmenko
Running a simple 'select count(*) from test_table' query returned 500_000. But when I run 'select count(distinct field) from test_table', the result is 500_001. How could it happen that a table with 500_000 records has 500_001 unique field values? I'm using Hive from HDP 2.5.0 p