Hi All,
Has anybody used Hive on Spark in your production environment? How
is its stability and performance compared with Spark SQL?
I hope someone can share their experience.
Thanks in advance!
Parquet also has internal indexes, so there is no need for a Hive index there.
For fast ad-hoc queries you can use Tez + LLAP. Here you could use Parquet, or
easily convert to ORC via CTAS. However, you need to check whether ORC is faster
than Parquet for your data, queries, and configuration (e.g., bloom filters).
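A minimal sketch of the CTAS conversion mentioned above (the table and column names are hypothetical, not from this thread):

```sql
-- Convert an existing Parquet table to ORC via CTAS.
-- "orc.bloom.filter.columns" adds bloom filters for the listed columns.
CREATE TABLE events_orc
STORED AS ORC
TBLPROPERTIES ("orc.bloom.filter.columns"="user_id")
AS SELECT * FROM events_parquet;
```

Whether the bloom filters help depends on how selective the filtered columns are in your queries.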
This seems out of the blue, but my initial benchmarks have shown that
there's no performance gain when a Hive index is used with the Tez engine. I'm
not sure why, but several posts online have suggested that the Tez engine does
not support Hive indexes (bitmap, compact). Is this true? If so, that is sad.
I underst
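For reference, the compact index discussed above would be created along these lines (table and column names are illustrative; note that Hive indexes were removed entirely in Hive 3.0):

```sql
-- Create a compact index (pre-Hive-3.0 only).
CREATE INDEX idx_field ON TABLE test_table (field)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- A deferred index must be built before it can be used:
ALTER INDEX idx_field ON test_table REBUILD;
```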
For us out-of-town folks, where is this meetup being held? It says
Hortonworks, but do you have an address?
Regards
Dano
On Mon, Aug 21, 2017, 1:33 PM Xuefu Zhang wrote:
Dear Hive users and developers,
As a reminder, the next Hive User Group Meeting will occur this Thursday,
Aug. 24. The agenda is available on the event page (
https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/).
See you all there!
Thanks,
Xuefu
On Tue, Aug 1, 2017 at 7:18 PM, Xuefu
One possibility is that count(*) returns a cached statistic, while
count(distinct field) actually reads the data and performs the aggregation.
Try setting the option below and test again:
set hive.compute.query.using.stats=false;
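To check the stale-statistics theory either way, you could run both of the following against the test_table from the report (a sketch, assuming you have write access to refresh stats):

```sql
-- Option 1: force count(*) to scan the data instead of answering from stats.
set hive.compute.query.using.stats=false;
select count(*) from test_table;

-- Option 2: recompute the cached statistics, then re-run with stats enabled.
ANALYZE TABLE test_table COMPUTE STATISTICS;
```

If both counts agree after either step, the original discrepancy was a stale table statistic rather than bad data.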
From: Igor Kuzmenko [mailto:f1she...@gmail.com]
Sent: Monday, August 21, 2017 10:01 AM
To:
Running a simple 'select count(*) from test_table' query returned
500_000.
But when I run 'select count(distinct field) from test_table', the
result is 500_001.
How could it happen that a table with 500_000 records has 500_001 unique
field values?
I'm using Hive from HDP 2.5.0 p