Thank you all for this discussion. Very helpful.

-----Original Message-----
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal 
Vijayaraghavan
Sent: Thursday, January 28, 2016 7:43 PM
To: user@hive.apache.org
Subject: Re: bloom filter used in 0.14?

> So I am questioning whether it is enabled on the version I am on,
>which is 0.14. Does anyone know?

https://issues.apache.org/jira/browse/HIVE-9188 - fix-version (1.2.0)


The version you are using (0.14) does not have bloom filter support; per the 
JIRA above, it was added in 1.2.0.

It should ignore the parameter and not generate any bloom filter streams when 
writing.

hive --orcfiledump (in later versions) will print the BLOOM_FILTER as a column 
next to the row index streams.

> Without any optimization, I have to use thousands of mappers to find
>just one id.


Everything else you are doing is appropriate. However, be aware that the bloom 
filter index (& row-index) is consulted only *after* a mapper starts up.

So a mapper might still be spun up, but it can exit immediately, which plays 
well with Tez container reuse on very busy clusters. In fact, it might be 
faster on a busy cluster than on a completely idle one.
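
To illustrate why the check can only happen inside the task, here is a minimal 
Python sketch of a bloom filter kept per row group. It is not the ORC 
implementation: the class, the hash scheme, and the helper name 
row_groups_to_read are all made up for illustration. The point is only that a 
reader has to open the file and look at each filter before it can skip 
anything.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative bloom filter; NOT the format ORC actually writes."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def row_groups_to_read(row_groups, wanted_id):
    """Hypothetical helper: indexes of row groups that cannot be skipped."""
    keep = []
    for index, ids in enumerate(row_groups):
        bloom = TinyBloomFilter()
        for value in ids:
            bloom.add(value)
        if bloom.might_contain(wanted_id):
            keep.append(index)
    return keep
```

A filter never reports a stored id as absent, so the row group that holds the 
id is always read; other row groups are usually skipped, but false positives 
are possible.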

The sorted[1] min-max indicators suggested by Prasanth, however, are rolled up 
to the split level and can be used to prune splits before they are scheduled.
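
As a rough illustration of why sorting matters for min-max pruning, the Python 
sketch below computes a (min, max) pair per split and keeps only the splits 
whose range could contain the id. The function names and the 100-rows-per-split 
layout are assumptions for the example, not Hive internals.

```python
def split_stats(ids, rows_per_split):
    """Return one (min, max) pair per fixed-size split of the id column."""
    stats = []
    for start in range(0, len(ids), rows_per_split):
        chunk = ids[start:start + rows_per_split]
        stats.append((min(chunk), max(chunk)))
    return stats


def splits_to_schedule(stats, wanted_id):
    """Keep only splits whose [min, max] range could contain wanted_id."""
    return [i for i, (lo, hi) in enumerate(stats) if lo <= wanted_id <= hi]


# Sorted layout: each split covers a narrow, disjoint range of ids.
sorted_ids = list(range(1000))

# Scattered layout: the same 1000 ids, but every split spans nearly the
# whole range, so min-max cannot rule any split out.
scattered_ids = [(i % 10) * 100 + i // 10 for i in range(1000)]

sorted_hits = splits_to_schedule(split_stats(sorted_ids, 100), 456)
scattered_hits = splits_to_schedule(split_stats(scattered_ids, 100), 456)
print(len(sorted_hits), len(scattered_hits))  # prints "1 10"
```

With the data sorted on the id, one split out of ten survives; with the same 
ids scattered, all ten have to be scheduled.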

Cheers,
Gopal
[1] - only CLUSTER BY needed, not ORDER BY


