Re: [U2] Unidata index and short-circuit evaluation

Rutherford, Marc Fri, 08 Mar 2013 14:15:27 -0800

Jeffery,

I would say #1 does the trick.


Any U2 TCL query evaluates from left to right.  When one term eliminates the 
record the query on that record stops, and the query continues to the next 
record.  So #1 is exactly your most effective, leaving the complex i-descriptor 
to evaluate only the smallest sub-set of records.

#2 and #3 don't save you anything with a first pass, only to repeat the same 
select on the second.  

I typically only use multiple passes when a I have complex nested 'and' and 
'or' clauses.  Unidata is somewhat obscure in the way it wants parentheses 
formatted (I am an old PICK hand).  So I will break up the query into separate 
passes that a human can easily understand.  Since I normally start my query 
with an indexed field I don't sweat the overhead of stacked queries.

Marc Rutherford
Principal Programmer Analyst
Advanced Bionics LLC
661) 362 1754

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Jeffrey Butera
Sent: Friday, March 08, 2013 1:04 PM
To: U2 Users List
Subject: [U2] Unidata index and short-circuit evaluation

While I'm on a roll...  I often look at how to make queries run faster.  
In short, we index all the commonly used data fields we can and (of
course) it makes world of difference.  However, I have some questions about 
optimal ways to query data using a mix of indexed data, non-indexed data and 
i-descriptors.

Let's say that in a table FOO I want to do SELECT FOO WITH A='foo' AND B='bar' 
AND C='bang'
where:

A = indexed data field
B = non-index data field
C = I-descriptor (assume it's time-consuming: 2 seconds per record)

Which is the optimal way to attack?

1) I could just go for it with:

SELECT FOO WITH A='foo' AND B='bar' AND C='bang'

2) I could do the following:

SELECT FOO WITH A='foo' AND B='bar'
SELECT FOO WITH A='foo' AND B='bar' AND C='bang' REQUIRE.SELECT

3) I could do the following:

SELECT FOO WITH A='foo'
SELECT FOO WITH A='foo' AND B='bar' AND C='bang' REQUIRE.SELECT


I've done benchmarks, but really curious about the innards of Unidata and 
how/when it does short-circuit evaluation of AND clauses, etc.

My gut tells me that (3) should be good because it first weeds out bad records 
based solely on an indexed data field, thereby reducing the number of records 
that "C" needs to be evaluated.  Conversely, if it's doing a good job with 
short-circuit evaluation then (3) and (1) shouldn't be terribly different 
because failure of A='foo' would imply that C never gets evaluated.


--
Jeffrey Butera, PhD
Associate Director for Applications and Web Services Information Technology 
Hampshire College
413-559-5556

http://www.hampshire.edu
http://www.facebook.com/hampshirecollegeit

_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

Re: [U2] Unidata index and short-circuit evaluation

Reply via email to