"Richard Hipp" wrote...
On Sun, Sep 14, 2014 at 4:31 PM, Simon Slavin <slav...@bigfraud.org> wrote:


On 14 Sep 2014, at 8:58pm, jose isaias cabrera <jic...@cinops.xerox.com>
wrote:

> Yeah, that is what I am using now.  I was trying to get the speed that
supposedly is in the IN clause. :-)  Thanks.

BETWEEN is fast[er] than IN....f


Confirmed. Just to verify this, I ran the following script which times the
same query using both IN and BETWEEN:

.open /home/drh/www/repos/tcl.fossil
CREATE TEMP TABLE t1(id INTEGER PRIMARY KEY, date TEXT);
INSERT INTO t1 SELECT objid, date(mtime) FROM event;
SELECT count(*) FROM t1;
.timer on
SELECT count(*) FROM t1
WHERE date BETWEEN '2014-01-01' AND '2014-01-05';
WITH RECURSIVE
 range(date) AS (SELECT '2014-01-01'
                 UNION ALL
                 SELECT date(date,'+1 day') FROM range
                  WHERE date<'2014-01-05')
SELECT count(*) FROM t1
WHERE date in range;

The t1 table above has 61856 dates in it, only 45 of which are in the
selected range.  Using BETWEEN is about twice the speed of using IN.

That makes sense.  The IN operator (usually) builds a transient table with
all of the values of the right-hand side, then does a membership test for
row of the input table.  The building of the transient table is a one-time
setup action so we can ignore the performance impact of that.  But then it
has to do a binary search of the transient table for each of the 61856 rows
of the input table.  A binary search of the 5-entry "range" table involves
about 3 comparisons, on average, whereas the BETWEEN operator involves
about 1.5 comparisons per row, on average (assuming the first comparison is
false about half the time).

Note that the to queries are not exactly equivalent though. With the IN in
the WHERE clause, a date of '2014-01-03-extra-text' would be excluded
whereas it would be included using the BETWEEN operator.  So one way to
look at it is that the IN operator is being more selective and hence
requires additional comparisons.

Thanks, Dr. Hipp. This is perfect. If I may, this should probably be part of the documentation, as I am always looking for speed and this confirms the result.

josé
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to