On 11/18/15, Nico Williams <nico at cryptonector.com> wrote: > > Now, diff(1) (i.e., the Unix command) can do this very efficiently > because it just moves a "cursor" in both files looking for different > records (lines). >
Umm.. Diff is a little more complicated than that. See https://en.wikipedia.org/wiki/Longest_common_subsequence_problem further niformation. Basically, any efficient implementation of diff(1) needs to have both files completely in memory, because the computation does lots of jumping around and searching within both files. I wrote the diff implementation for Fossil, from scratch, so I have some familiarity with these kinds of things. Would that it were so simple as a couple of moving cursors... That said, given sorted tables like this: CREATE TABLE t1(a,b,c,PRIMARY KEY(a,b)); CREATE TABLE t2(x,y,z,PRIMARY KEY(x,y)); Then the following query: SELECT a, b FROM t1 EXCEPT SELECT x, y FROM t2 ORDER BY 1, 2; Works by running both SELECT statements in parallel - walking cursors down each table - and "merging" the results together in a single pass. Sadly, SQLite currrently does *not* do this if you omit the ORDER BY clause: SELECT a, b FROM t1 EXCEPT SELECT x, y FROM t2; So here is a case where (paradoxically) adding an ORDER BY clause makes a query go faster. I noticed this a few weeks ago and was thinking that I need to do something about it. -- D. Richard Hipp drh at sqlite.org