I've written a Sqlite diff tool that does this fast, but using a different approach. I'm adding a column named md5checksum to each to be diffed table, and add a index to it. A small tool then generates checksums for all rows in the table. Instead of comparing all columns, only the difference of checksums are considered when generating a diff of changed rows. The set of inserted and deleted rows is then applied again in a optimized way to the old db.
Ben Von meinem iPad gesendet > Am 18.11.2015 um 21:11 schrieb Nico Williams <nico at cryptonector.com>: > > Consider two tables with the same columns and primary keys, and then > consider this query: > > SELECT 'added., a.* FROM > (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a > UNION ALL > SELECT 'deleted', d.* FROM > (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d; > > In other words, the SQL analog of a straightforward diff(1) of two > [sorted] files. I think this is the canonical way to do this in SQL, > though there are variants. > > Now, diff(1) (i.e., the Unix command) can do this very efficiently > because it just moves a "cursor" in both files looking for different > records (lines). > > SQLite3, on the other hand, does four table scans and further scans two > sub-queries' result row-sets. If only the optimizer could recognize > this pattern, then it could surely do as well as diff(1). This sort of > query strikes me as a common enough kind of query that implementing such > an optimization might be worthwhile, though the SQLite3 team will have > better knowledge of that. I'd certainly appreciate such an > optimization. (No, Postgres 9.4.4 doesn't seem to optimize this > substantially better either.) > > A difference UDF would be one way to handle this. Table-valued > functions are half the battle, but one still needs table-valued > arguments to UDFs to make it clean looking: > > -- Outputs two-column rows where the first column denotes addition vs > -- deletion and the second is a row-valued column with the result > -- columns of the corresponding row in the old or new table source: > SELECT diff((SELECT ...), (SELECT ...)); > > Using "eval" and JSON one could define a diff() UDF that takes two > strings, compile the two strings as SQL statements and execute them, > stepping over the results of each just like diff(1) would: > > -- Outputs JSON arrays of addition/deletion marker and column array or > -- object. > SELECT diff('SELECT ...', 'SELECT ...'); > > and/or > > SELECT diff('old_view_or_table', 'new_view_or_table'); > > Perhaps SQL could grow a DIFFERENCE operator for this, though there are > syntactic/semantic issues to deal with. Maybe: > > SELECT d.diff, o.c0, o.c1, .., o.cN > FROM old_source o > DIFFERENCE (/*deletion*/ '<', '>' /*addition*/)) d > SELECT d.diff, n.c0, n.c1, .., n.cN > FROM new_source n; > > One might get a comm(1) equivalent too: > > SELECT d.diff, o.c0, o.c1, .., o.cN > FROM old_source o > DIFFERENCE (/*deletion*/ '<', '>' /*addition*/), '=' /*same*/) d > SELECT d.diff, n.c0, n.c1, .., n.cN > FROM new_source n; > > Nico > -- > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users