On Tue, May 27, 2008 at 2:41 PM, Petite Abeille <[EMAIL PROTECTED]> wrote:
> Hello, > > % sqlite3 -version > 3.5.9 > > I'm trying to figure out a frugal way to handle a unique key > constrain... > > I tried using both 'insert or ignore' and a self join. The self join > seems to be noticeably faster even though 'insert or ignore' would > empirically appear to be the better deal (shorter query plan, less VM > instructions). > > Specifically, given the following DML: > > insert or ignore > into token( name ) > select stage.token as name > from stage > order by stage.token; > > One gets a query plan like such: > > 0|0|TABLE stage > > And 'explain' reports 58 VM instructions. > > > On the other hand, the following self join... > > insert > into token( name ) > select stage.token as name > from stage > left join token on token.name = stage.token > where token.id is null > order by stage.token; > > ... uses a query plan like such: > > 0|0|TABLE stage > 1|1|TABLE token WITH INDEX token_name > > ... and 82 VM instructions. > > Nonetheless, the self join would appear to be around 10% faster than > the 'insert or ignore' flavor. > > Not sure why this is the case though... considering the apparent > overhead incurred by the join. > > Thoughts? > Well, the first thing you should bring away from this experience is that the number of VM instructions isn't really an indicator of how efficient the query is :) Now, I'm not sure exactly why one is faster than the other, especially since you didn't post your exact schema and indices, and I have no idea how many rows there are in either table. But if I had to guess, it's because of the ORDER BY clause. In general, an ORDER BY means that SQLite needs to generate a temporary table with all the rows to be selected/inserted, then sort that temporary table. The INSERT OR IGNORE version has to unconditionally sort the entire 'stage' table; your second query only has to sort those rows in 'stage' that don't already exist in 'table'. If each table fits comfortably in your computer's disk cache, the extra pass won't matter so much. In any case, I invite you to try the following: 1. Add an index: [[ create index stage_token_ix on stage(token); ]] SQLite will use that index to improve the ORDER BY. 2. Try the following variation: insert into token( name ) select stage.token as name from stage where not exists(select 1 from token where token.name = stage.token) order by stage.token; -- -- Stevie-O Real programmers use COPY CON PROGRAM.EXE _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users