Re: [sqlite] 'insert or ignore' vs self join?

Stephen Oberholtzer Tue, 27 May 2008 12:07:50 -0700

On Tue, May 27, 2008 at 2:41 PM, Petite Abeille <[EMAIL PROTECTED]>
wrote:


> Hello,
>
> % sqlite3 -version
> 3.5.9
>
> I'm trying to figure out a frugal way to handle a unique key
> constrain...
>
> I tried using both 'insert or ignore' and a self join. The self join
> seems to be noticeably faster even though 'insert or ignore' would
> empirically appear to be the better deal (shorter query plan, less VM
> instructions).
>
> Specifically, given the following DML:
>
> insert      or ignore
> into        token( name )
> select      stage.token as name
> from        stage
> order by    stage.token;
>
> One gets a query plan like such:
>
> 0|0|TABLE stage
>
> And 'explain' reports 58 VM instructions.
>
>
> On the other hand, the following self join...
>
> insert
> into        token( name )
> select      stage.token as name
> from        stage
> left join   token on token.name = stage.token
> where       token.id is null
> order by    stage.token;
>
> ... uses a query plan like such:
>
> 0|0|TABLE stage
> 1|1|TABLE token WITH INDEX token_name
>
> ... and 82 VM instructions.
>
> Nonetheless, the self join would appear to be around 10% faster than
> the 'insert or ignore' flavor.
>
> Not sure why this is the case though... considering the apparent
> overhead incurred by the join.
>
> Thoughts?
>

Well, the first thing you should bring away from this experience is that the
number of VM instructions isn't really an indicator of how efficient the
query is :)

Now, I'm not sure exactly why one is faster than the other, especially since
you didn't post your exact schema and indices, and I have no idea how many
rows there are in either table.
But if I had to guess, it's because of the ORDER BY clause.  In general, an
ORDER BY means that SQLite needs to generate a temporary table with all the
rows to be selected/inserted,
then sort that temporary table.  The INSERT OR IGNORE version has to
unconditionally sort the entire 'stage' table; your second query only has to
sort those rows in 'stage' that don't already exist in 'table'.  If each
table fits comfortably in your computer's disk cache, the extra pass won't
matter so much.

In any case, I invite you to try the following:

1. Add an index: [[ create index stage_token_ix on stage(token);  ]] SQLite
will use that index to improve the ORDER BY.

2. Try the following variation:

insert into        token( name )
select      stage.token as name
from        stage
where not exists(select 1 from token where token.name = stage.token)
order by    stage.token;

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] 'insert or ignore' vs self join?

Reply via email to