Re: [sqlite] 'insert or ignore' vs self join?

Petite Abeille Wed, 28 May 2008 10:23:14 -0700

Hello,

On May 27, 2008, at 9:07 PM, Stephen Oberholtzer wrote:


> Well, the first thing you should bring away from this experience is  
> that the
> number of VM instructions isn't really an indicator of how efficient  
> the
> query is :)

Good point :)

> Now, I'm not sure exactly why one is faster than the other,  
> especially since
> you didn't post your exact schema and indices,

The DDL is rather straightforward:

     create table if not exists token
     (
         id      integer primary key not null,
         name    text not null
     )

     create unique index if not exists token_name on token( name )

http://dev.alt.textdrive.com/browser/HTTP/Finder.ddl#L60

> and I have no idea how many rows there are in either table.

The incoming data set size varies from 108 to 3345 rows, with a  
average size of around 930 rows. The target table size is about 75,148  
rows. Over time, most of the incoming rows will already exist in the  
target table.

> But if I had to guess, it's because of the ORDER BY clause.  In  
> general, an
> ORDER BY means that SQLite needs to generate a temporary table with  
> all the
> rows to be selected/inserted,
> then sort that temporary table.  The INSERT OR IGNORE version has to
> unconditionally sort the entire 'stage' table; your second query  
> only has to
> sort those rows in 'stage' that don't already exist in 'table'.  If  
> each
> table fits comfortably in your computer's disk cache, the extra pass  
> won't
> matter so much.

Ah... yes... the order by clause... good point... indeed removing the  
'order by' from the 'insert or ignore' statement brings down its  
execution time a whisker away from the 'self join' version :)

Thanks for the explaination!

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] 'insert or ignore' vs self join?

Reply via email to