On Sun, 2013-12-01 at 00:40 +0000, Simon Slavin wrote:
> On 30 Nov 2013, at 5:40pm, Tristan Van Berkom <trista...@openismus.com> wrote:
> 
> > So, is there a way that I can tell SQLite forcibly to
> > prioritize the index on email_list.value when making
> > a prefix match ?
> 
> Don't use LIKE or GLOB for prefix matches.  Although you as a human can tell 
> that
> 
> >>> email_list.value LIKE 'eddie%'
> 
> is a prefix match, all the computer sees is pattern-matching.  This makes it 
> try all the available combinations.  Instead use
> 
> email_list.value BETWEEN 'eddie' AND 'eddie}'
> 
> (I chose the close curly bracket because it has a very high code and will 
> sort near the end of the possibiliites.)
> 

I'm not comfortable with this.

Do you have a suggestion which does not make an assumption about what
is the highest value of a character ?

Perhaps there is a definite and clear way to find the highest possible
character ?

Using case sensitive LIKE statements *do* allow SQLite to traverse the
index on email_list.value, perhaps not as optimized as the trick you
propose but at least it does not compromise the user facing API.

This sounds like the kind of trick you can get away with when you
are in control of the data which is added to your DB, and the query
terms which will be issued against it (neither of these apply
in my case).

> Using BETWEEN allows SQLite to use any available index for searching and 
> sorting.  In this case it's equivalent to saying "We don't have to look 
> through all 120 pages of this book, we need only pages 34 to 49.".  In the 
> case of the SELECT you show, modified for my suggestion, a good index would 
> be something like
> 
> CREATE INDEX fiel_uv ON folder_id_email_list (uid, value)
> 
> This allows a specific match on a uid and then partial matching on value, 
> which is what it wants.  This can replace the existing index you mention.
> 

Will this work without using the BETWEEN statement you describe above ?
(I think this is orthogonal to the LIKE statement, but I should still
ask to be sure).

Would this replace:

  o The index created on email_list.uid, for the purpose of 
    optimizing the DELETES involved in replacing existing entries ?

  o The index on email_list.value for optimized prefix searches
    and exact matches ?

  o Both of the above ?

I suppose with this I could just remove the '+' from:
   " ON +email_list.uid = summary.uid " ?

This fancy index definitely interests me, and looks like it
would do the right thing to optimize phone number or email
matches for addressbooks of over 100,000 contacts.

However, as you can see it's not much of an issue at this point:
https://people.gnome.org/~tvb/eds-benchmarks-30-11-2013/filter-by-short-email-address-prefix.png

Even with 200,000 contacts, the short email prefix match is now
down to a couple milliseconds.

I would like to flatten that red line a bit more, but it would
be purely to satisfy my sense of perfectionism ;-)


Thank you very much for sharing your knowledge with me :)

Best,
    -Tristan

> Simon.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to