On Feb 10, 2010, at 2:47 AM, Alexey Pechnikov wrote:

> Hello!
>
> 1. The first query is 10x slower! But the performance may be very  
> similar in theory:
> ===========
> sqlite> select snippet(file_text) from file_text where rowid=9737  
> and file_text match 'london';
> ...
> CPU Time: user 0.156010 sys 0.016001
>
> sqlite> select length(content) from file_text where rowid=9737 and  
> file_text match 'london';
> 1189837
> CPU Time: user 0.016001 sys 0.008001
> ===========
>
> 2. The query below may use only index information but is very slow  
> too:
> ===========
> sqlite> select offsets(file_text) from file_text where rowid=9737  
> and file_text match 'london';
> 0 0 985905 6 0 0 1154740 6 0 0 1663053 6
> CPU Time: user 0.152010 sys 0.008000
> ===========
>
> 3. The offsets() and snippet() functions have no some of the needed  
> checks:
> ===========
> sqlite> select offsets(file_text) from file_text where rowid=9737;
> Segmentation fault
>
> sqlite> select snippet(file_text) from file_text where rowid=9737;
> Segmentation fault
> ===========
>

Thanks for this.

(3) is fixed in fossil now.

The speed of (1) and (2) depends partly on the tokenizer used. They
both eat up a lot of CPU because you have to tokenize the whole
document (technically, snippet only has to tokenize from the start of
the document to the end of the returned snippet - but in your example
it looks like the snippet would be near the end of the document anyhow).

Case (2) unfortunately cannot be done entirely with info from the
full-text index, as the full-text index only indexes terms by token
position within each document, not by byte offset.

Dan.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to