On Feb 10, 2010, at 2:47 AM, Alexey Pechnikov wrote: > Hello! > > 1. The first query is 10x slower! But the performance may be very > similar in theory: > =========== > sqlite> select snippet(file_text) from file_text where rowid=9737 > and file_text match 'london'; > ... > CPU Time: user 0.156010 sys 0.016001 > > sqlite> select length(content) from file_text where rowid=9737 and > file_text match 'london'; > 1189837 > CPU Time: user 0.016001 sys 0.008001 > =========== > > 2. The query below may use only index information but is very slow > too: > =========== > sqlite> select offsets(file_text) from file_text where rowid=9737 > and file_text match 'london'; > 0 0 985905 6 0 0 1154740 6 0 0 1663053 6 > CPU Time: user 0.152010 sys 0.008000 > =========== > > 3. The offsets() and snippet() functions have no some of the needed > checks: > =========== > sqlite> select offsets(file_text) from file_text where rowid=9737; > Segmentation fault > > sqlite> select snippet(file_text) from file_text where rowid=9737; > Segmentation fault > =========== >
Thanks for this. (3) is fixed in fossil now. The speed of (1) and (2) depends partly on the tokenizer used. They both eat up a lot of CPU because you have to tokenize the whole document (technically, snippet only has to tokenize from the start of the document to the end of the returned snippet - but in your example it looks like the snippet would be near the end of the document anyhow). Case (2) unfortunately cannot be done entirely with info from the full-text index, as the full-text index only indexes terms by token position within each document, not by byte offset. Dan. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users