Let's say I have the search query "TermA TermB" (two separate terms, not one 
quoted phrase). 

The matchinfo 'p' value would be 2, one for each matchable phrase in the query. 

CREATE VIRTUAL TABLE t1 USING fts4(title, content);

matchinfo option 'c' returns 2 for the number of columns. 

Now consider sample data:

|rowid|title|content|
|1|""|"TermA"|
|2|"TermA TermB"|"TermA TermA"|
|3|"TermA TermA TermA"|"TermB"|

matchinfo option 'x' would have ('p' * 'c' * 3) 32-bit unsigned integers of 
data per row, so 12 integers (48 bytes) here. 
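To make the arithmetic concrete, here is a small sketch of the 'x' layout as the FTS3/FTS4 documentation describes it: three 32-bit unsigned integers for every phrase/column pair. The helper names x_length and x_index are made up for illustration and are not part of SQLite.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit integers in the 'x' part of the matchinfo BLOB. */
size_t x_length(uint32_t nPhrase, uint32_t nCol)
{
    return (size_t)3 * nPhrase * nCol;
}

/* Index of the first of the three values for phrase iPhrase in column iCol,
 * relative to the start of the 'x' data. The three values are:
 *   +0 hits in the current row
 *   +1 hits in all rows
 *   +2 number of rows with at least one hit */
size_t x_index(uint32_t iPhrase, uint32_t iCol, uint32_t nCol)
{
    return (size_t)3 * (iCol + (size_t)iPhrase * nCol);
}
```

For the two-term query against t1, x_length(2, 2) is 12 integers, or 48 bytes. The counts for TermA in title start at x_index(0, 0, 2), which is 0, and the counts for TermB in content start at x_index(1, 1, 2), which is 9.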

But each of these is an aggregate count; none of them says where in the 
column a hit occurred. 

For example, take the (p)hrase term at index [0], "TermA", in (c)olumn [0] 
while matchinfo is looking at row [3]: I'm going to need a list of 3 token 
positions. In general, if there are N matches I'm going to need N token 
positions. 

So suppose matchinfo had a 't' option giving the total number of token hits 
within the row as a single int. Then we could have an option 'q' holding one 
(phrase term, column, token position) triple per hit, read like this:

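int N = matchinfo[T_OFFSET];  // proposed 't': total token hits in this row
for (int i = 0; i < N; i++)
{
    // proposed 'q': one triple per hit
    // (this is composed on my phone so pardon the poor indenting)
    int phraseTerm = matchinfo[Q_OFFSET + 3*i];         // index of the query term
    int column = matchinfo[Q_OFFSET + 3*i + 1];         // column containing the hit
    int tokenPosition = matchinfo[Q_OFFSET + 3*i + 2];  // token position in that column
}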
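As a worked example of that layout, this is roughly what the data could look like for row 3 of the sample table, plus a helper that pulls out the positions for one term in one column. Everything here is hypothetical: 't' and 'q' do not exist in FTS4, T_OFFSET, Q_OFFSET and positions_for are names I made up, and I'm assuming zero-based token positions.

```c
#include <stdint.h>

/* Hypothetical layout: element 0 is 't' (the hit count), followed by 't'
 * triples of (phrase term, column, token position). */
enum { T_OFFSET = 0, Q_OFFSET = 1 };

/* Row 3 of the sample data: TermA at title tokens 0, 1 and 2,
 * TermB at content token 0. */
const uint32_t row3[] = {
    4,          /* t: four token hits in this row */
    0, 0, 0,    /* TermA, title,   position 0 */
    0, 0, 1,    /* TermA, title,   position 1 */
    0, 0, 2,    /* TermA, title,   position 2 */
    1, 1, 0,    /* TermB, content, position 0 */
};

/* Copy the token positions of one phrase term in one column into out[].
 * Returns the number of matching hits; at most nOut of them are stored. */
int positions_for(const uint32_t *mi, uint32_t phrase, uint32_t col,
                  uint32_t *out, int nOut)
{
    int n = 0;
    uint32_t nHit = mi[T_OFFSET];
    for (uint32_t i = 0; i < nHit; i++) {
        const uint32_t *hit = &mi[Q_OFFSET + 3 * i];
        if (hit[0] == phrase && hit[1] == col) {
            if (n < nOut) out[n] = hit[2];
            n++;
        }
    }
    return n;
}
```

Calling positions_for(row3, 0, 0, out, 8) gives the three positions needed for TermA in the title column, which is exactly the list a proximity ranking function would want.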

Again ideally this would be precomputed so matchinfo can maintain its speed in 
forming the BLOB. 

This is similar to what offsets() returns, but the documentation says that 
offsets() is an order of magnitude slower, and I'm presuming that is because 
it runs fts3tokenize over the matched results and tokenises the data again. 
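For comparison, offsets() returns text made of space-separated integers, four per hit: column number, term number, byte offset and size in bytes. A minimal parser could look like the sketch below; OffsetHit and parse_offsets are names invented for this example.

```c
#include <stdio.h>

/* One hit as reported by offsets(): four integers per match. */
typedef struct {
    int column;      /* column number, 0 = leftmost */
    int term;        /* term number within the query, 0 = first */
    int byteOffset;  /* byte offset of the match within the column */
    int byteSize;    /* size of the matching term in bytes */
} OffsetHit;

/* Parse the text returned by offsets() into hits[]. Returns the number of
 * complete four-integer groups read (at most nMax). */
int parse_offsets(const char *z, OffsetHit *hits, int nMax)
{
    int n = 0;
    int used = 0;
    while (n < nMax &&
           sscanf(z, "%d %d %d %d%n", &hits[n].column, &hits[n].term,
                  &hits[n].byteOffset, &hits[n].byteSize, &used) == 4) {
        z += used;  /* advance past the group just consumed */
        n++;
    }
    return n;
}
```

Note that these are byte offsets, not token positions, so a proximity ranker would still have to map them back to token indexes somehow.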

A quick win would be to make a token_offsets() function that uses the 
fts3tokenize virtual table to get the values we are after by tokenising the 
results. Technically it'd get the job done, but I'd like it to still have the 
speed matchinfo has, so the proximity ranking isn't waiting on tokenising 
documents all the time. 
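To show what the tokenise-again approach amounts to, here is a rough sketch that walks a document and records the token positions of one term. It splits on whitespace and compares tokens exactly, which is only a stand-in for a real FTS tokenizer (no case folding, no punctuation handling); token_positions is a made-up name.

```c
#include <ctype.h>
#include <string.h>

/* Record the zero-based token positions at which zTerm occurs in zDoc.
 * Tokens are split on whitespace and compared exactly; this is only a
 * stand-in for a real FTS tokenizer. Returns the number of matches;
 * at most nOut positions are stored in out[]. */
int token_positions(const char *zDoc, const char *zTerm, int *out, int nOut)
{
    size_t nTerm = strlen(zTerm);
    int iToken = 0;
    int nMatch = 0;
    const char *z = zDoc;

    while (*z) {
        while (*z && isspace((unsigned char)*z)) z++;    /* skip separators */
        if (!*z) break;
        const char *zStart = z;
        while (*z && !isspace((unsigned char)*z)) z++;   /* scan one token */
        if ((size_t)(z - zStart) == nTerm && memcmp(zStart, zTerm, nTerm) == 0) {
            if (nMatch < nOut) out[nMatch] = iToken;
            nMatch++;
        }
        iToken++;
    }
    return nMatch;
}
```

The cost is the point: every matched column of every matched row has to be read and scanned in full, which is the work I'd like to avoid at query time.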

So if it is to be precalculated it will have to be stored in a shadow table 
somewhere and also updated accordingly with FTS4 INSERT, UPDATE and DELETE 
actions. 

Regards

Josh



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Proximity-ranking-with-FTS-tp76149p76156.html
Sent from the SQLite mailing list archive at Nabble.com.