Yeah, I had thought about using the byte distance between words, but you run into cases like these:
[Example A] |word1|10charword|word2|
[Example B] |word1|3charword|4charword|3charword|word2|

By byte distance, both of these score the same, whereas Example A should score higher because only one token separates word1 and word2 rather than three.

It would seem I can use the fts3_tokenizer somehow to get the token positions, or else that this underlying value is available but just not stored in an accessible manner.

I implemented OkapiBM25f [1] but was hoping to implement something like the proximity ranking in [2], as it combines bag-of-words ranking with proximity ranking. Although that article proposes precalculating the distance pairs for all tokens, I'm happy to accept the time cost and calculate them on the fly, as the space cost won't be worth it.

[1] https://github.com/neozenith/sqlite-okapi-bm25
[2] http://infolab.stanford.edu/~theobald/pub/proximity-spire07.pdf
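To make the difference concrete, here is a rough Python sketch of the two measurements. It does not touch FTS at all: the regex tokenizer, the function names, and the filler words are all assumptions made for illustration, loosely imitating a simple ASCII tokenizer. It only shows why token positions separate the two examples when byte offsets barely do.

```python
import re

def token_spans(text):
    """Return (token, token_position, byte_offset) for each word-like token.

    Assumption: a plain ASCII-alphanumeric tokenizer, used here only as a
    stand-in for whatever tokenizer the FTS table is configured with.
    """
    spans = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        # Byte offset of the token start in the UTF-8 encoding of the text.
        byte_offset = len(text[:match.start()].encode("utf-8"))
        spans.append((match.group().lower(), position, byte_offset))
    return spans

def min_distances(text, term_a, term_b):
    """Smallest (token_gap, byte_gap) between any occurrences of two terms.

    Returns None if either term does not occur in the text.
    """
    spans = token_spans(text)
    hits_a = [(pos, off) for tok, pos, off in spans if tok == term_a]
    hits_b = [(pos, off) for tok, pos, off in spans if tok == term_b]
    if not hits_a or not hits_b:
        return None
    token_gap = min(abs(pa - pb) for pa, _ in hits_a for pb, _ in hits_b)
    byte_gap = min(abs(oa - ob) for _, oa in hits_a for _, ob in hits_b)
    return token_gap, byte_gap

# Hypothetical documents mirroring Example A and Example B:
# one 10-character filler word versus fillers of 3, 4 and 3 characters.
example_a = "word1 abcdefghij word2"
example_b = "word1 abc defg hij word2"

print(min_distances(example_a, "word1", "word2"))  # (2, 17)
print(min_distances(example_b, "word1", "word2"))  # (4, 19)
```

With single-space separators the byte gaps come out at 17 and 19, so a byte-based score hardly tells the two apart, while the token gaps of 2 and 4 clearly favour Example A. Computing the gaps on the fly like this trades time for space, which is the trade-off I said I'm willing to make.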