On 05/13/2016 09:19 PM, Matt Hamilton wrote:
> Hi all,
> Anyone know if/how you can call the FTS5 tokeniser functions manually?
> e.g. I want to look something up in the fts5vocab table but can't as I need
> to split/stem the initial value first before querying the table?
>
> To illustrate:
>
> sqlite> CREATE VIRTUAL TABLE ft1 USING fts5(x, tokenize = porter);
> sqlite> INSERT INTO ft1 VALUES('running man');
> sqlite> CREATE VIRTUAL TABLE ft1_v_row USING fts5vocab(ft1, row);
> sqlite> SELECT * FROM ft1_v_row;
> man|1|1
> run|1|1
> sqlite> SELECT count(*) FROM ft1_v_row WHERE term = 'running';
> 0
> sqlite>
>
> How can I somehow map 'running' => 'run' in order to query the fts5vocab
> table to get stats on that term? And how could I tokenise 'running man' =>
> 'run', 'man' in order to look up multiple tokens?

I think the only way to do that at the moment is from C code using the
API in fts5.h:

   https://www.sqlite.org/fts5.html#section_7

Use xFindTokenizer() to grab a handle for the desired tokenizer module,
then xCreate to create an instance and xTokenize to tokenize text.

There is example code in the fts5_test_tok.c file:

   http://sqlite.org/src/artifact/db08af63673c3a7d

The example code creates a virtual table module that looks useful enough:

   CREATE VIRTUAL TABLE ttt USING fts5tokenize('porter');

then:

   SELECT * FROM ft1_v_row WHERE term IN (SELECT token FROM ttt('running man'));

should probably work. More information in fts5_test_tok.c.

Dan.