On 05/13/2016 09:19 PM, Matt Hamilton wrote:
> Hi all,
>    Anyone know if/how you can call the FTS5 tokeniser functions manually? 
> e.g. I want to look something up in the fts5vocab table but can't as I need 
> to split/stem the initial value first before querying the table?
>
> To illustrate:
>
> sqlite> CREATE VIRTUAL TABLE ft1 USING fts5(x, tokenize = porter);
> sqlite> INSERT INTO ft1 VALUES('running man');
> sqlite> CREATE VIRTUAL TABLE ft1_v_row USING fts5vocab(ft1, row);
> sqlite> SELECT * FROM ft1_v_row;
> man|1|1
> run|1|1
> sqlite> SELECT count(*) FROM ft1_v_row WHERE term = 'running';
> 0
> sqlite>
>
> How can I somehow map 'running' => 'run' in order to query the fts5vocab 
> table to get stats on that term? And how could I tokenise 'running man' => 
> 'run', 'man' in order to look up multiple tokens?

I think the only way to do that at the moment is from C code using the 
API in fts5.h:

   https://www.sqlite.org/fts5.html#section_7

Use xFindTokenizer() to grab a handle for the desired tokenizer module, 
then xCreate to create an instance and xTokenize to tokenize text.

There is example code in the fts5_test_tok.c file:

   http://sqlite.org/src/artifact/db08af63673c3a7d

The example code creates a virtual table module that looks useful enough:

   CREATE VIRTUAL TABLE ttt USING fts5tokenize('porter');

then:

   SELECT * FROM ft1_v_row
     WHERE term IN (SELECT token FROM ttt('running man'));

should probably work. More information in fts5_test_tok.c.

Dan.



