[sqlite] What encoding format is used in the FTS3 tokenizer? and other tokenizer questions.

Garry Watkins Mon, 04 May 2009 14:22:30 -0700

I am writing an FTS3 tokenizer that works with iPhone OS using Apple's
CoreFoundation library.


What encoding is used for the text of inbound INSERT statements into an
FTS3 virtual table?  For example, I have Japanese text encoded as UTF-8,
and the INSERT statement that carries it is also encoded as UTF-8.  I am
not using the ICU library (SQLITE_ENABLE_ICU is not defined).

In the above tokenizer I want to eliminate certain words (stemming).  Do
I just not return those words and move on to the next one?  Does this
actually influence the text that is stored, or is it only used for
indexing?

I am building a static library, and I want the tokenizer to be available
to anything that I link it with.  Is there an easy way to make that
happen?  I currently configure it by adding code to the core FTS3
source.  I don't want to do this each time, and I would like to keep my
source separate so that I can easily change versions of SQLite.

Finally, which distribution should I use: the regular amalgamation or
the Unix amalgamation?  If the latter, what options should be passed
into the config for iPhone OS?

Thanks
Garry

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
