Hi - I'm using SQLite 3.6.21 with this patch <http://www.sqlite.org/src/ci/6cbbae849990d99b7ffe252b642d6be49d0c7235>, which I found on this list a few weeks ago. I'm also using a custom tokenizer that I wrote.
My scenario is this: I am storing XHTML in the database, and I want to make this content searchable with FTS. I only want to index the text contained within the XHTML elements, not the element names or attributes (e.g. "<dont-index this="or this">index this</...>").

My tokenizer skips over element names and attributes, then delegates each element's text content to the Porter tokenizer. When the Porter tokenizer returns, I correct the token offset and length values so that they are the actual offsets within the document (the Porter tokenizer never sees the whole document, just a string within a tag).

I didn't want to ship my tokenizer with my app, for two reasons: (1) I wrote it using an API that is not available to my client app, and (2) it doesn't make sense on the client, because the user will be entering search terms that aren't surrounded by XML tags, which is what my tokenizer expects. Instead, my client registers a tokenizer under the same name as my custom tokenizer, but what it actually registers is a copy of the Porter tokenizer.

I expected this to work fine, and it appeared to, until I discovered that it was pulling out text from some of the XML attributes, which shouldn't be indexed. It turns out that FTS3 is re-tokenizing the content (not just the search term) on the client, using my copy of the Porter tokenizer, and returning those results.

I don't understand why. Is this a bug, or is this normal behavior? I expected the FTS index to retain all of the token offsets and sizes so that they wouldn't have to be recomputed on the client.

My workaround is to port my tokenizer so that it runs on the client, and to wrap search terms in dummy XML tags, <dummy>like this</dummy>. But I feel I shouldn't have to do this...

Any feedback appreciated...

Nick Hodapp
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
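
For readers following along, the offset correction described in the post can be sketched roughly as below. This is only an illustration in Python, not the tokenizer from the post and not the FTS3 tokenizer interface. The names `text_runs`, `inner_tokenize` and `tokenize_xhtml` are made up for the example, and a plain word-splitting regex stands in for the Porter tokenizer (no stemming is done). It also assumes well-behaved markup: a ">" inside an attribute value, comments, CDATA and entities are not handled, and offsets are counted in characters rather than bytes.

```python
import re
from typing import Iterator, Tuple

# Stand-in for the Porter tokenizer: splits on alphanumeric runs, no stemming.
WORD = re.compile(r"[A-Za-z0-9]+")


def text_runs(doc: str) -> Iterator[Tuple[int, str]]:
    """Yield (start_offset, text) for each run of text outside <...> tags."""
    pos = 0
    n = len(doc)
    while pos < n:
        if doc[pos] == "<":
            # Skip the whole tag, including element name and attributes.
            end = doc.find(">", pos)
            if end == -1:
                return  # unterminated tag: stop scanning
            pos = end + 1
        else:
            end = doc.find("<", pos)
            if end == -1:
                end = n
            yield pos, doc[pos:end]
            pos = end


def inner_tokenize(text: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (term, start, end) with offsets relative to `text` only."""
    for m in WORD.finditer(text):
        yield m.group().lower(), m.start(), m.end()


def tokenize_xhtml(doc: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (term, start, end) with offsets relative to the whole document."""
    for base, text in text_runs(doc):
        for term, start, end in inner_tokenize(text):
            # The inner tokenizer only saw `text`, so shift by the run's base.
            yield term, base + start, base + end


if __name__ == "__main__":
    sample = '<dont-index this="or this">index this</dont-index>'
    for token in tokenize_xhtml(sample):
        print(token)
```

Running the same function over a wrapped query such as `<dummy>like this</dummy>` yields only the terms inside the dummy element, which is the effect the workaround in the post relies on.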