Re: [sqlite] Opinions about per-row tokenizers for fts?

2007-09-19 Thread Scott Hess
On 9/19/07, Ralf Junker <[EMAIL PROTECTED]> wrote:
> >Regarding per-row versus per-column tokenizers, your suggestion to
> >have something like 'content_kr TOKENIZER icu_kr' for each variant is
> >reasonable, but I'm not certain what gain it would have over simply
> >having separate tables for each

Re: [sqlite] Opinions about per-row tokenizers for fts?

2007-09-19 Thread Ralf Junker
Hello Scott Hess,
>I think that if you do not need the ability to customize the tokenizer on
>a per-row basis, there should be no storage cost compared to the
>current implementation.
Glad to read this!
>Regarding per-row versus per-column tokenizers, your suggestion to
>have something like

Re: [sqlite] Opinions about per-row tokenizers for fts?

2007-09-18 Thread Scott Hess
[Input == great!] Regarding space usage, my current prototype stores an additional column on %_content, defined like 'tokenizer TEXT DEFAULT NULL'. If NULL, the default tokenizer is used and the per-row cost is negligible. Otherwise, it is the string specifying the tokenizer, which I would
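For readers who want to picture the column described in the message above, here is a minimal sketch in Python using the standard sqlite3 module. It is not the prototype's code and does not touch FTS at all: the table name docs_content, the docid column, the sample rows, and the choice of "simple" as the default tokenizer name are all assumptions made for illustration. Only the column definition 'tokenizer TEXT DEFAULT NULL' and the tokenizer name icu_kr come from the thread.

```python
import sqlite3

# Assumption: name of the table-wide default tokenizer, for illustration only.
DEFAULT_TOKENIZER = "simple"


def build_demo():
    """Create an ordinary table shaped like a hypothetical %_content table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs_content ("
        " docid INTEGER PRIMARY KEY,"
        " content TEXT,"
        " tokenizer TEXT DEFAULT NULL)"  # column definition quoted in the thread
    )
    # Row without an override: the tokenizer column stays NULL.
    conn.execute("INSERT INTO docs_content (content) VALUES (?)", ("hello world",))
    # Row with a per-row override, stored as the tokenizer's name.
    conn.execute(
        "INSERT INTO docs_content (content, tokenizer) VALUES (?, ?)",
        ("안녕하세요", "icu_kr"),
    )
    return conn


def effective_tokenizers(conn):
    """Return (docid, tokenizer name) pairs, falling back to the default on NULL."""
    rows = conn.execute(
        "SELECT docid, COALESCE(tokenizer, ?) FROM docs_content ORDER BY docid",
        (DEFAULT_TOKENIZER,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for docid, tokenizer in effective_tokenizers(build_demo()):
        print(docid, tokenizer)
```

The NULL-means-default convention is what keeps the per-row cost small for tables that never override the tokenizer; how the real prototype resolves the stored string to a tokenizer is not shown in the snippet and is not modeled here.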

Re: [sqlite] Opinions about per-row tokenizers for fts?

2007-09-18 Thread Ralf Junker
Hello Scott Hess,
>In the interests of not committing something that people won't like, my
>current proposal would be to add an implicit TOKENIZER column, which will
>override the table's default tokenizer for that row.
There are a few things I am worried about with this approach:
1. FTS stor

[sqlite] Opinions about per-row tokenizers for fts?

2007-09-17 Thread Scott Hess
As part of doing internationalization work on Gears, it has been determined that it is unlikely that you can just define a global tokenizer that will work for everything. Instead, in some cases you may need to use a specific tokenizer, based on the content being tokenized, or the source of the con