On 9/19/07, Ralf Junker <[EMAIL PROTECTED]> wrote:
> >Regarding per-row versus per-column tokenizers, your suggestion to
> >have something like 'content_kr TOKENIZER icu_kr' for each variant is
> >reasonable, but I'm not certain what gain it would have over simply
> >having separate tables for each
Hello Scott Hess,
>I think that if you do not need the ability to customize tokenizer on
>a per-row basis, there should be no storage cost compared to the
>current implementation.
Glad to read this!
>
>
>Regarding per-row versus per-column tokenizers, your suggestion to
>have something like
[Input == great!]
Regarding space usage, my current prototype stores an additional
column on %_content, defined like 'tokenizer TEXT DEFAULT NULL'. If
NULL, the default tokenizer is used and the per-row cost is
negligible. Otherwise, it is the string specifying the tokenizer,
which I would
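
As a rough illustration of the nullable-column idea described above, here is a minimal sketch using Python's sqlite3 module. It is not the prototype's code: the table name docs_content (standing in for %_content), the default tokenizer name "simple", and the helper functions are all assumptions made for the example. Only the column definition 'tokenizer TEXT DEFAULT NULL' comes from the message.

```python
import sqlite3

# Assumed table-level default; the real default depends on how the table is declared.
DEFAULT_TOKENIZER = "simple"


def make_content_table(conn):
    """Create a hypothetical stand-in for the %_content table."""
    conn.execute(
        "CREATE TABLE docs_content ("
        " docid INTEGER PRIMARY KEY,"
        " content TEXT,"
        " tokenizer TEXT DEFAULT NULL)"  # column definition quoted in the message
    )


def add_row(conn, content, tokenizer=None):
    """Insert a row; tokenizer=None stores NULL, meaning 'use the default'."""
    cur = conn.execute(
        "INSERT INTO docs_content (content, tokenizer) VALUES (?, ?)",
        (content, tokenizer),
    )
    return cur.lastrowid


def tokenizer_for(conn, docid):
    """Return the row's tokenizer name, falling back to the default on NULL."""
    row = conn.execute(
        "SELECT COALESCE(tokenizer, ?) FROM docs_content WHERE docid = ?",
        (DEFAULT_TOKENIZER, docid),
    ).fetchone()
    if row is None:
        raise KeyError(docid)
    return row[0]
```

With an in-memory database, a row added without a tokenizer resolves to the assumed default, while a row added with 'icu_kr' resolves to that string. Rows that never override the default only carry a NULL in the extra column.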
Hello Scott Hess,
>In the interests of not committing something that people won't like, my
>current proposal would be to add an implicit TOKENIZER column, which will
>override the table's default tokenizer for that row.
There are a few things I am worried about with this approach:
1. FTS stor
As part of doing internationalization work on Gears, it has been
determined that it is unlikely that you can just define a global
tokenizer that will work for everything. Instead, in some cases you
may need to use a specific tokenizer, based on the content being
tokenized, or the source of the con