On 3/13/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Ion Silvestru <[EMAIL PROTECTED]> wrote:
> To Ralf:
> >As a side effect, the offsets() and snippet() functions stopped working,
> >as they seem to rely on the presence of the full document text in the
> >current implementation.
>
> Did you test "phrase" searching on the index-only version? Doesn't this
> kind of search rely on offsets()?

Phrase searches do *not* use the full document text.
But UPDATE and DELETE do, ironically.  Or at least they
used to, unless Scott has changed that in FTS2.

Indeed, phrase searches should continue to work: since we have the
terms from the query, we can look them up and compare their token
positions in the document (offsets, by contrast, are the character
positions of the tokens).
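
To illustrate the point about token positions, here is a minimal
sketch in Python of phrase matching against a positional index. It is
not the FTS code; the names build_index and phrase_search, the
whitespace tokenizer, and the dict-based index layout are all
assumptions made for the example. The only thing it is meant to show
is that the match needs token positions from the index and never
touches the stored document text.

```python
from collections import defaultdict

def build_index(docs):
    """Toy positional index: term -> {docid: [token positions]}."""
    index = defaultdict(dict)
    for docid, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(docid, []).append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted docids where the phrase terms sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can possibly match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for docid in sorted(candidates):
        later = [set(index[t][docid]) for t in terms[1:]]
        # The phrase starts at p if the i-th following term is at p + i + 1.
        if any(all(p + i + 1 in s for i, s in enumerate(later))
               for p in index[terms[0]][docid]):
            hits.append(docid)
    return hits

docs = {1: "the quick brown fox", 2: "brown the quick fox", 3: "quick fox brown"}
index = build_index(docs)
print(phrase_search(index, "quick brown"))
```

Once the index is built, the docs dict could be thrown away and
phrase_search would still work, which is the situation described
above for the index-only version.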

UPDATE and DELETE need to have the previous document text, because the
docids are embedded in the index, and there is no docid->term index
(or, put another way, the previous document text _is_ the docid->term
index).  Keeping track of that information would probably double the
size of the index.  A thing I've considered doing is to keep deletions
as a special index to the side, which would allow older data to be
deleted during segment merges.  Unfortunately, I suspect that this
would slow things down by introducing another bit of data which needs
to be considered during merges.
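
A rough Python sketch of why the old text is needed for a delete,
under the same toy assumptions as any small inverted index (a dict of
term -> {docid: positions}, a whitespace tokenizer, and hypothetical
function names). With no docid->term mapping, the only cheap way to
find the postings to remove is to re-tokenize the previous document
text; the alternative would be scanning every term in the index.

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def add_document(index, docid, text):
    """Record each token position of the document under its term."""
    for pos, term in enumerate(tokenize(text)):
        index.setdefault(term, {}).setdefault(docid, []).append(pos)

def delete_document(index, docid, old_text):
    """Remove docid's postings; old_text tells us which terms to visit."""
    for term in set(tokenize(old_text)):
        postings = index.get(term)
        if postings is None:
            continue
        postings.pop(docid, None)
        if not postings:
            # Drop terms that no longer appear in any document.
            del index[term]

index = {}
add_document(index, 1, "full text search")
add_document(index, 2, "text index")
delete_document(index, 1, "full text search")
print(index)
```

An update in this sketch would simply be delete_document with the old
text followed by add_document with the new text.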

Of course, there's no way the current system could generate snippets
without the original text, because doclists don't record the set of
adjacent terms.  That information could be recorded, but it's doubtful
that doing so would be an improvement on simply storing the original
text in the first place.  The current system _does_ have everything
needed to generate the offsets to hits even without the original text,
so the client application could generate snippets, though the code is
not currently in place to expose this information.
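
As a sketch of what client-side snippet generation might look like,
assume the application keeps the document text somewhere of its own
and can obtain hits as (start, length) pairs of character offsets.
That interface is hypothetical, since as noted the code to expose the
information is not in place, and make_snippet and its parameters are
made up for the example.

```python
def make_snippet(text, hits, context=10, mark=("[", "]")):
    """Build a snippet around the first hit, marking every hit in the window.

    hits is a list of (start, length) character offsets into text.
    """
    if not hits:
        return text[:2 * context]
    hits = sorted(hits)
    first_start, first_len = hits[0]
    lo = max(0, first_start - context)
    hi = min(len(text), first_start + first_len + context)
    out, cursor = [], lo
    for start, length in hits:
        # Skip hits that overlap an earlier one or fall outside the window.
        if start < cursor or start + length > hi:
            continue
        out.append(text[cursor:start])
        out.append(mark[0] + text[start:start + length] + mark[1])
        cursor = start + length
    out.append(text[cursor:hi])
    prefix = "..." if lo > 0 else ""
    suffix = "..." if hi < len(text) else ""
    return prefix + "".join(out) + suffix

text = "SQLite full text search stores the original text"
print(make_snippet(text, [(12, 4), (44, 4)], context=8))
```

The window here is centered on the first hit only; picking the window
that covers the most hits would be a natural refinement.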

Being able to have an index without storing the original data was a
weak goal when fts1 was being developed, but every time we visited
it, we found that the negatives of that approach were substantial
enough to discourage us for a time.  [The "we" in that sentence means
"me and the various people I run wacky ideas past."]  I'm keeping an
eye out for interesting implementation strategies and the time to
explore them, though.

-scott
