Re: simple search api (was Re: mimetype standardisation by testsets)

Jos van den Oever Mon, 27 Nov 2006 13:32:29 -0800

2006/11/27, Joe Shaw <[EMAIL PROTECTED]>:

On Mon, 2006-11-27 at 21:42 +0100, Jos van den Oever wrote:
> Hmm, in Strigi text fragments are returned with every query and the
> results are about as fast as I can type, so I guess this depends on
> the search engine. Since the text fragments are an important part of
> the user experience, I think we should have them.
> At the moment we only return the fragment for the 'content' field, though.


Are you storing the full text of the document in the index?

Yes, and yes, this is big. This will be configurable.

What we've found is that storing the full text in the index (a) makes the
index huge and (b) makes searching slow.  At the same time, extracting the
content from the source document is pretty slow, especially if it's not
a text document.  We've taken to caching the text content of structured
files, but we compress the files to make disk usage a little more
reasonable.  But finding the N terms in a potentially large document
tends to slow down searches quite a bit.
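
For illustration only, the compressed text cache described above could be sketched roughly as follows. This is not the actual implementation of either search engine; the function names, the in-memory dict standing in for on-disk storage, and the choice of zlib are all assumptions.

```python
import zlib

def cache_text(cache, doc_id, text):
    """Store the extracted text of a document in compressed form.

    `cache` is a plain dict here (hypothetical stand-in for a disk cache).
    """
    cache[doc_id] = zlib.compress(text.encode("utf-8"))

def cached_text(cache, doc_id):
    """Decompress and return the cached text for a document."""
    return zlib.decompress(cache[doc_id]).decode("utf-8")
```

The trade-off is the one mentioned above: disk usage drops, but every lookup has to decompress the whole text before the query terms can be located in it.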

I've not noticed that it slows down searching. At work I have a 1.5 GB
index. No sweat. Extracting text from a source doc is only slow if it
is deep in a zip or tar.
Also you only want the first X hits.
At the moment we send back the complete text content of each hit with
every query. It's for the client to find the highlights then. I've not
had any problems with this. We do limit the size of the stored text
per doc to about 100k.
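
As a rough sketch of what finding the highlights on the client side might look like, assuming the client receives the stored text of a hit as a plain string: the function name, the parameters and the exact cap constant below are hypothetical and are not taken from Strigi.

```python
import re

MAX_STORED_CHARS = 100_000  # assumed cap, mirroring the "about 100k" limit

def find_fragments(text, terms, context=30, max_fragments=3):
    """Return short fragments of `text` around the first matches of `terms`."""
    text = text[:MAX_STORED_CHARS]
    terms = [t for t in terms if t]
    if not terms:
        return []
    # One case-insensitive pattern that matches any of the query terms.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    fragments = []
    last_end = 0
    for match in pattern.finditer(text):
        if match.start() < last_end:
            continue  # this match is already inside the previous fragment
        start = max(match.start() - context, last_end)
        end = min(match.end() + context, len(text))
        fragments.append(text[start:end])
        last_end = end
        if len(fragments) == max_fragments:
            break
    return fragments
```

Since only the first few hits are shown and each text is capped, the scan stays cheap in this sketch; whether that holds for a given client would need measuring.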

Cheers,
Jos
_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg
