Ok, I only have one segment right now, so I've got one of each of these:

.tis file: 730MB
.frq file: 9KB
.prx file: 26KB

If I'm understanding you (and Mike) properly, then even though it's
the prx file that contains the actual position info, you can't get to
that info quickly unless the tis file is also cached in RAM by the OS.

I have to admit I don't know that much about OS disk caching. Can I
more or less pretend that the OS uses a least recently used (LRU)
algorithm? I wonder if Windows (my current platform) and Linux and
friends (apparently most of you all's platforms) behave differently in
this respect.

Cheers,
Chris

On Thu, Jul 3, 2008 at 3:21 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 3, 2008 at 6:04 PM, Chris Harris <[EMAIL PROTECTED]> wrote:
>> Now I gather that phrase queries are inherently slower than non-phrase
>> queries, but 1-3 orders of magnitude difference seems noteworthy.
>
> Phrase queries could be a couple times slower, but normally not to the
> degree you show here.
>
> The most likely factor is that phrase queries need to look at term
> positions, and those are in a different part of the index that may not
> be cached by the OS (esp if phrase queries are rare in your system).
> You may not even have enough system RAM free to allow caching
> positions also.
>
> Check your index and look at the total size of the .tis files, the
> .frq files, and the .prx files.
> .tis and .frq are used to look up terms and which documents match those
> terms.  .prx files are used for the term positions in each document.
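
The size check suggested above can be scripted. What follows is only a minimal sketch, assuming all index files sit directly in one directory; the function name index_sizes_by_extension and the command-line handling are made up for illustration and are not part of Lucene or Solr.

```python
import os
import sys
from collections import defaultdict


def index_sizes_by_extension(index_dir, extensions=(".tis", ".frq", ".prx")):
    """Return total size in bytes per extension for files directly in index_dir.

    Hypothetical helper: it only sums file sizes and knows nothing about
    the index format itself.
    """
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        ext = os.path.splitext(name)[1].lower()
        path = os.path.join(index_dir, name)
        # Count only regular files with one of the requested extensions.
        if ext in extensions and os.path.isfile(path):
            totals[ext] += os.path.getsize(path)
    # Report every requested extension, even if no file matched it.
    return {ext: totals[ext] for ext in extensions}


if __name__ == "__main__":
    # Assumption: the index directory is passed as the first argument;
    # falls back to the current directory if none is given.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for ext, size in index_sizes_by_extension(directory).items():
        print(f"{ext}: {size / (1024 * 1024):.2f} MB")
```

Comparing these totals with the amount of free RAM gives a rough idea of whether the OS could keep all three file types cached at once.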
