On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas
<c...@experienceon.com> wrote:
> Hello Michael,
>
> Yes, we are pre-sorting the documents before adding them to the index. We
> have a score associated to every document (not an IR score but a
> document-related score that reflects its "importance"). Therefore, the
> document with the biggest score will have the lowest docid (we add it first
> to the index). We do this in order to apply early termination effectively.
> With the actual codec, we haven't seen much of a difference in terms of
> space when we have the index sorted vs not sorted.

I wouldn't expect that you will see space savings when you sort this way.

The techniques I was mentioning sort documents by other factors
instead, such as grouping related documents from the same website
together. The idea is that those documents probably share many of the
same terms, so each term's postings list hopefully ends up with smaller
docid deltas, which require fewer bits to represent.
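
To make the delta argument concrete, here is a rough sketch in Python.
It is not Lucene code and does not reflect any particular codec: it
assumes a simple variable-byte encoding (7 payload bits per byte) of the
gaps between sorted docids, and the names vbyte_len and postings_size
and the docid ranges are made up for the illustration.

```python
def vbyte_len(n: int) -> int:
    """Bytes needed to store n with 7 payload bits per byte (assumed encoding)."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def postings_size(doc_ids) -> int:
    """Total bytes for one term's docids stored as variable-byte deltas."""
    size, prev = 0, 0
    for doc_id in sorted(doc_ids):
        size += vbyte_len(doc_id - prev)  # store the gap, not the absolute docid
        prev = doc_id
    return size


# Hypothetical term that occurs in 1,000 documents of a 1,000,000-doc index.
clustered = range(500_000, 501_000)     # docs from one site, adjacent docids
scattered = range(0, 1_000_000, 1_000)  # same count, spread across the index

print(postings_size(clustered))  # 1002
print(postings_size(scattered))  # 1999
```

In this toy case the clustered list takes about half the bytes, because
almost every gap fits in a single byte. Sorting by an importance score
does not group documents that share terms, so the gaps stay roughly as
large as in an unsorted index.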

-- 
lucidimagination.com
