On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas <c...@experienceon.com> wrote:
> Hello Michael,
>
> Yes, we are pre-sorting the documents before adding them to the index. We
> have a score associated to every document (not an IR score but a
> document-related score that reflects its "importance"). Therefore, the
> document with the biggest score will have the lowest docid (we add it first
> to the index). We do this in order to apply early termination effectively.
> With the actual coded, we haven't seen much of a difference in terms of
> space when we have the index sorted vs not sorted.

I wouldn't expect you to see space savings when you sort this way. The techniques I was mentioning involve sorting documents by other factors instead, such as grouping related documents from the same website together, the idea being that they probably share many of the same terms. This hopefully creates smaller document deltas that require fewer bits to represent.

--
lucidimagination.com
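
P.S. A rough illustration of the delta argument above, as a sketch rather than what any particular codec actually does. It assumes postings are stored as gaps between consecutive docids, each gap written with a simple variable-byte scheme (7 payload bits per byte). The function names and the example docid ranges are hypothetical.

```python
def vbyte_len(n):
    """Bytes needed to store n using 7 payload bits per byte."""
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


def postings_bytes(doc_ids):
    """Total bytes to store doc ids as variable-byte encoded gaps."""
    total = 0
    prev = 0
    for doc_id in sorted(doc_ids):
        total += vbyte_len(doc_id - prev)  # store the gap, not the absolute id
        prev = doc_id
    return total


# Hypothetical term occurring in 1000 documents of a 1,000,000 document index.
clustered = range(500_000, 501_000)    # documents from one site, adjacent docids
scattered = range(0, 1_000_000, 1000)  # same number of documents, spread out

print(postings_bytes(clustered))  # 1002
print(postings_bytes(scattered))  # 1999
```

With adjacent docids almost every gap is 1 and fits in one byte, while the scattered layout needs two bytes per gap. Sorting by an importance score does not by itself cluster documents that share terms, so it would not be expected to shrink the gaps.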