The reason I'm interested in this is somewhat unusual. I'm writing a custom
query parser and search component. These components go over the search
results and perform a calculation over them. The calculation depends on its
input being sorted by a certain value. In this scenario, regular Solr sorting
is insufficient, as it is performed post-search and collects only the rows
needed to satisfy the query. The alternative to a naturally sorted index is
to sort all the docs myself, which I wish to avoid. I use docValues
extensively, and they really are a great help.

Erick, I've tried using SortingMergePolicyFactory. It brings me close to my
goal, but it's not quite there. The problem with this approach is that while
each segment is sorted internally, the value ranges of different segments
may overlap. For example, let's say that some query results lie in segments
A, B, and C. Each segment is sorted, so the docs coming from segment A will
be sorted within the range 0-50, the docs from segment B within the range
20-70, and the docs from segment C within the range 50-90. The query result
will be 0-50, 20-70, 50-90. Almost sorted, but not quite there.
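
To make the overlap concrete, here is a rough sketch in plain Java. It does
not use any Lucene or Solr APIs; the class name, the method names, and the
sample values are made up for illustration. It shows that concatenating the
per-segment sorted runs is not globally sorted, and that a k-way merge of the
runs would restore the order at the cost of touching every doc, which is the
work I'd like to avoid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SegmentRunsSketch {

    /** Concatenates the per-segment runs in segment order (A, then B, then C). */
    static List<Long> concatenate(List<List<Long>> runs) {
        List<Long> out = new ArrayList<>();
        for (List<Long> run : runs) {
            out.addAll(run);
        }
        return out;
    }

    /** Returns true if the values never decrease. */
    static boolean isSorted(List<Long> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    /** K-way merge of already sorted runs, using a min-heap of cursors. */
    static List<Long> mergeSortedRuns(List<List<Long>> runs) {
        // Each heap entry is a cursor: {runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] c) -> runs.get(c[0]).get(c[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<Long> run = runs.get(cursor[0]);
            out.add(run.get(cursor[1]));
            if (cursor[1] + 1 < run.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> runs = List.of(
                List.of(0L, 25L, 50L),    // segment A, range 0-50
                List.of(20L, 45L, 70L),   // segment B, range 20-70
                List.of(50L, 75L, 90L));  // segment C, range 50-90
        System.out.println(isSorted(concatenate(runs))); // false
        System.out.println(mergeSortedRuns(runs));
        // [0, 20, 25, 45, 50, 50, 70, 75, 90]
    }
}
```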

A helpful detail about my data is that the field I'm interested in sorting
the index by is a timestamp. Docs are indexed more or less in the correct
order. As a result, a merge policy that merges only consecutive segments
should satisfy my need. TieredMergePolicy does merge non-consecutive
segments, so it's clearly a bad fit. I'm hoping to get some insight into
additional steps I could take so that SortingMergePolicyFactory achieves a
fully sorted index.
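
The reasoning about consecutive segments can be sketched as a toy model in
plain Java. Again, nothing here is a Lucene or Solr API, and the names are
hypothetical. Each segment is reduced to its {min, max} timestamp, and I
assume the idealized case where docs arrive strictly in order, so the ranges
start out disjoint. Placing the merged segment in the slot of the first input
is also just an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacentMergeSketch {

    /** Merges segments i and j (i < j); each segment is {minTimestamp, maxTimestamp}. */
    static List<long[]> merge(List<long[]> segments, int i, int j) {
        long[] a = segments.get(i);
        long[] b = segments.get(j);
        long[] merged = {Math.min(a[0], b[0]), Math.max(a[1], b[1])};
        List<long[]> out = new ArrayList<>();
        for (int k = 0; k < segments.size(); k++) {
            if (k == i) {
                out.add(merged);           // assumption: merged segment takes slot i
            } else if (k != j) {
                out.add(segments.get(k));  // untouched segments keep their order
            }
        }
        return out;
    }

    /** True if reading the segments one after another never goes back in time. */
    static boolean rangesInOrder(List<long[]> segments) {
        for (int k = 1; k < segments.size(); k++) {
            if (segments.get(k - 1)[1] > segments.get(k)[0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<long[]> segments = List.of(
                new long[] {0, 30}, new long[] {31, 60}, new long[] {61, 90});
        // Merging consecutive segments keeps the ranges disjoint and ordered.
        System.out.println(rangesInOrder(merge(segments, 0, 1))); // true
        // Merging the first and last segments yields 0-90, which overlaps 31-60.
        System.out.println(rangesInOrder(merge(segments, 0, 2))); // false
    }
}
```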

Thanks!
