I'm trying to use a Solr query to find the next title in alphabetical order after a given string. The problem is that the sort param seems to order non-alphanumeric characters differently from a range filter in the q or fq param. I can't filter the non-alphanumeric characters out: they're integral to the data, and an ordering based only on the alphanumeric portion of the strings would not be useful.

I'm running Solr version 3.5.

In my current approach, I have a field that is a unique string for each document:

<fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="uniqueSortString" type="lowerCaseSort" indexed="true" stored="true"/>

I'm passing the value for the current document in a range to query everything after the current string, sorted ascending:

/select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:["$1+ZX+Spectrum+HOBETA+format+file"+TO+*]&wt=xml&rows=5&version=2.2
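For reference, here is a minimal sketch of how a request like this could be assembled in Java. The class and method names (NextTitleQuery, buildUrl) are hypothetical and only for illustration. The sketch assumes that backslash-escaping quotes inside the quoted range endpoint is acceptable to the query parser, and it percent-encodes the brackets and quotes, which the URL above leaves unencoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NextTitleQuery {
    // Hypothetical helper: builds the /select URL asking for every title
    // from `current` onward, sorted ascending on uniqueSortString.
    static String buildUrl(String current) {
        // Assumption: backslash-escaping keeps embedded quotes and
        // backslashes inside the quoted range endpoint.
        String escaped = current.replace("\\", "\\\\").replace("\"", "\\\"");
        String q = "uniqueSortString:[\"" + escaped + "\" TO *]";
        try {
            return "/select?fl=uniqueSortString"
                    + "&sort=" + URLEncoder.encode("uniqueSortString asc", "UTF-8")
                    + "&q=" + URLEncoder.encode(q, "UTF-8")
                    + "&wt=xml&rows=5&version=2.2";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("$1 ZX Spectrum HOBETA format file"));
    }
}
```

The parameters (fl, sort, q, wt, rows, version) are the same ones used in the request above; only the encoding of the q value differs.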

In theory, I expect the first result to be the current item and the second to be the next one. However, the sort and the range filter seem to use different orderings:

<result name="response" numFound="448" start="0">
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
  </doc>
</result>

Judging by the order of the results, the sort treats "-" as preceding "H", but if the range filter ordered strings the same way, it would have excluded that first result. Digging through the code, it looks like sorting uses String.compareTo() to order text/string fields. However, I haven't been able to track down the range filter code. If someone can point me in the right direction, I'd love to look through it. Suggestions for a different approach, or for changes to this query or field, would also be very helpful.
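To make the sort-side observation concrete, here is a minimal sketch of the comparison I believe the sort performs. It assumes the indexed value is simply the lowercased, trimmed string; the normalize() helper is a hypothetical stand-in for the lowerCaseSort analyzer chain, not Solr code, and it skips HTML stripping because these titles contain no markup.

```java
import java.util.Locale;

public class SortOrderCheck {
    // Hypothetical stand-in for the lowerCaseSort analyzer chain:
    // lowercase, then trim (HTML stripping omitted).
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).trim();
    }

    // Negative when a sorts before b under String.compareTo().
    static int compareNormalized(String a, String b) {
        return normalize(a).compareTo(normalize(b));
    }

    public static void main(String[] args) {
        String emulator = "$1 ZX Spectrum - Emulator";
        String current = "$1 ZX Spectrum HOBETA format file";
        int cmp = compareNormalized(emulator, current);
        System.out.println(cmp < 0 ? "emulator sorts first" : "current sorts first");
    }
}
```

Under that assumption the comparison is negative, because "-" (0x2D) is less than "h" (0x68). That matches the sort order in the response, but it does not explain why the range filter let the first result through.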

Thanks for your time.
-Cat Bieber
