I'm trying to use a Solr query to find the next title in alphabetical
order after a given string. The problem is that the sort parameter
appears to order non-alphanumeric characters differently from a range
filter in the q or fq parameter. I can't filter the non-alphanumeric
characters out because they're integral to the data, and an ordering
based only on the alphanumeric portion of the strings would not be
useful.

I'm running Solr version 3.5.

In my current approach, I have a field that holds a unique string for
each document:
<fieldType name="lowerCaseSort" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
<field name="uniqueSortString" type="lowerCaseSort" indexed="true"
       stored="true"/>
I pass the current document's value as the lower bound of a range query
to fetch everything at or after the current string, sorted ascending:
/select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:["$1+ZX+Spectrum+HOBETA+format+file"+TO+*]&wt=xml&rows=5&version=2.2
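For reference, here is a minimal Java sketch of how a request like the
one above could be assembled. The class and method names
(NextTitleQuery, rangeQuery, selectUrl) are hypothetical and not taken
from my application. The sketch assumes that only backslashes and double
quotes need escaping inside the quoted bound, and it percent-encodes the
whole q value, so it produces an encoded form of the URL above rather
than a character-for-character copy:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class NextTitleQuery {
    // Builds uniqueSortString:["<title>" TO *].
    // Assumption: escaping backslash and double quote is enough to keep
    // the quoted lower bound intact.
    static String rangeQuery(String currentTitle) {
        String escaped = currentTitle
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return "uniqueSortString:[\"" + escaped + "\" TO *]";
    }

    // Wraps the range query in the same /select request used above,
    // URL-encoding the q parameter value.
    static String selectUrl(String currentTitle) {
        try {
            return "/select?fl=uniqueSortString&sort=uniqueSortString+asc"
                    + "&q=" + URLEncoder.encode(rangeQuery(currentTitle), "UTF-8")
                    + "&wt=xml&rows=5&version=2.2";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("$1 ZX Spectrum HOBETA format file"));
    }
}
```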
In theory, I expect the first result to be the current item and the
second result to be the next one. However, I'm finding that the sort and
the range filter seem to use different ordering:
<result name="response" numFound="448" start="0">
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
  </doc>
  <doc>
    <str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
  </doc>
</result>
Judging by the order of the results, the sort places "-" before "H", but
if the range filter ordered strings the same way, it should have
excluded that first result. From digging through the code, it looks like
sorting uses String.compareTo() to order a text/string field. However, I
haven't been able to track down where the range filter code is. If
someone can point me in the right direction to find that code, I'd love
to look through it. Or, if anyone has suggestions for a different
approach or for changes I can make to this query or field, that would be
very helpful.
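To illustrate the ordering I would expect if both the sort and the range
comparison used String.compareTo() on the analyzed values, here is a
small Java sketch. It is not Solr's code: the class name SortOrderCheck
is hypothetical, and normalize() is only a rough stand-in for the
lowerCaseSort analyzer (lowercase plus trim, with HTML stripping left
out because these titles contain no markup).

```java
import java.util.Arrays;
import java.util.Locale;

final class SortOrderCheck {
    // Rough approximation of the lowerCaseSort analyzer chain:
    // lowercase the whole value, then trim surrounding whitespace.
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).trim();
    }

    // Normalizes each title and sorts with String's natural ordering,
    // which is String.compareTo().
    static String[] sortedTitles(String... raw) {
        String[] out = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = normalize(raw[i]);
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        for (String title : sortedTitles(
                "$1 ZX Spectrum HOBETA format file",
                "$1 ZX Spectrum - Emulator",
                "$1 ZX Spectrum Hobetta Picture Format")) {
            System.out.println(title);
        }
    }
}
```

Under that assumption, "$1 zx spectrum - emulator" comes before
"$1 zx spectrum hobeta format file", which agrees with the sort order in
the response above and is why I expected the range filter to exclude it.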
Thanks for your time.
-Cat Bieber