Erik Hatcher wrote:

On Aug 19, 2009, at 2:45 PM, Paul Rosen wrote:
You can see the problem here (at least until it's fixed!): http://nines.performantsoftware.com/search/saved?user=paul&name=poem

Hi Paul - that project looks familiar!  :)

Hi Erik! I should hope so! And I've gone a year without having to delve into solr much since it has just plain worked.

Thanks for the speedy reply.

I'm surprised you're not seeing an exception when trying to sort on title given this configuration. Sorting must be done on single-valued indexed fields that have at most a single term indexed per document. I recommend you use copyField to copy title to title_sort and configure the title_sort field as a "string" or as a field type that analyzes only to a single term (for example, keyword tokenizing -> lower case filter).

    Erik

I want to double check this (since you probably remember how long it takes to recreate the indexes). I think you're saying to add these two lines, then re-index:

<field name="title_sort" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

Now, this is case-sensitive, right? So would this make it case-insensitive?

<fieldtype name="sort_string" class="solr.StrField" sortMissingLast="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
<field name="title_sort" type="sort_string" indexed="true" stored="true"/>
<copyField source="title" dest="title_sort"/>

Also, I'm guessing from seeing the current results that this wouldn't collate the characters with diacritical marks correctly. Is there a way to indicate that, for instance, A-grave would sort next to A?
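
To make the desired ordering concrete, here is a rough Python sketch, outside of Solr, of a sort key that lower-cases a title and strips diacritical marks so that A-grave lands next to A. The helper name fold_sort_key and the sample titles are made up for illustration; this only shows the collation being asked about, not how Solr itself would do it.

```python
import unicodedata


def fold_sort_key(title: str) -> str:
    """Hypothetical helper: build a case- and accent-insensitive sort key."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for comparisons.
    return stripped.casefold()


# Made-up sample titles, sorted with the folded key.
titles = ["Zebra", "apple", "À la carte", "Abbey"]
sorted_titles = sorted(titles, key=fold_sort_key)
```

With a key like this, "À la carte" sorts among the other A titles instead of after "Zebra".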

And, while I'm on the subject, I have to do the same thing with the Author field, but unfortunately, that is sometimes "First Last" and sometimes "Last, First". Is there any way to sort those by last name, or do I just have to encourage the index people to be more consistent?

I can think of a fairly simple algorithm, but am not sure where to implement it:

- if the word "and" or "&" appears, just look at the left side of the field (in other words, sort by the first name that appears).
- if there is a comma, but it is part of ", jr." or some other common suffix like that, ignore it.
- otherwise, if there is no comma, sort by the last word, unless it is "jr", "sr", "III", etc., then sort by the word before that.
- otherwise, sort by the first word.

That would get most of the cases.
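
As a rough illustration of the rules above, here is a Python sketch. The function name author_sort_key, the suffix list, and the idea of computing the key in the indexing client and sending it as a separate sort field are all assumptions, not something Solr provides out of the box.

```python
import re

# Assumed list of suffixes to ignore; extend as needed.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _norm(word: str) -> str:
    """Lower-case a word and trim surrounding periods and commas."""
    return word.strip(".,").lower()


def author_sort_key(author: str) -> str:
    """Hypothetical helper: guess the last name to sort an author by."""
    # Rule 1: with "and" or "&", only look at the first name listed.
    first = re.split(r"\s+(?:and|&)\s+", author, maxsplit=1,
                     flags=re.IGNORECASE)[0]
    # Rule 2: ignore a comma that only introduces a suffix such as ", Jr."
    parts = [p.strip() for p in first.split(",")]
    parts = [p for p in parts if p and _norm(p) not in SUFFIXES]
    if not parts:
        return ""
    if len(parts) > 1:
        # Rule 4: a real comma means "Last, First", so take the first word.
        return _norm(parts[0].split()[0])
    # Rule 3: no comma means "First Last", so take the last non-suffix word.
    words = [w for w in parts[0].split() if _norm(w) not in SUFFIXES]
    return _norm(words[-1]) if words else ""
```

For example, "John Smith", "Smith, John", "John Smith, Jr." and "John Smith Jr" would all get the key "smith". Multi-word last names such as "de la Mare" are not handled by these rules.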

Thanks,
Paul
