What _is_ happening? Please provide examples of the inputs and outputs that don’t work for you. ’Cause the sort order should be “nothing comes before something”, so sorting ascending on a KeywordTokenizer + LowerCaseFilter field should give you exactly what you’re asking for, with no need for a length field.
Best,
Erick

> On Mar 25, 2020, at 11:07 AM, matthew sporleder <msporle...@gmail.com> wrote:
>
> My original goal was to avoid indexing the string length because I
> wanted edge ngram to "score" based on how "exact" the match was:
>
> q=abc
> "abc" has a high score
> "abcd" has a lower score
> "abcde" has an even lower score
>
> You say sorting by the original field will do that but in practice
> it is not happening so I am probably missing something.
>
> I *am* getting a close version of what I said above with sorting on
> the length, which I added to the index.
>
> searching for my keyword-lowercase field:abc* + sorting by length is
> also working so maybe I can skip the edge ngram field entirely and
> just do that but I was hoping to trade some disk space for
> performance. This field will get queried a lot.
>
>
> On Wed, Mar 25, 2020 at 10:39 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
>>
>> Why do you want to deal with score at all? Sorting
>> overrides score-based sorting. Well, unless you
>> specify score as a secondary sort. But since you’re
>> sorting by length anyway, trying to score
>> based on proximity to the end does nothing.
>>
>> The weirdness you’re going to get here, though, is
>> that the order of the results will not be alphabetical.
>> Say you have two docs, one with abcd and one with
>> abce. Now say you search on abc. Whether abcd or
>> abce comes first is indeterminate.
>>
>> If you simply stored the keyword-lowercased value
>> in a copyfield and sorted on _that_, you wouldn’t have
>> this problem. But if you’re really worried about space,
>> that might not be an option.
>>
>> Best,
>> Erick
>>
>>> On Mar 25, 2020, at 9:49 AM, matthew sporleder <msporle...@gmail.com> wrote:
>>>
>>> Where I landed:
>>>
>>> <fieldType name="string_ci" class="solr.TextField"
>>>            sortMissingLast="true" omitNorms="false">
>>>   <analyzer>
>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>   </analyzer>
>>> </fieldType>
>>>
>>> <fieldType name="edgytext" class="solr.TextField"
>>>            positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>>>             maxGramSize="25" />
>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>>
>>> <field name="slug" type="string_ci" indexed="true" stored="true"
>>>        multiValued="false" />
>>> <field name="fayt" type="edgytext" indexed="true" stored="false"
>>>        omitNorms="false" omitTermFreqAndPositions="false" multiValued="true"
>>>        />
>>> <field name="qt_len" type="int" indexed="true" stored="true"
>>>        multiValued="false" />
>>>
>>> ---
>>>
>>> I can then do a search for
>>>
>>> q=fayt:my_article_slu&sort=qt_len asc
>>>
>>> to get the shortest/most exact find-as-you-type match. I couldn't get
>>> around all results having the same score (can I boost proximity to the
>>> end of a string?) in the edge ngram search but I am hoping this is the
>>> fastest way to do this type of search since I can avoid wildcards
>>> "my_article_slu*" and stuff.
>>>
>>> More suggestions welcome and thanks for the help. I will re-index
>>> with omitNorms=true again to see if I can save a little space.
>>>
>>>
>>> On Tue, Mar 24, 2020 at 11:39 AM matthew sporleder <msporle...@gmail.com>
>>> wrote:
>>>>
>>>> Okay I appreciate you responding.
>>>>
>>>> Switching "slug" from "string_ci" class="solr.StrField" accomplished
>>>> about the same results, which makes sense to me now :)
>>>>
>>>> The previous definition of string_ci was:
>>>> <fieldType name="string_ci" class="solr.TextField"
>>>>            sortMissingLast="true" omitNorms="true">
>>>>   <analyzer>
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> So lowercase + KeywordTokenizerFactory;
>>>>
>>>> I am trying again with omitNorms=false to see if I can get the more
>>>> "exact" matches to score better this time around.
>>>>
>>>>
>>>> On Tue, Mar 24, 2020 at 9:54 AM Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Won’t work. String types are totally unanalyzed. Your string_ci fieldType
>>>>> is what I was looking for.
>>>>>
>>>>> No, you shouldn’t kill the lowercasefilter unless you want all of your
>>>>> searches to be case-sensitive.
>>>>>
>>>>> So you should try:
>>>>>
>>>>> q=edgy_text:whatever&sort=string_ci asc
>>>>>
>>>>> Please use the admin>>pick_core>>analysis page when thinking about
>>>>> changing your schema, it’ll answer a _lot_ of these questions immediately.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>>> On Mar 24, 2020, at 8:37 AM, matthew sporleder <msporle...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Oh maybe a schema bug!
>>>>>>
>>>>>> my string_ci:
>>>>>> <fieldType name="string_ci" class="solr.TextField"
>>>>>>            sortMissingLast="true" omitNorms="true">
>>>>>>   <analyzer>
>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>>>   </analyzer>
>>>>>> </fieldType>
>>>>>>
>>>>>> going to try this instead:
>>>>>> <fieldType name="string_lctoken" class="solr.StrField"
>>>>>>            sortMissingLast="true" omitNorms="true">
>>>>>>   <analyzer>
>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>>>   </analyzer>
>>>>>> </fieldType>
>>>>>>
>>>>>> Then I can probably kill the lowercasefilter on edgytext:
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 24, 2020 at 7:44 AM Erick Erickson <erickerick...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Sort by the full field. You’ll need to copy to a field with
>>>>>>> keywordTokenizer and lowercaseFilter (string_ci? assuming it’s not
>>>>>>> really a “string”) type.
>>>>>>>
>>>>>>> Best,
>>>>>>> Erick
>>>>>>>
>>>>>>>> On Mar 24, 2020, at 7:10 AM, matthew sporleder <msporle...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I have added an edge ngram field to my index and get decent results
>>>>>>>> with partial words but the results appear randomly sorted and all
>>>>>>>> contain the same score. Ideally I would like to sort by shortest
>>>>>>>> ngram match within my other qualifiers.
>>>>>>>>
>>>>>>>> Is there a canonical solution to this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> p.s. I mostly followed
>>>>>>>> https://lucidworks.com/post/auto-suggest-from-popular-queries-using-edgengrams/
>>>>>>>>
>>>>>>>> schema bits:
>>>>>>>>
>>>>>>>> <fieldType name="edgytext" class="solr.TextField"
>>>>>>>>            positionIncrementGap="100">
>>>>>>>>   <analyzer type="index">
>>>>>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>>>>>>>>             maxGramSize="25" />
>>>>>>>>   </analyzer>
>>>>>>>>
>>>>>>>> <field name="slug" type="string_ci" indexed="true" stored="true"
>>>>>>>>        multiValued="false" />
>>>>>>>>
>>>>>>>> <field name="fayt" type="edgytext" indexed="true" stored="false"
>>>>>>>>        omitNorms="false" omitTermFreqAndPositions="true" multiValued="true"
>>>>>>>>        />
>>>>>>>>
>>>>>>>>
>>>>>>>> <copyField source="slug" dest="fayt" maxChars="65" />
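
Editorial note, not part of the original thread: the behaviour discussed above can be modelled outside Solr with a few lines of Python. This is only a rough sketch under stated assumptions. The helper names (`normalize`, `edge_ngrams`, `search`) and the sample slugs are hypothetical, the n-gram step only imitates what an edge n-gram filter with minGramSize=1 and maxGramSize=25 would emit, and Python's string ordering stands in for Solr's sort on a single-token lowercased field. No scoring is modelled.

```python
from typing import List


def normalize(value: str) -> str:
    """Stand-in for KeywordTokenizer + LowerCaseFilter: one lowercased token."""
    return value.lower()


def edge_ngrams(value: str, min_gram: int = 1, max_gram: int = 25) -> List[str]:
    """Stand-in for the index-time edge n-gram step: leading prefixes only."""
    token = normalize(value)
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def search(slugs: List[str], query: str) -> List[str]:
    """Match against the n-grams, then sort ascending on the keyword value."""
    q = normalize(query)  # the query side is lowercased but not n-grammed
    hits = [s for s in slugs if q in edge_ngrams(s)]
    return sorted(hits, key=normalize)


# Hypothetical sample data, chosen to mirror the abc/abcd/abcde example.
slugs = ["abcde", "ABCD", "abc", "abcz", "xyz"]
results = search(slugs, "abc")
print(results)
```

In this model the exact match "abc" sorts first because a string sorts ahead of every longer string it prefixes, which is the "nothing comes before something" rule. The order is alphabetical rather than strictly by length, though: "abcde" lands ahead of "abcz" even though it is longer, which may be part of why sorting on the keyword field and sorting on a stored length do not always agree. The model also suggests that a query longer than maxGramSize would match nothing, since no gram that long is ever indexed.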