You were right that the search finds only the Wednesday occurrences at the beginning of the line. Attached (if it works) is a screen capture of my admin UI. Contrary to your suspicion, though, the index text appears to be parsed properly, so I'm uncertain where this leads me.

Also attached is the pertinent schema.xml snippet you asked for.

The logtext column in my table contains only typed text, with the occasional exception that I add a \uFFFC as a placeholder for images. So, should I be using something besides text_en as the fieldType?

Thanks,
Mark

On 9/21/2015 12:12 PM, Erick Erickson wrote:
bq: However, I discovered that if I search on "Wednesday*" (trailing
asterisk), then I get all the results containing Wednesday that I'm
looking for!

This almost always means you're not searching on the field you think
you're searching on and/or the field isn't being analyzed as you think
(i.e. the fieldType isn't what you expect). If you're really searching
on a fieldType of text_en (and you haven't changed the definition),
then there's something very weird here. FieldTypes are totally
mutable, they are composed of various analysis chains that you (or
someone else) can freely alter, so seeing the <field> definition that
references a type="text_en" is suggestive but not definitive.
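For illustration only, a <field> entry of the kind described above might look like the sketch below. The name logtext and the type text_en come from this thread; the indexed and stored attribute values are assumptions, not taken from the actual schema.

```xml
<!-- Hypothetical sketch: only name and type come from the thread;
     the indexed/stored values are assumed for illustration. -->
<field name="logtext" type="text_en" indexed="true" stored="true"/>
```

The type attribute is only a reference by name. How the text is actually tokenized and filtered is decided by whatever the <fieldType name="text_en"> element in the same schema.xml currently contains.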

I'm going to further guess that when you search on "Wednesday*", all
the matches are at the beginning of the line, and you find docs where
the field has "Wednesday, September...." but not "The party was on
Wednesday".

So let's see the <fieldType> associated with the logtext field. Plus,
the results of adding &debug=true to the query.

But you can get a lot of info a lot faster if you go to the admin UI
screen, select the proper core from the drop-down on the left side and
go to the "analysis" section. Pick the field (or field type), enter
some text and hit analyze (and uncheck the "verbose" box first; that's
largely uninteresting info at this level). That'll show you exactly
how the input document is parsed, exactly how the query is parsed etc.
And be sure to enter something like
"september first was a Wednesday" in the left-hand (index) box, then
just "Wednesday" in the right-hand (query) side. My bet: You'll see on
the index side that the input is not broken up, not transformed, etc.
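As a rough sketch of what that bet would look like in schema.xml, the fragment below defines an untokenized type. solr.StrField is a stock Solr class that indexes the whole field value as a single term; the name string_example is hypothetical, and nothing here is taken from the actual schema in question.

```xml
<!-- Hypothetical: an untokenized type. The entire field value is indexed
     as one term, with no tokenizing, lowercasing, or stemming. -->
<fieldType name="string_example" class="solr.StrField" sortMissingLast="true"/>
```

With a type like this, a value such as "Wednesday, September 2" would be indexed as one term, so a query for Wednesday would match nothing while Wednesday* would match as a prefix of that single term. That is the symptom of finding hits only where the word starts the line.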

Best,
Erick

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
    
