After some debugging, it seems that the search works if the query is a phrase search (i.e., enclosed in quotes):
http://localhost:8983/solr/filesearch/select?q=%22not%20to%20or%20be%22&debugQuery=true

This works with both sow=true and sow=false. Is it mandatory to use phrase search to properly pass the stopwords to the CommonGramsFilter?

On Thu, Jan 4, 2018 at 6:08 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:

> Hi,
>
> I am looking at this documentation and wondering if it would be better to
> optionally skip indexing of original stopwords.
>
> https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-CommonGramsFilter
>
> http://localhost:8983/solr/filesearch/select?q=not%20to%20or%20be&debugQuery=true
>
>    - parsedquery: "+(-DisjunctionMaxQuery((commongram_field2:to)~0.01) DisjunctionMaxQuery((commongram_field2:be)~0.01))~1",
>
> Other parameters are:
>
>    - params: {
>       - mm: " 1<-0% ",
>       - q.alt: "*:*",
>       - ps: "100",
>       - echoParams: "all",
>       - sort: "score desc",
>       - rows: "35",
>       - version: "2.2",
>       - q: "not to or be",
>       - tie: "0.01",
>       - defType: "edismax",
>       - qf: "commongram_field2",
>       - sow: "false",
>       - wt: "json",
>       - debugQuery: "true"
>    }
>
> And it doesn't match my document, which has the following fields:
>
>    - id: "9191",
>    - commongram_field2: "not to or be",
>
> Commongram is defined as:
>
> <field name="commongram_field2" type="commongaram" indexed="true"
>        stored="true" omitPositions="false"/>
>
> <fieldType name="commongaram" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>                 name="nfkc" mode="compose"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>             splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.CommonGramsFilterFactory"
>             words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.LimitTokenCountFilterFactory"
>             maxTokenCount="10000" consumeAllTokens="false"/>
>     <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>                 name="nfkc" mode="compose"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>             splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>     <filter class="solr.CommonGramsFilterFactory"
>             words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>   </analyzer>
> </fieldType>
>
> I am not sure what I am missing. I have also set sow=false so that the
> whole query string is sent to the field's analysis chain instead of being
> sent word by word. But that didn't seem to help.
>
> Thanks
> Nawab
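
In case anyone wants to reproduce the comparison, here is a minimal sketch that builds the phrase and non-phrase request URLs side by side. The host, port, and core name come from the URLs above; the helper name build_select_url is hypothetical, and passing sow directly in the URL is an assumption (in the thread it shows up in the echoed params, so it may come from the handler defaults instead). The script only prints the URLs and does not contact Solr.

```python
from urllib.parse import quote, urlencode

# Host, port, and core name are taken from the URLs in this thread.
BASE_URL = "http://localhost:8983/solr/filesearch/select"


def build_select_url(query, sow=None):
    """Return a /select URL with debugQuery enabled (hypothetical helper)."""
    params = {"q": query, "debugQuery": "true"}
    if sow is not None:
        # Assumption: sow is overridden per request rather than in solrconfig.xml.
        params["sow"] = "true" if sow else "false"
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses "+").
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Phrase query first, then the bare query, each with sow=true and sow=false.
    for text in ('"not to or be"', "not to or be"):
        for sow in (True, False):
            print(build_select_url(text, sow=sow))
```

Opening each printed URL and comparing the parsedquery entries in the debug output should show whether the difference comes from the quotes or from sow.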