Hi Steven,

The list of stopwords for a language is not fixed; there is no single universal 
list of stop words used by all natural language processing tools.
Ideally, stop words should be defined by search merchandisers based on their 
domain instead of relying on the defaults.

https://en.wikipedia.org/wiki/Stop_words

You can add your own lang/stopwords_<languagecode>.txt and reference it from 
the field type, for example:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" expand="true" synonyms="synonyms.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>
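
As a rough sketch of the domain-specific approach: copy the default list to a 
new file, add or remove words (for example "i"), and point a field type at that 
file. The field type name text_en_domain and the file name 
lang/stopwords_en_domain.txt below are made up for illustration and are not Solr 
defaults. The file itself is plain text with one word per line; lines starting 
with # are treated as comments.

```xml
<!-- Hypothetical field type: the name "text_en_domain" and the file
     "lang/stopwords_en_domain.txt" are illustrative assumptions. -->
<fieldType name="text_en_domain" class="solr.TextField" positionIncrementGap="100">
    <!-- An analyzer with no type attribute is used at both index and query time. -->
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- ignoreCase="true" lets the list match "I" and "i" alike. -->
      <filter class="solr.StopFilterFactory" words="lang/stopwords_en_domain.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Keep in mind that a change to the stopword list used at index time only affects 
documents indexed afterwards, so existing documents would need to be reindexed.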

Regards
Srinivas Meenavalli

-----Original Message-----
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Friday, August 26, 2016 4:02 AM
To: solr-user@lucene.apache.org
Subject: Default stopword list

Hi everyone,

I'm curious: how was the current "default" stopword list, for English and other 
languages, determined?  And for English, why is "I" not in the stopword list?

Thanks in advance.

Steve
