Stopwords are commonly occurring words, such as "the", "an" and "a", that
don't add _much_ value to search; they are usually removed during analysis.
Protwords (protected words) are words that the English Porter stemmer
would otherwise stem, but that you do not want stemmed.
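To make the two ideas concrete, here is a toy analysis chain in Python. It is a minimal sketch, not Solr's code: the word lists are made up for illustration, and toy_stem is a crude suffix stripper standing in for the Porter stemmer (it is NOT Porter). It does, however, conflate "news" with "new" the same way Porter does, which is exactly the kind of case a protwords list is for.

```python
import re

# Hypothetical, tiny word lists for illustration only. In Solr such lists
# usually live in files (e.g. stopwords.txt and protwords.txt in the example config).
STOPWORDS = {"a", "an", "the", "and", "are", "of", "to"}
PROTWORDS = {"news"}


def toy_stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords, stem unprotected tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    out = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue  # stopword: dropped here, so it never reaches the index
        if tok in PROTWORDS:
            out.append(tok)  # protected word: kept exactly as is
        else:
            out.append(toy_stem(tok))
    return out


print(analyze("The news of the indexed documents"))  # ['news', 'index', 'document']
print(toy_stem("news"))  # 'new': what "news" would become without protection
```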
In the end, removing stopwords may keep your index smaller and can
keep some queries from taking a long time, but it also means you
can't query for those words. As for protwords, you would add a word
to that list if you felt the results for that token were "off".
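A toy inverted index shows both sides of that trade-off. This is a sketch under simplifying assumptions (lowercasing and splitting on non-alphanumerics, plain AND matching, no positions or scoring), not how Lucene actually stores or scores anything; the documents and stopword list are invented for the example. With stopwords removed the index holds fewer terms, but "the who" can no longer be told apart from "who", and a query for "the" alone finds nothing.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for this example only.
STOPWORDS = {"a", "an", "the", "and", "are", "of"}


def analyze(text, stopwords):
    # Lowercase, split on non-alphanumerics, then drop any stopwords.
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stopwords]


def build_index(docs, stopwords):
    # Map each surviving term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, stopwords):
            index[term].add(doc_id)
    return index


def search(index, query, stopwords):
    # The query goes through the same analysis; every remaining term must match.
    terms = analyze(query, stopwords)
    if not terms:
        return set()  # the whole query was stopwords, so nothing is left to match
    result = set(index.get(terms[0], set()))
    for t in terms[1:]:
        result &= index.get(t, set())
    return result


docs = {1: "The Who live", 2: "Who is there"}
with_stops = build_index(docs, STOPWORDS)
without_stops = build_index(docs, set())

print(len(with_stops), len(without_stops))       # 4 5: smaller index
print(search(with_stops, "the who", STOPWORDS))  # {1, 2}: "the" was thrown away
print(search(without_stops, "the who", set()))   # {1}
print(search(with_stops, "the", STOPWORDS))      # set(): can't query for it
```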
Many people use stopwords; many don't. Personally, I don't think
removing them is the right thing to do, as there isn't always a way to
recover them and they do provide meaning; otherwise, why would they be
needed in the language? Often, the best thing to do is keep
stopwords, but handle them intelligently on the query side (in
phrases, etc.). However, since you're a beginner, it probably makes
sense to just throw out stopwords for now.
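One hypothetical way to keep stopwords in the index but handle them on the query side is sketched below. In this policy, quoted phrases keep every word, loose terms drop stopwords, and a query made entirely of stopwords (like "to be or not to be") keeps them all rather than turning into an empty query. This is just one policy written for illustration; query_terms and the stopword list are made up, not a Solr API.

```python
import re

# Hypothetical stopword list; assumes the index itself still contains these words.
STOPWORDS = {"a", "an", "the", "and", "are", "of", "to", "be", "or", "not"}


def query_terms(query):
    """Return (phrases, terms): quoted phrases keep all words, loose terms drop stopwords."""
    # Words inside "double quotes" are kept verbatim so phrase matching still works.
    phrases = [p.lower().split() for p in re.findall(r'"([^"]*)"', query)]
    # Everything outside the quotes is treated as loose terms.
    loose = re.sub(r'"[^"]*"', " ", query)
    words = re.findall(r"[a-z0-9]+", loose.lower())
    # Drop stopwords, but fall back to all words if nothing else would remain.
    terms = [t for t in words if t not in STOPWORDS] or words
    return phrases, terms


print(query_terms('"the who" and the band'))  # ([['the', 'who']], ['band'])
print(query_terms("to be or not to be"))      # ([], ['to', 'be', 'or', 'not', 'to', 'be'])
```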
-Grant
On May 21, 2008, at 1:50 AM, Akeel wrote:
Hi,
I am a beginner with Solr, and I have successfully indexed my DB in Solr.
I want to know what stopwords and protwords are, and how much effect
they have on my search results.
Thanks in advance.
--
Akeel